Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring of Scrape process, fixing multiple module issues #1111

Merged
merged 8 commits into from
Feb 19, 2024

Conversation

servak
Copy link
Contributor

@servak servak commented Feb 9, 2024

Since enabling the specification of multiple modules, we have observed an increase in DNS requests.
This issue seems to stem from gosnmp performing name resolution for each module.
We have revised the implementation around SNMP scraping.
By having each worker hold its instance of gosnmp, we can resolve the aforementioned issue.

Additionally, we improved maintainability by utilizing an interface to hold gosnmp instead of doing so directly in ScrapeTarget. We're hoping this change will enable us to implement a scraper that avoids retrieving the same OID multiple times by caching the OIDs once they are fetched.
Initially, we contemplated completely eliminating the dependency on gosnmp, but such a measure would have excessively broadened the scope of implementation, prompting us to forgo this option.

I haven't yet implemented error handling for connection failures, and it's still a work in progress, but I would appreciate a review including feedback on the general direction.

test env

> go run ./ --log.level=debug --config.file ./generator/snmp.yml --web.listen-address :9117
ts=2024-02-09T08:32:07.864Z caller=main.go:194 level=info msg="Starting snmp_exporter" version="(version=, branch=, revision=unknown)" concurrency=1
ts=2024-02-09T08:32:07.865Z caller=main.go:195 level=info build_context="(go=go1.21.6, platform=darwin/arm64, user=, date=, tags=unknown)"
ts=2024-02-09T08:32:07.867Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9117
ts=2024-02-09T08:32:07.867Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9117
ts=2024-02-09T08:38:02.606Z caller=collector.go:437 level=debug auth=public_v2 target=192.168.64.3 module=if_mib msg="Starting scrape"
ts=2024-02-09T08:38:02.607Z caller=gosnmp.go:90 level=debug auth=public_v2 target=192.168.64.3 worker=0 msg="Walking subtree" oid=1.3.6.1.2.1.2
ts=2024-02-09T08:38:02.644Z caller=gosnmp.go:106 level=debug auth=public_v2 target=192.168.64.3 worker=0 msg="Walk of subtree completed" oid=1.3.6.1.2.1.2 duration_seconds=37.255167ms
ts=2024-02-09T08:38:02.644Z caller=gosnmp.go:90 level=debug auth=public_v2 target=192.168.64.3 worker=0 msg="Walking subtree" oid=1.3.6.1.2.1.31.1.1
ts=2024-02-09T08:38:02.744Z caller=gosnmp.go:106 level=debug auth=public_v2 target=192.168.64.3 worker=0 msg="Walk of subtree completed" oid=1.3.6.1.2.1.31.1.1 duration_seconds=100.006833ms
ts=2024-02-09T08:38:02.746Z caller=collector.go:441 level=debug auth=public_v2 target=192.168.64.3 module=if_mib msg="Finished scrape" duration_seconds=0.139322458
ts=2024-02-09T08:38:02.746Z caller=collector.go:437 level=debug auth=public_v2 target=192.168.64.3 module=sysDescr msg="Starting scrape"
ts=2024-02-09T08:38:02.746Z caller=gosnmp.go:73 level=debug auth=public_v2 target=192.168.64.3 worker=0 msg="Getting OIDs" oids="unsupported value type"
ts=2024-02-09T08:38:02.960Z caller=gosnmp.go:85 level=debug auth=public_v2 target=192.168.64.3 worker=0 msg="Get of OIDs completed" oids="unsupported value type" duration_seconds=213.846666ms
ts=2024-02-09T08:38:02.960Z caller=collector.go:441 level=debug auth=public_v2 target=192.168.64.3 module=sysDescr msg="Finished scrape" duration_seconds=0.213982416

> ./snmp_exporter_orig --config.file ./generator/snmp.yml
ts=2024-02-09T06:26:36.501Z caller=main.go:194 level=info msg="Starting snmp_exporter" version="(version=, branch=, revision=acb6e787f193d340f2c84e1fa0b4744e7fea7e63-modified)" concurrency=1
ts=2024-02-09T06:26:36.501Z caller=main.go:195 level=info build_context="(go=go1.21.6, platform=darwin/arm64, user=, date=, tags=unknown)"
ts=2024-02-09T06:26:36.504Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9116
ts=2024-02-09T06:26:36.504Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9116

test command log

> curl -s 'localhost:9116/snmp?target=192.168.64.3&module=if_mib,sysDescr' | tee orig | tail
# TYPE snmp_scrape_pdus_returned gauge
snmp_scrape_pdus_returned{module="if_mib"} 121
snmp_scrape_pdus_returned{module="sysDescr"} 1
# HELP snmp_scrape_walk_duration_seconds Time SNMP walk/bulkwalk took.
# TYPE snmp_scrape_walk_duration_seconds gauge
snmp_scrape_walk_duration_seconds{module="if_mib"} 0.0686725
snmp_scrape_walk_duration_seconds{module="sysDescr"} 0.001960042
# HELP sysDescr A textual description of the entity - 1.3.6.1.2.1.1.1
# TYPE sysDescr gauge
sysDescr{sysDescr="Cumulus Linux 3.7.16 (Linux Kernel 4.1.33-1+cl3.7.15u1)"} 1

> curl -s 'localhost:9117/snmp?target=192.168.64.3&module=if_mib,sysDescr' | tee new | tail
# TYPE snmp_scrape_pdus_returned gauge
snmp_scrape_pdus_returned{module="if_mib"} 121
snmp_scrape_pdus_returned{module="sysDescr"} 1
# HELP snmp_scrape_walk_duration_seconds Time SNMP walk/bulkwalk took.
# TYPE snmp_scrape_walk_duration_seconds gauge
snmp_scrape_walk_duration_seconds{module="if_mib"} 0.0529665
snmp_scrape_walk_duration_seconds{module="sysDescr"} 0.009939875
# HELP sysDescr A textual description of the entity - 1.3.6.1.2.1.1.1
# TYPE sysDescr gauge
sysDescr{sysDescr="Cumulus Linux 3.7.16 (Linux Kernel 4.1.33-1+cl3.7.15u1)"} 1

> diff orig new
48c48
< ifHCOutOctets{ifAlias="eth0",ifDescr="Intel Corporation 82540EM Gigabit Ethernet Controller",ifIndex="2",ifName="eth0"} 639396
---
> ifHCOutOctets{ifAlias="eth0",ifDescr="Intel Corporation 82540EM Gigabit Ethernet Controller",ifIndex="2",ifName="eth0"} 634472
53c53
< ifHCOutUcastPkts{ifAlias="eth0",ifDescr="Intel Corporation 82540EM Gigabit Ethernet Controller",ifIndex="2",ifName="eth0"} 3238
---
> ifHCOutUcastPkts{ifAlias="eth0",ifDescr="Intel Corporation 82540EM Gigabit Ethernet Controller",ifIndex="2",ifName="eth0"} 3229
191,192c191,192
< snmp_scrape_duration_seconds{module="if_mib"} 0.164523709
< snmp_scrape_duration_seconds{module="sysDescr"} 0.19121475
---
> snmp_scrape_duration_seconds{module="if_mib"} 0.139320042
> snmp_scrape_duration_seconds{module="sysDescr"} 0.213978041
207,208c207,208
< snmp_scrape_walk_duration_seconds{module="if_mib"} 0.16301675
< snmp_scrape_walk_duration_seconds{module="sysDescr"} 0.191177
---
> snmp_scrape_walk_duration_seconds{module="if_mib"} 0.137472542
> snmp_scrape_walk_duration_seconds{module="sysDescr"} 0.213934416

Signed-off-by: Kakuya Ando <fservak@gmail.com>
…ScrapeTarget function

Signed-off-by: Kakuya Ando <fservak@gmail.com>
…nformation

Signed-off-by: Kakuya Ando <fservak@gmail.com>
Signed-off-by: Kakuya Ando <fservak@gmail.com>
Signed-off-by: Kakuya Ando <fservak@gmail.com>
Signed-off-by: Kakuya Ando <fservak@gmail.com>
@servak
Copy link
Contributor Author

servak commented Feb 12, 2024

We have added error handling and testing that was not done. Could you please review it?

@SuperQ
Copy link
Member

SuperQ commented Feb 12, 2024

Interesting, I'm currently on vacation but will try and review this soon.

@SuperQ SuperQ self-requested a review February 12, 2024 14:23
@SuperQ
Copy link
Member

SuperQ commented Feb 12, 2024

We're hoping this change will enable us to implement a scraper that avoids retrieving the same OID multiple times by caching the OIDs once they are fetched.

I've been considering implementing a full walk cache in the exporter, to help with things like "Cache data for 60s, but cache ifDescr for 1 hour."

Initially, we contemplated completely eliminating the dependency on gosnmp, but such a measure would have excessively broadened the scope of implementation, prompting us to forgo this option.

If you have improvements to gosnmp, I would happily review those as well, as I am a maintainer there. I would not want to implement that inside the exporter when gosnmp is a well supported and widely used community implementation.

@servak
Copy link
Contributor Author

servak commented Feb 13, 2024

I've been considering implementing a full walk cache in the exporter, to help with things like "Cache data for 60s, but cache ifDescr for 1 hour."

We assume a temporary cache for de-duplication, as previously pointed out.
#945 (review)

@servak
Copy link
Contributor Author

servak commented Feb 13, 2024

Sorry, I didn't write the intention I wanted to convey correctly: we considered making the return value of the SNMPScraper interface independent of gosnmp, but we used gosnmp as the return value because the current situation is better in terms of error handling.

Copy link
Member

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable, I think the refactoring makes sense.

scraper/gosnmp.go Outdated Show resolved Hide resolved
Signed-off-by: Kakuya Ando <fservak@gmail.com>
@servak
Copy link
Contributor Author

servak commented Feb 15, 2024

Thanks for the review. I fixed it

Signed-off-by: Kakuya Ando <fservak@gmail.com>
}
return
}
level.Debug(g.logger).Log("msg", "Get of OIDs completed", "oids", oids, "duration_seconds", time.Since(st))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new scraper, the module logging context is lost. I guess if we're heading towards duplication and caching this won't matter. It's just a change I noticed in some local testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming that. As you say, we will leave it at that.

@SuperQ
Copy link
Member

SuperQ commented Feb 18, 2024

CC @bastischubert You may be interested in this.

@SuperQ
Copy link
Member

SuperQ commented Feb 18, 2024

@servak I wonder if it would be useful to add an OnResolv() function to gosnmp so we can instrument the lookups upstream.

@servak
Copy link
Contributor Author

servak commented Feb 19, 2024

I wonder if it would be useful to add an OnResolv() function to gosnmp so we can instrument the lookups upstream.

I think it's a good idea.

@SuperQ SuperQ merged commit 862b410 into prometheus:main Feb 19, 2024
6 checks passed
@servak servak deleted the add-scraper branch February 20, 2024 00:17
@SuperQ SuperQ mentioned this pull request May 10, 2024
SuperQ pushed a commit that referenced this pull request May 11, 2024
* [CHANGE] Improve generator parse error handling #1167
* [ENHANCEMENT] generator: Add generator HELP override #1106
* [ENHANCEMENT] Refactoring of Scrape process, fixing multiple module issues #1111
* [ENHANCEMENT] Skip using an interactive terminal in "make docker-generate". #1113
* [ENHANCEMENT] Add SNMPInflight metric #1119
* [FEATURE] Support for passing username, password & priv_password as env vars #1074
* [FEATURE] Add GoSNMP logger #1157
* [FEATURE] Add a "snmp_context" parameter to the URL #1163
* [BUGFIX] generator: curl failed #1094
* [BUGFIX] Fix SNMPv3 password configuration #1122
* [BUGFIX] generator: Update generator User-Agent #1133
* [BUGFIX] generator: fix mibs directory specification for parse_errors command #1135
* [BUGFIX] generator: remove extra character from dell iDrac-SMIv1 MIB #1141
* [BUGFIX] Fix do not expand envvars for empty config fields #1148

snmp.yml changes:
* Updated Cisco MIBs #1180
* Updated Cyberpower MIBs #1124
* Updated servertech_sentry3 #1090
* Added support for Dell iDrac  #1125
---------

Signed-off-by: Sebastian Schubert <basti@schubert.digital>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants