-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: snmp to use gosmi #9518
Conversation
Looks like new artifacts were built from this PR. Get them here!Artifact URLs |
I am unsure of the progress of this PR - is it only the tests missing? |
@henriknoerr did you also test how long it takes to startup alone? The reason I'm asking is because if you have lots of unresponding snmp devices, it takes a while for telegraf to shut down since it will wait for timeouts and retries on those.. |
I did not know the shutdown process waited for all timeouts - I will look into this. |
390765c
to
c91ce58
Compare
And yes indeed, I was blinded by this snmp MIB PR. Fixing retries and timeout values made a huge difference. |
If a node can't be translated it gives an error. For example |
Correct, but we have the OID, so telegraf can continue and do its work. |
I changed to error to be a warning so telegraf can continue doing its thing. I also added .999 back to the tests. Sorry for the confusion, I was miss understanding the problem :) |
That is super weird. I have no idea what happened. Dave and I will try and fix it. Sorry for the confusion :) |
34d34a7
to
0ea43b3
Compare
🥳 This pull request decreases the Telegraf binary size by -2.47 % for linux amd64 (new size: 127.5 MB, nightly size 130.7 MB) 📦 Looks like new artifacts were built from this PR. Expand this list to get them here! 🐯Artifact URLs |
Why can't it be added without changing functionality? |
…th variable, clean up init, update readme
@Hipska
Our solution was to have one global Hopefully, this solution puts your worries to rest about the global config paths :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename the filename of internal/snmp/mib.go
into translate.go
or smi.go
which better fits the contained code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, seems ready to me. Don't forget to also revert the other section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @MyaLongmire, I think LoadMibsFromPath
requires proper locking and protection of gosmi.Init()
to be executed only once.
// must init, append path for each directory, load module for every file | ||
// or gosmi will fail without saying why | ||
func LoadMibsFromPath(paths []string, log telegraf.Logger) error { | ||
gosmi.Init() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICS, this function is called from each inputs.snmp
plugin. Is this correct? If so, this means that you call gosmi.Init()
multiple time with gosmi
having a global state if my interpretation is correct...
If this is the case, chances are high that you get side-effects between the different plugin-instances potentially ranging from race-conditions when adding the modules to mutual overwriting of the loaded plugins when Init()
is called multiple times... To reduce those effects you should init gosmi
only once (using sync.Once
) and guard this loading function with a lock. I don't know SNMP, but let's hope that there are no collisions possible between multiple (contradicting) mib-definitions...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do call gosmi.Init
more than once but the library has a built-in check to see if it has already been initialized or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have also added a mutex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @MyaLongmire, I don't see any check... When calling gosmi.Init()
it calls this function which in turn calls smi.Init()
that can be found here. Already at this point, multiple files are reread etc. Additionally internal.Init()
is called also modifying data. While I agree that currently nothing "critical" is changed there it won't hurt to be on the safe side by
gosmi.Init() | |
once.Do(gosmi.Init) |
and place var once sync.Once
in the package scope, to be sure we call that function only once at least in that package. It would be even better to make sure it's not called elsewhere, but that's probably out of scope for this PR...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is added in pr 10199
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gosmi.Init() always calls smi.Init("gosmi") which always calls internal.Init(handleName) with the same handleName. That function does check whether it already has an internal handle of that name. That's why it looked to me like calling gosmi.Init() multiple times is safe.
It's a good idea to be defensive in our code and add sync.Once() anyway. I'm not sure gosmi intends it to be safe to call gosmi.Init() multiple times. I didn't see a guarantee in the docs, so even if it is now it might not be in a future version of gosmi.
📦 Looks like new artifacts were built from this PR. Expand this list to get them here ! 🐯Artifact URLs |
This reverts commit 7675ce6.
* origin/master: (133 commits) chore: restart service if it is already running and upgraded via RPM (influxdata#9970) feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10237) fix: Handle duplicate registration of protocol-buffer files gracefully. (influxdata#10188) fix(http_listener_v2): fix panic on close (influxdata#10132) feat: add Vault input plugin (influxdata#10198) feat: support aws managed service for prometheus (influxdata#10202) fix: Make telegraf compile on Windows with golang 1.16.2 (influxdata#10246) Update changelog feat: Modbus add per-request tags (influxdata#10231) fix: Implement NaN and inf handling for elasticsearch output (influxdata#10196) feat: add nomad input plugin (influxdata#10106) fix: Print loaded plugins and deprecations for once and test (influxdata#10205) fix: eliminate MIB dependency for ifname processor (influxdata#10214) feat: Optimize locking for SNMP MIBs loading. (influxdata#10206) feat: Add SMART plugin concurrency configuration option, nvme-cli v1.14+ support and lint fixes. (influxdata#10150) feat: update configs (influxdata#10236) fix(inputs/kube_inventory): set TLS server name config properly (influxdata#9975) fix: Sudden close of Telegraf caused by OPC UA input plugin (influxdata#10230) fix: bump github.com/eclipse/paho.mqtt.golang from 1.3.0 to 1.3.5 (influxdata#9913) fix: json_v2 parser timestamp setting (influxdata#10221) fix: ensure graylog spec fields not prefixed with '_' (influxdata#10209) docs: remove duplicate links in CONTRIBUTING.md (influxdata#10218) fix: pool detection and metrics gathering for ZFS >= 2.1.x (influxdata#10099) fix: parallelism fix for ifname processor (influxdata#10007) chore: Forbids "log" package only for aggregators, inputs, outputs, parsers and processors (influxdata#10191) docs: address documentation gap when running telegraf in k8s (influxdata#10215) feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10211) fix: mqtt topic extracting no longer requires all three fields (influxdata#10208) fix: windows service - graceful shutdown of telegraf (influxdata#9616) feat: update etc/telegraf.conf and etc/telegraf_windows.conf (influxdata#10201) feat: Modbus support multiple slaves (gateway feature) (influxdata#9279) fix: Revert unintented corruption of the Makefile from influxdata#10200. (influxdata#10203) chore: remove triggering update-config bot in CI (influxdata#10195) Update changelog feat: Implement deprecation infrastructure (influxdata#10200) fix: extra lock on init for safety (influxdata#10199) fix: resolve influxdata#10027 (influxdata#10112) fix: register bigquery to output plugins influxdata#10177 (influxdata#10178) fix: sysstat use unique temp file vs hard-coded (influxdata#10165) refactor: snmp to use gosmi (influxdata#9518) ...
Required for all PRs:
resolves #3784
resolves #6450
resolves #5720
No longer calls
snmptranslate
and parses the output. Now it relies ongosmi
library to parse mib files directly to increase performance. Moved all gosmi code into internal/snmp