Fix snmp tools output parsing when they contain Windows eols #3396

Merged
merged 1 commit into from
Nov 21, 2017

Conversation

danielnelson
Contributor

closes #3263

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.
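
For readers unfamiliar with the problem in the title: on Windows the snmp command-line tools can emit lines that end in "\r\n", and a parser that splits only on "\n" leaves a stray "\r" on every value. The following is a minimal sketch of one way to split such output, not the code in this PR. The function name splitLines and the sample strings are hypothetical and chosen only for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// splitLines returns the lines of a tool's output without line terminators.
// bufio.Scanner's default split function (bufio.ScanLines) drops the trailing
// "\n" and an optional "\r" before it, so Unix and Windows line endings
// produce the same result.
func splitLines(output string) []string {
	var lines []string
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	// A real parser should also check scanner.Err(); omitted in this sketch.
	return lines
}

func main() {
	// Hypothetical sample output, once with Unix and once with Windows EOLs.
	unix := "IF-MIB::ifDescr\n.1.3.6.1.2.1.2.2.1.2\n"
	windows := "IF-MIB::ifDescr\r\n.1.3.6.1.2.1.2.2.1.2\r\n"

	fmt.Printf("%q\n", splitLines(unix))
	fmt.Printf("%q\n", splitLines(windows))
}
```

Both calls print the same two strings, which is the property the parser needs. An alternative with the same effect is to split on "\n" and trim a trailing "\r" from each line.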

danielnelson added the area/snmp and fix (pr to fix corresponding bug) labels on Oct 26, 2017
danielnelson added this to the 1.5.0 milestone on Oct 26, 2017
@pstatho

pstatho commented Oct 27, 2017

I'm getting a timeout, with the following in the log file:
2017-10-27T01:41:00Z E! Error in plugin [inputs.snmp]: took longer to collect than collection interval (10s)
I'm just using the example for the documentation and it continue works with regular SNMP fields

@danielnelson
Contributor Author

The timeout is potentially normal; it depends on how fast the remote agent responds. Does it ever finish a collection, or is it completely stuck?

I'm just using the example for the documentation and it continue works with regular SNMP fields

I don't quite understand what you meant here.

@pstatho

pstatho commented Oct 27, 2017

My telegraf.conf has the following (as in the example documentation):

[[inputs.snmp]]
  agents = [ "192.168.16.252:161" ]
  version = 1
  community = "public"
  name = "snmp.switch"
  [[inputs.snmp.field]]
    name = "hostname"
    oid = "RFC1213-MIB::sysName.0"
    is_tag = true
  [[inputs.snmp.field]]
    name = "uptime"
    oid = "DISMAN-EXPRESSION-MIB::sysUpTimeInstance"
  [[inputs.snmp.table]]
    name = "interface"
    inherit_tags = [ "hostname" ]
    oid = "IF-MIB::ifTable"
    [[inputs.snmp.table.field]]
      name = "ifDescr"
      oid = "IF-MIB::ifDescr"
      is_tag = true

So regular fields like hostname and uptime return, but anything table related does not return any data.
When I run

>telegraf.exe --config telegraf.conf --test|find "snmp.switch"

The output is:

> snmp.switch,host=MY-HOST-01,hostname=MY-SWITCH-01,agent_host=192.168.16.252 uptime=1618717056i 1509069658000000000

and then it hangs and times out after 10 seconds with no further output.

When I run it via the Windows service the log file has the error message I mentioned earlier.

@RubenN

RubenN commented Oct 30, 2017

I can confirm that it's working fine for me!

@danielnelson
Contributor Author

@pstatho Can you try this build with --debug enabled? I added some temporary logging that will hopefully help us figure out the issue.

https://8695-33258973-gh.circle-artifacts.com/0/tmp/circle-artifacts.j68kVkX/build/windows/amd64/telegraf-1.5.0%257E1238cf3_windows_amd64.zip

@pstatho

pstatho commented Oct 30, 2017

Here is the log output:
https://gist.github.com/pstatho/5f5cdb2661b60d5fddd75bc25581f0bd

The command line output is still the same, only the hostname of the switch and uptime are returned, after about 30 seconds.

@danielnelson
Contributor Author

@pstatho From the output it looks like the parsing is working, so maybe this timeout is unrelated. Any chance your hardware allows you to use version = 2? This will enable bulk walks.

Also can you try running this and record the execution time:

snmpwalk -v1 -c public 127.0.0.1:161 IF-MIB::ifTable
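
If the shell has no built-in way to time a command, a small wrapper can record the execution time. This is only a sketch: timeCommand is a hypothetical helper, and it discards the command's output so that console rendering does not count toward the measurement.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// timeCommand runs the named program with the given arguments and reports
// how long it took. Stdout and stderr are left unset, so os/exec discards
// the command's output.
func timeCommand(name string, args ...string) (time.Duration, error) {
	start := time.Now()
	err := exec.Command(name, args...).Run()
	return time.Since(start), err
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: timecmd <command> [args...]")
		return
	}
	elapsed, err := timeCommand(os.Args[1], os.Args[2:]...)
	if err != nil {
		fmt.Println("command failed:", err)
	}
	fmt.Printf("elapsed: %s\n", elapsed.Round(time.Millisecond))
}
```

Built as timecmd (a name assumed here), it would be invoked with the snmpwalk command above as its arguments.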

@pstatho

pstatho commented Oct 31, 2017

Running against 127.0.0.1 returns:
Timeout: No Response from 127.0.0.1:161

Running the command against my 26-port switch returns in 12 seconds when I pipe the output to a file to remove any rendering/scrolling delays on the console. When I run it against my 50-port switch it takes 22 seconds (almost double the time). I've tried both v1 and v2c, and the time to execute the snmpwalk is the same.

@danielnelson
Contributor Author

That's pretty slow, but I suspect we won't be able to improve on the time it takes snmpwalk to do the same operation. I think you will have to increase the interval to approximately double the collection time.

[[inputs.snmp]]
  interval = "1m"
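
As a worked example of that rule of thumb, the sketch below doubles a measured collection time and rounds it up to a convenient step. The function suggestInterval and the 30-second rounding step are assumptions made for illustration; Telegraf does not compute the interval this way, it simply uses whatever value is configured.

```go
package main

import (
	"fmt"
	"time"
)

// suggestInterval doubles the measured collection time and rounds the result
// up to the next multiple of step. The factor of two is the headroom suggested
// in this thread; step is an arbitrary choice and must be greater than zero.
func suggestInterval(collection, step time.Duration) time.Duration {
	target := 2 * collection
	if rem := target % step; rem != 0 {
		target += step - rem
	}
	return target
}

func main() {
	// The two walk times reported earlier in this thread.
	for _, c := range []time.Duration{12 * time.Second, 22 * time.Second} {
		fmt.Printf("collection %v -> interval %v\n", c, suggestInterval(c, 30*time.Second))
	}
}
```

With the 22-second walk this gives 44 seconds, rounded up to one minute, which matches the interval shown above.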

@phemmer I don't have much experience with snmp on real hardware, is this typical performance?

@phemmer
Contributor

phemmer commented Oct 31, 2017

Not for me, no. I can run the given config against one of our 48-port switches and it completes in about 5s with version 2c, and about 24s with version 1.

Note that doing snmpwalk -v2c ... is expected to give the same timing as snmpwalk -v1 .... To take advantage of bulk walk using the net-snmp CLI utils, you need to use snmpbulkwalk -v2c ...
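
A rough way to see why the bulk variant is faster is to count requests. A plain walk issues one GETNEXT per object, while a bulk walk asks for up to max-repetitions objects per GETBULK. The sketch below is a back-of-the-envelope estimate only: the object count and the max-repetitions value of 10 are example numbers, and real agents may return fewer objects per response.

```go
package main

import "fmt"

// getNextRequests estimates the requests needed to walk n objects with
// GETNEXT: one per object, plus a final request whose answer falls outside
// the subtree and ends the walk.
func getNextRequests(n int) int {
	return n + 1
}

// getBulkRequests estimates the requests for the same walk with GETBULK,
// assuming every response carries the full maxRepetitions objects.
// This is ceil((n+1) / maxRepetitions) in integer arithmetic.
func getBulkRequests(n, maxRepetitions int) int {
	return (n + maxRepetitions) / maxRepetitions
}

func main() {
	// Example: 50 interfaces with 22 ifTable columns each (hypothetical size).
	objects := 50 * 22
	fmt.Println("GETNEXT requests:", getNextRequests(objects))
	fmt.Println("GETBULK requests:", getBulkRequests(objects, 10))
}
```

Since each request costs at least one network round trip plus the agent's processing time, cutting the request count by roughly the max-repetitions factor accounts for most of the difference on slow agents.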

@pstatho

pstatho commented Oct 31, 2017

@phemmer Ah yes, you are correct. When I use snmpbulkwalk -v2c it takes about 2 seconds. So does the snmp plugin for telegraf use the bulk command when the version is 2 or higher?

Regardless, when I change my telegraf config to use version 2, I still don't get table output when I test it. I tried with the first patch version and the second debug version as well.

@danielnelson
Contributor Author

danielnelson commented Oct 31, 2017

Yes, it does switch to bulk walk with version 2 and higher. Of course, we are using the gosnmp library for this functionality, not the snmpbulkwalk command.

I added printing of gosnmp log messages to this branch. It is very verbose, so you must set debug = true on the plugin in addition to running with the --debug flag. Could you try this out and report back the output?

Windows Build

@pstatho

pstatho commented Nov 4, 2017

Finally had some time to try out the latest build. It seems to run in a continuous loop when I execute it from the command line. Nothing was output to the console.
I believe I captured a complete cycle in this gist.

@danielnelson
Contributor Author

Are you running it with --test?

@pstatho

pstatho commented Nov 20, 2017

@danielnelson I tried it again and it is outputting to the console. I must have forgotten the --test as you correctly pointed out.
Also, to confirm: using SNMP version 2 in the config, it returns within 1-2 seconds as expected. Looking forward to the official release. Thanks!

Labels
area/snmp fix pr to fix corresponding bug
Development

Successfully merging this pull request may close these issues.

SNMP Table input fails on Windows
4 participants