Description
Relevant telegraf.conf
[[inputs.snmp]]
  interval = "60s"
  agents = [ "1.2.3.4" ]
  version = 2
  community = "SECRET"
  [[inputs.snmp.table]]
    inherit_tags = [ "hostname" ]
    oid = "JUNIPER-MIB::jnxOperatingTable"
    [[inputs.snmp.table.field]]
      oid = "JUNIPER-MIB::jnxOperatingDescr"
      is_tag = true
Logs from Telegraf
Note: these logs are from 1.20.4. I am unable to test whether this issue exists in 1.21.2, as the SNMP MIB parser still appears to be broken for Juniper MIBs.
These logs show snmptranslate running once per column, but they are not very informative: there doesn't appear to be any detailed logging for the SNMP plugin.
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingTable"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "JUNIPER-MIB::jnxOperatingTable.1"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptable" "-Ch" "-Cl" "-c" "public" "127.0.0.1" "JUNIPER-MIB::jnxOperatingTable"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingDescr"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingContentsIndex"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingL1Index"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingL2Index"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingL3Index"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingState"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingTemp"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingCPU"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingISR"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingDRAMSize"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingBuffer"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingHeap"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingUpTime"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingLastRestart"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingMemory"
2022-01-11T09:02:00Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingStateOrdered"
2022-01-11T09:02:01Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingChassisId"
2022-01-11T09:02:01Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingChassisDescr"
2022-01-11T09:02:01Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingRestartTime"
2022-01-11T09:02:01Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperating1MinLoadAvg"
2022-01-11T09:02:01Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperating5MinLoadAvg"
2022-01-11T09:02:01Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperating15MinLoadAvg"
2022-01-11T09:02:01Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperating1MinAvgCPU"
2022-01-11T09:02:01Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperating5MinAvgCPU"
2022-01-11T09:02:01Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperating15MinAvgCPU"
2022-01-11T09:02:01Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingFRUPower"
2022-01-11T09:02:01Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingBufferCP"
2022-01-11T09:02:01Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "JUNIPER-MIB::jnxOperatingMemoryCP"
System info
Telegraf 1.20.4
Docker
No response
Steps to reproduce
- Configure Telegraf to poll an SNMP table
- Capture the SNMP traffic with tcpdump, and note that when a table is walked, the same OIDs appear in more than one response
Expected behavior
When walking a table, an SNMP manager (i.e., the client) should send each GetNext or GetBulk request starting from the last OID in the previous response.
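To make the expected behavior concrete, here is a minimal sketch of a single-pass table walk against a hypothetical in-memory agent. `fakeAgent`, `getBulk`, `walkTable`, and the OID `1.3.6.1.4.1.99999.1` are all illustrative assumptions, not Telegraf or gosnmp code. The fake agent also simplifies end-of-MIB handling: it returns a short or empty response rather than `endOfMibView` exceptions. Each GetBulk continues from the last OID of the previous response, and the walk stops at the first OID outside the table.

```go
package main

import (
	"fmt"
	"sort"
)

// OID is a numeric SNMP object identifier, e.g. 1.3.6.1 as OID{1, 3, 6, 1}.
type OID []int

// compareOID orders OIDs lexicographically, the order GetNext/GetBulk follow.
// A strict prefix sorts before any OID beneath it.
func compareOID(a, b OID) int {
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return len(a) - len(b)
}

// inSubtree reports whether oid lies strictly below root.
func inSubtree(oid, root OID) bool {
	if len(oid) <= len(root) {
		return false
	}
	for i := range root {
		if oid[i] != root[i] {
			return false
		}
	}
	return true
}

// fakeAgent is a hypothetical in-memory stand-in for an SNMP agent.
type fakeAgent struct {
	oids     []OID // sorted by compareOID
	requests int   // number of GetBulk requests received
}

// getBulk mimics GetBulk with non-repeaters = 0: it returns up to maxRep
// OIDs that sort strictly after start.
func (a *fakeAgent) getBulk(start OID, maxRep int) []OID {
	a.requests++
	i := sort.Search(len(a.oids), func(j int) bool {
		return compareOID(a.oids[j], start) > 0
	})
	end := i + maxRep
	if end > len(a.oids) {
		end = len(a.oids)
	}
	return a.oids[i:end]
}

// walkTable fetches a whole table in one pass: every request resumes from the
// last OID already received, so no OID is requested twice.
func walkTable(a *fakeAgent, table OID, maxRep int) []OID {
	var got []OID
	next := table
	for {
		resp := a.getBulk(next, maxRep)
		if len(resp) == 0 {
			return got // end of the fake agent's MIB view
		}
		for _, oid := range resp {
			if !inSubtree(oid, table) {
				return got // walked past the end of the table: stop once
			}
			got = append(got, oid)
		}
		next = resp[len(resp)-1]
	}
}

// hypotheticalTable is an illustrative table OID, not a real Juniper OID.
var hypotheticalTable = OID{1, 3, 6, 1, 4, 1, 99999, 1}

// newFakeAgent builds a rows x cols table (entries .1.<col>.<row>) followed
// by `after` unrelated OIDs that sort after the table.
func newFakeAgent(rows, cols, after int) *fakeAgent {
	a := &fakeAgent{}
	for c := 1; c <= cols; c++ {
		for r := 1; r <= rows; r++ {
			a.oids = append(a.oids, append(append(OID{}, hypotheticalTable...), 1, c, r))
		}
	}
	for i := 1; i <= after; i++ {
		a.oids = append(a.oids, OID{1, 3, 6, 1, 4, 1, 99999, 2, i})
	}
	sort.Slice(a.oids, func(i, j int) bool { return compareOID(a.oids[i], a.oids[j]) < 0 })
	return a
}

func main() {
	agent := newFakeAgent(1, 14, 20) // one row, 14 columns
	got := walkTable(agent, hypotheticalTable, 10)
	fmt.Printf("%d values in %d GetBulk requests\n", len(got), agent.requests)
}
```

For a one-row, 14-column table with max-repetitions 10, this prints `14 values in 2 GetBulk requests`. That is the count this issue expects for the single-row table on SRX branch devices. Only the second response reaches past the end of the table.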
Actual behavior
Telegraf does not walk a whole SNMP table correctly. Instead, it treats each column as a separate table and walks each column independently, which imposes a high performance cost on the monitored devices.
If a GetBulk response includes values from the next column of the table, those values are discarded and the next column is fetched again from scratch.
In a table with few rows, the same OIDs may be returned many times, which can place a very high load on the SNMP agent (i.e., the device).
For example, Juniper SRX branch devices not operating in a cluster have a JUNIPER-SRX5000-SPU-MONITORING-MIB::jnxJsSPUMonitoringObjectsTable table with one entry (row) and 14 columns. With the default max_repetitions of 10, this table should be fetched with 2 GetBulk requests: the first returns the first 10 values, and the second returns the remaining 4 followed by the next 6 entries in the SNMP tree. Instead, Telegraf sends 14 GetBulk requests, one per column, and each response contains the next 10 entries, most of which are then requested again by the following GetBulk request.
Setting max_repetitions higher makes this effect significantly worse.
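A rough cost model makes this scaling explicit. It rests on simplifying assumptions, not on details taken from Telegraf's source: a table with $R$ rows and $C$ columns, a max-repetitions value $m$, and an agent with at least $m$ further OIDs after the table. It also assumes a walk that stops at the first returned OID outside the subtree being walked. Walking each column separately needs $R+1$ varbinds per column, while a single pass needs $RC+1$ for the whole table. The request counts are then:

$$
N_{\text{col}} = C\left\lceil \frac{R+1}{m} \right\rceil, \qquad
N_{\text{table}} = \left\lceil \frac{RC+1}{m} \right\rceil
$$

Each request returns $m$ varbinds, so the agent encodes $mN$ varbinds in total. For the SRX table ($R=1$, $C=14$), $m=10$ gives $N_{\text{col}}=14$ requests (140 varbinds) against $N_{\text{table}}=2$ (20 varbinds). Raising $m$ to 50 keeps $N_{\text{col}}=14$ but grows the per-column traffic to 700 varbinds. The single pass drops to one request of 50 varbinds.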
Note that SNMP agents correctly return values from beyond the table once the end of the table is reached. This should happen only once per table walk, because the manager detects that the OIDs in the response lie outside the requested table. That is acceptable even if fetching those values is expensive, since it happens only once. Telegraf's SNMP implementation, however, requests data past the end of the table over and over again.
Additional info
No response