Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNMP input: error in unmarshalResponse #9917

Open
ADovgalyuk opened this issue Oct 13, 2021 · 10 comments · Fixed by #10076
Open

SNMP input: error in unmarshalResponse #9917

ADovgalyuk opened this issue Oct 13, 2021 · 10 comments · Fixed by #10076
Labels
area/snmp bug unexpected problem or unintended behavior

Comments

@ADovgalyuk
Copy link

On the server I have application with snmp interface support. Snmpwalk example:

[root@SOR-01 telegraf.d]# sudo -u telegraf snmpwalk -mALL -v 2c -c public 127.0.0.1:4162 1.3.6.1.4.1.20873
PROTEI-SG-DIAM-MIB::pcsmIndex.1 = INTEGER: 1
PROTEI-SG-DIAM-MIB::pcsmIndex.2 = INTEGER: 2
PROTEI-SG-DIAM-MIB::pcsmIndex.3 = INTEGER: 3
PROTEI-SG-DIAM-MIB::pcsmIndex.4 = INTEGER: 4
PROTEI-SG-DIAM-MIB::pcsmCA.1 = STRING: "Sg.DIAM.PCSM.3"
PROTEI-SG-DIAM-MIB::pcsmCA.2 = STRING: "Sg.DIAM.PCSM.2"
PROTEI-SG-DIAM-MIB::pcsmCA.3 = STRING: "Sg.DIAM.PCSM.1"
PROTEI-SG-DIAM-MIB::pcsmCA.4 = STRING: "Sg.DIAM.PCSM.0"
PROTEI-SG-DIAM-MIB::pcsmOSTATE.1 = STRING: "ACTIVATE"
PROTEI-SG-DIAM-MIB::pcsmOSTATE.2 = STRING: "ACTIVATE"
PROTEI-SG-DIAM-MIB::pcsmOSTATE.3 = STRING: "ACTIVATE"
PROTEI-SG-DIAM-MIB::pcsmOSTATE.4 = STRING: "ACTIVATE"
PROTEI-AD-OMI-MIB::ad-omi-acclIndex.1 = INTEGER: 1
PROTEI-AD-OMI-MIB::ad-omi-acclIndex.2 = INTEGER: 2
PROTEI-AD-OMI-MIB::ad-omi-acclIndex.3 = INTEGER: 3
PROTEI-AD-OMI-MIB::ad-omi-acclCA.1 = STRING: "Ad.OMI.ACCL.2"
PROTEI-AD-OMI-MIB::ad-omi-acclCA.2 = STRING: "Ad.OMI.ACCL.3"
PROTEI-AD-OMI-MIB::ad-omi-acclCA.3 = STRING: "Ad.OMI.ACCL.1"
PROTEI-AD-OMI-MIB::ad-omi-acclOSTATE.1 = STRING: "ACTIVATE"
PROTEI-AD-OMI-MIB::ad-omi-acclOSTATE.2 = STRING: "ACTIVATE"
PROTEI-AD-OMI-MIB::ad-omi-acclOSTATE.3 = STRING: "ACTIVATE"

And the following conf file:

[[inputs.snmp]]
  agents = [ "127.0.0.1:4162" ]
  version = 2
  community = "public"
  tagexclude = ["agent_host"]

[inputs.snmp.tags]
     app="Protei_RG_LTE"
     Metrics_type="app"

[[inputs.snmp.table]]
    name = "OMI_ACCL"
    inherit_tags = [ "app","Metrics_type" ]

        [[inputs.snmp.table.field]]
        name = "CA"
        is_tag = true
        oid = "PROTEI-AD-OMI-MIB::ad-omi-acclCA"

        [[inputs.snmp.table.field]]
        name = "state"
        oid = "PROTEI-AD-OMI-MIB::ad-omi-acclOSTATE"

[[inputs.snmp.table]]
    name = "OMI_ASCL"
    inherit_tags = [ "app","Metrics_type" ]

        [[inputs.snmp.table.field]]
        name = "CA"
        is_tag = true
        oid = "PROTEI-AD-OMI-MIB::ad-omi-asclCA"

        [[inputs.snmp.table.field]]
        name = "state"
        oid = "PROTEI-AD-OMI-MIB::ad-omi-asclOSTATE"

[[processors.enum]]
  [[processors.enum.mapping]]
    field = "state"
#    default = 0

    ## Table of mappings
    [processors.enum.mapping.value_mappings]
      ACTIVATE = 1
      FAIL = 0


At least on version 1.14.5 - everything works as expected:

[root@SOR-01 telegraf.d]# sudo -u telegraf telegraf --test --config Test.conf --debug
2021-10-13T10:12:12Z I! Starting Telegraf 1.14.5
2021-10-13T10:12:12Z D! [agent] Initializing plugins
2021-10-13T10:12:12Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-acclCA"
2021-10-13T10:12:12Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-acclOSTATE"
2021-10-13T10:12:12Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-asclCA"
2021-10-13T10:12:12Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-asclOSTATE"
> OMI_ACCL,CA=Ad.OMI.ACCL.3,Metrics_type=app,app=Protei_RG_LTE,host=SOR-01 state="ACTIVATE" 1634119933000000000
> OMI_ACCL,CA=Ad.OMI.ACCL.1,Metrics_type=app,app=Protei_RG_LTE,host=SOR-01 state="ACTIVATE" 1634119933000000000
> OMI_ACCL,CA=Ad.OMI.ACCL.2,Metrics_type=app,app=Protei_RG_LTE,host=SOR-01 state="ACTIVATE" 1634119933000000000

But on version 1.20 and the same config we've faced an error:

[root@SOR-02 telegraf.d]# sudo -u telegraf telegraf --test --config RG_LTE_omi.conf --debug
ERRO[0000]log.go:120 gosnowflake.(*defaultLogger).Errorf failed to create cache directory. /etc/telegraf/.cache/snowflake, err: mkdir /etc/telegraf/.cache: permission denied. ignored 
ERRO[0000]log.go:120 gosnowflake.(*defaultLogger).Errorf failed to open. Ignored. open /etc/telegraf/.cache/snowflake/ocsp_response_cache.json: no such file or directory 
2021-10-13T10:23:48Z I! Starting Telegraf 1.20.0
2021-10-13T10:23:48Z D! [agent] Initializing plugins
2021-10-13T10:23:48Z D! [agent] Starting service inputs
2021-10-13T10:23:48Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-acclCA"
2021-10-13T10:23:48Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-acclOSTATE"
2021-10-13T10:23:48Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-asclCA"
2021-10-13T10:23:49Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-asclOSTATE"
2021-10-13T10:23:49Z E! [inputs.snmp] Error in plugin: agent 127.0.0.1:4162: gathering table OMI_ACCL: performing bulk walk for field CA: error in unmarshalResponse: error parsing OID Value: invalid OID length
2021-10-13T10:23:49Z E! [inputs.snmp] Error in plugin: agent 127.0.0.1:4162: gathering table OMI_ASCL: performing bulk walk for field CA: error in unmarshalResponse: error parsing OID Value: invalid OID length
2021-10-13T10:23:49Z D! [agent] Stopping service inputs
2021-10-13T10:23:49Z D! [agent] Input channel closed
2021-10-13T10:23:49Z D! [agent] Processor channel closed
2021-10-13T10:23:49Z D! [agent] Stopped Successfully
2021-10-13T10:23:49Z E! [telegraf] Error running agent: input plugins recorded 2 errors

I've checked on 1.19.3 and it works there as well. So I can only take a guess that it has something to do with new GO version implemented in 1.20

@ADovgalyuk ADovgalyuk added the bug unexpected problem or unintended behavior label Oct 13, 2021
@reimda
Copy link
Contributor

reimda commented Oct 14, 2021

We don't have a way to reproduce this error because we don't have your snmp device, but I can share a few ideas about what might be related.

It's possible that building with a new go version could cause this change but I think it's very unlikely. I would expect a compiler problem to affect more than just snmp walk.

In the error log line, everything after performing bulk walk for field CA: comes from the gosnmp module That leads me to think that this could be the result of a change in the gosnmp module.

I understand from the issue description that Telegraf 1.14.5 and 1.19.3 both work and 1.20.0 doesn't work. Telegraf uses the same version of the gosnmp library in 1.19.3 and 1.20.0, so we can rule out a change in the gosnmp module:

$ git show v1.20.0:go.mod |grep snmp
	github.com/gosnmp/gosnmp v1.32.0
$ git show v1.19.3:go.mod |grep snmp
	github.com/gosnmp/gosnmp v1.32.0

I noticed that in your tests you used two different telegraf.conf files. The first is telegraf 1.14.5 with Test.conf and the second is 1.20.0 with RG_LTE_omi.conf. I wonder if the difference in behavior is related to a difference in configuration. It's hard to learn much from a test when two variables are changing.

Could you repeat your test with versions 1.20.0 and 1.19.3 but using the same config file. Please also post the config you use so we know for sure what telegraf is being asked to do.

@ADovgalyuk
Copy link
Author

Config was the same, I was trying to remove some company specific info for more simple example, but missed second case, sorry.
Good news, I was able to solve the problem on my proprietary application by tweaking option related to CyclicTreeWalk.

But if we still want to investigate a problem. Seems I was wrong about 19.3 being ok with that, maybe a miss during multiple versions changing. I've return all params as it was before and checked all major versions starting from 1.14.5 to 1.20, this case was broken somewhere between 1.18 and 1.19. Nothing was done between checks beside installing new version.

[root@SOR-01 telegraf.d]# sudo -u telegraf telegraf --test --config RG_LTE_omi.conf --debug
2021-10-15T09:52:34Z I! Starting Telegraf 1.18.0
2021-10-15T09:52:34Z D! [agent] Initializing plugins
2021-10-15T09:52:34Z D! [agent] Starting service inputs
2021-10-15T09:52:34Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-acclCA"
2021-10-15T09:52:34Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-acclOSTATE"
2021-10-15T09:52:34Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-asclCA"
2021-10-15T09:52:34Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-asclOSTATE"
> OMI_ACCL,CA=Ad.OMI.ACCL.3,Metrics_type=app,app=Protei_RG_LTE,host=SOR-01 state=1i 1634291555000000000
> OMI_ACCL,CA=Ad.OMI.ACCL.2,Metrics_type=app,app=Protei_RG_LTE,host=SOR-01 state=1i 1634291555000000000
> OMI_ACCL,CA=Ad.OMI.ACCL.1,Metrics_type=app,app=Protei_RG_LTE,host=SOR-01 state=1i 1634291555000000000
2021-10-15T09:52:34Z D! [agent] Stopping service inputs
2021-10-15T09:52:34Z D! [agent] Input channel closed
2021-10-15T09:52:34Z D! [agent] Processor channel closed
2021-10-15T09:52:34Z D! [agent] Stopped Successfully

And 1.19

[root@SOR-01 telegraf.d]# sudo -u telegraf telegraf --test --config RG_LTE_omi.conf --debug
ERRO[0000]log.go:120 gosnowflake.(*defaultLogger).Errorf failed to create cache directory. /etc/telegraf/.cache/snowflake, err: mkdir /etc/telegraf/.cache: permission denied. ignored 
ERRO[0000]log.go:120 gosnowflake.(*defaultLogger).Errorf failed to open. Ignored. open /etc/telegraf/.cache/snowflake/ocsp_response_cache.json: no such file or directory 
2021-10-15T09:52:52Z I! Starting Telegraf 1.19.3
2021-10-15T09:52:52Z D! [agent] Initializing plugins
2021-10-15T09:52:52Z D! [agent] Starting service inputs
2021-10-15T09:52:52Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-acclCA"
2021-10-15T09:52:52Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-acclOSTATE"
2021-10-15T09:52:52Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-asclCA"
2021-10-15T09:52:52Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "PROTEI-AD-OMI-MIB::ad-omi-asclOSTATE"
2021-10-15T09:52:52Z E! [inputs.snmp] Error in plugin: agent 172.26.17.12:4162: gathering table OMI_ACCL: performing bulk walk for field CA: error in unmarshalResponse: error parsing OID Value: invalid OID length
2021-10-15T09:52:53Z E! [inputs.snmp] Error in plugin: agent 172.26.17.12:4162: gathering table OMI_ASCL: performing bulk walk for field CA: error in unmarshalResponse: error parsing OID Value: invalid OID length
2021-10-15T09:52:53Z D! [agent] Stopping service inputs
2021-10-15T09:52:53Z D! [agent] Input channel closed
2021-10-15T09:52:53Z D! [agent] Processor channel closed
2021-10-15T09:52:53Z D! [agent] Stopped Successfully
2021-10-15T09:52:53Z E! [telegraf] Error running agent: input plugins recorded 2 errors

Is there any additional info I can provide?

@Hipska
Copy link
Contributor

Hipska commented Oct 22, 2021

@reimda gosnmp v1.33.0 has some fixed regarding parsing lengths..

@reimda
Copy link
Contributor

reimda commented Nov 8, 2021

I updated gosnmp to 1.33.0 in #10076. When the CI build completes for that PR, could you test that change and see whether the error that started happening in telegraf 1.20 still happens? Thanks!

@reimda
Copy link
Contributor

reimda commented Nov 8, 2021

The build is here: #10076 (comment). Click the dropdown triangle and choose your platform.

@Hipska Hipska linked a pull request Dec 8, 2021 that will close this issue
@Hipska
Copy link
Contributor

Hipska commented Dec 8, 2021

Resolved by #10076 and included in telegraf 1.20, pleas reopen/comment if this did not solve the current issue.

@Hipska Hipska closed this as completed Dec 8, 2021
@ADovgalyuk
Copy link
Author

ADovgalyuk commented Mar 18, 2022

Hello @Hipska, @reimda
Sorry for no testing previously, Is it too late to reopen this issue? It's still represented in latest version. Was mentioned again in this issue, but no answer.
snmpwalk:

[root@XXX1 telegraf.d]# snmpwalk -v2c -Cc -cpublic 127.0.0.1:3161 .
DIAM-MIB::pcsmIndex.1 = INTEGER: 1
DIAM-MIB::pcsmCA.1 = STRING: "Sg.DIAM.PCSM.2"
DIAM-MIB::pcsmOSTATE.1 = STRING: "ACTIVATE"
End of MIB
[root@XXX1 telegraf.d]# snmpwalk -m ./ -v2c -Cc -cpublic 127.0.0.1:3161 .
Failed to parse MIB file ./
iso.3.6.1.4.1.20873.2.30.1.1.1.1 = INTEGER: 1
iso.3.6.1.4.1.20873.2.30.1.1.100.1 = STRING: "Sg.DIAM.PCSM.2"
iso.3.6.1.4.1.20873.2.30.1.1.4096.1 = STRING: "ACTIVATE"

Config:

[[inputs.snmp]]
    agents = [ "127.0.0.1:3161" ]
    version = 2
    community = "public"
    tagexclude = ["agent_host"]

   [[inputs.snmp.table]]
        name = "Diameter"
        inherit_tags = [ "app","Metrics_type" ]

        [[inputs.snmp.table.field]]
        name = "CA"
        is_tag = true
        oid = "DIAM-MIB::pcsmCA"

        [[inputs.snmp.table.field]]
        name = "State"
        oid = "DIAM-MIB::pcsmOSTATE"

Testing on 1.18:

[root@MME1 telegraf.d]# telegraf --test --config diam.conf --debug
2022-03-18T08:31:57Z I! Starting Telegraf 1.18.1
2022-03-18T08:31:57Z D! [agent] Initializing plugins
2022-03-18T08:31:57Z D! [agent] Starting service inputs
2022-03-18T08:31:57Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "DIAM-MIB::pcsmCA"
2022-03-18T08:31:57Z D! [inputs.snmp] executing "snmptranslate" "-Td" "-Ob" "DIAM-MIB::pcsmOSTATE"
2022-03-18T08:31:57Z D! [agent] Stopping service inputs
2022-03-18T08:31:57Z D! [agent] Input channel closed
2022-03-18T08:31:57Z D! [agent] Processor channel closed
2022-03-18T08:31:57Z D! [agent] Stopped Successfully
> Diameter,CA=Sg.DIAM.PCSM.2,host=MME1 State=1i 1647592317000000000

Testing on 21.4:

[root@MME1 telegraf.d]# telegraf --test --config MME_diam.conf 
2022-03-18T08:35:24Z I! Starting Telegraf 1.21.4
2022-03-18T08:35:24Z I! Loaded inputs: snmp
2022-03-18T08:35:24Z I! Loaded aggregators: 
2022-03-18T08:35:24Z I! Loaded processors: 
2022-03-18T08:35:24Z W! Outputs are not used in testing mode!
2022-03-18T08:35:24Z I! Tags enabled: host=MME1
2022-03-18T08:35:24Z E! [inputs.snmp] Error in plugin: agent 127.0.0.1:3161: gathering table Diameter: performing bulk walk for field CA: error in unmarshalResponse: error parsing OID Value: invalid OID length
2022-03-18T08:35:24Z E! [telegraf] Error running agent: input plugins recorded 1 errors

@Hipska
Copy link
Contributor

Hipska commented Mar 18, 2022

Okay, will reopen, please report this back faster next time.

The issue you linked is this exact same issue. So did you try to link another issue? Please let us know which one so we can have a look as well..

@Hipska Hipska reopened this Mar 18, 2022
@ADovgalyuk
Copy link
Author

Thanks a lot. It was mentioned here:
#10654

@ADovgalyuk
Copy link
Author

Hi Team,
Any news or can't be reproduced and no ideas then? Still relevant on latest telegraf versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/snmp bug unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants