bulk walk timeout results in no data #6450
Comments
Thanks for the report. I don't think we should attempt to return the partial data, though. It could be missing tags, which would create new unwanted series and could prevent metric filtering from matching as expected, potentially skipping processors or routing to the wrong output.
Hi, I think you could at least return the values that are complete? In that example, it would mean the records with index 1 to 58.
It seems that most of the time the data would be incomplete unless we were very close to finishing the table. Even if we decide that is the behavior we want, I'm not sure it will come up often enough to be worth it.
I think what we may want to do for this issue is reconsider how the timeouts work in the SNMP plugin (#3823). For example, right now we have per-request timeouts, but perhaps with a full gather timeout instead the issue would be mitigated?
In my situation, almost all of them would be complete except for the last 2. So I would like to have something implemented so that you at least get the complete ones, instead of getting nothing as a result, as happens now. (Most of the data is actually already present; the plugin just discards it.) But indeed, the referenced issue is a much bigger problem that should be fixed first. Wow!
So in your case, index 59 & 60 never reply no matter the timeout? Is this a bug in the device you are monitoring?
Yes it seems so, and I was hoping to get the results of the other indexes from Telegraf as they do respond and are complete.
I believe changing this would significantly complicate the code for the plugin, since we would need to keep track of whether we have received all the data before we can emit the results. For this reason, I'm going to close this issue as something we won't fix, at least for now. If we hear more reports of this type of issue we can reconsider.
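For readers wondering what that bookkeeping might look like, here is a rough Go sketch of the "only emit complete rows" idea discussed above. It is an illustration under assumptions, not the plugin's implementation: `completeRows`, the row map, and the column names `name` and `status` are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// completeRows returns the sorted indexes of the rows that have a value for
// every expected column. Rows with any missing column are left out.
func completeRows(rows map[int]map[string]string, columns []string) []int {
	var complete []int
	for idx, row := range rows {
		ok := true
		for _, col := range columns {
			if _, found := row[col]; !found {
				ok = false
				break
			}
		}
		if ok {
			complete = append(complete, idx)
		}
	}
	sort.Ints(complete)
	return complete
}

func main() {
	// Hypothetical column names, for illustration only.
	columns := []string{"name", "status"}
	rows := map[int]map[string]string{
		1: {"name": "a", "status": "up"},
		2: {"name": "b", "status": "up"},
		3: {"name": "c"}, // the walk timed out before this row's status arrived
	}
	fmt.Println(completeRows(rows, columns)) // prints [1 2]
}
```

Because a table is walked column by column, a timeout partway through one column leaves every later row without that column. That is why most rows would usually be incomplete unless the walk was nearly finished.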
I just checked this again, and it seems that even 59 and 60 do respond, but after that we get a timeout. So it seems to be a bug in the device, which does not cleanly end a walk for this sequence.
@MyaLongmire why would changing the way you translate OIDs help with this issue?
@Hipska This is an interesting issue - what device are you polling here, and are you still able to reproduce this? What is your max_repetitions set to? Are you able to share a capture of the walk where you get the timeout? @MyaLongmire I don't believe #9518 resolved this issue.
I tried to reproduce this, but wasn't able to on any of the devices (SRX/MX/EX) I have access to, so I won't be able to test any PRs implementing this feature.
Closing this for now. If someone comes across this issue, please reopen or open a new issue!
Relevant telegraf.conf:
System info:
Telegraf 1.12.2 (git: HEAD 8b4c9a0)
Expected behavior:
Return the already received data (if any)
Actual behavior:
Additional info:
This is what is returned from snmpwalk or snmpbulkwalk:
Note that there are 60 indexes on this device, so index 59 and 60 are having issues.