RabbitMQ Input Plugin v1.19.0 unmarshal errors #9383
Comments
I can confirm the same thing is happening on CentOS 7 with Telegraf v1.19.0-1 |
@srebhan Tagging you in on this to see if you can help please? |
@aslgithub it seems like the output of RabbitMQ changed. |
@srebhan The output of RabbitMQ is not changing; I'm simply running the two different versions of Telegraf against the same version of RabbitMQ. Taking one error as the example, v1.19.0 is failing to bring back rabbitmq_node And is read from the /api/nodes My raw JSON output from the API is But Telegraf v1.18.0 was happy with this, while v1.19.0 obviously isn't. The API is defined I've not found anywhere the field types are defined yet. |
I give up! I cannot find the answer, so I submitted the question on the RabbitMQ GitHub discussion |
@akrantz01 and I were just pairing on this, and we are also able to replicate the problem. If we change the type on the struct of
I found this description here for the type of nodes.io_write_avg_time:
I can confirm we're not seeing this problem on Telegraf v1.18.1. Alex is now looking into what may have changed to cause these output types to change from RabbitMQ. |
@helenosheaa I think the problem is here, as the code changed:
- json.NewDecoder(resp.Body).Decode(target)
-
- return nil
+ return json.NewDecoder(resp.Body).Decode(target)
The code is now checking the returned error, while before those errors went unnoticed. Can either of you, @aslgithub or @helenosheaa, provide some sample data from |
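A minimal sketch of why returning the decoder's error surfaces a problem that used to stay hidden. The `node` struct, its fields other than `io_read_avg_time`, and the sample payload are illustrative assumptions, not the plugin's actual types or real server output; only the `json.NewDecoder(...).Decode(...)` call comes from the change quoted above. In Go's `encoding/json`, a type mismatch on one field does not stop decoding: the offending field keeps its zero value, the remaining fields are still populated, and the mismatch is reported only through the returned error.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// node is a hypothetical, trimmed-down stand-in for the plugin's node struct.
type node struct {
	Name          string `json:"name"`
	IoReadAvgTime int64  `json:"io_read_avg_time"`
	FdUsed        int64  `json:"fd_used"`
}

// sample is made-up data that mirrors the "0.0" value from the error log.
const sample = `{"name":"rabbit@host","io_read_avg_time":0.0,"fd_used":42}`

// decodeIgnoringError mimics the old behavior: the decode error is dropped.
func decodeIgnoringError(body string) node {
	var n node
	_ = json.NewDecoder(strings.NewReader(body)).Decode(&n)
	return n
}

// decodeReturningError mimics the new behavior: the decode error is returned.
func decodeReturningError(body string) (node, error) {
	var n node
	err := json.NewDecoder(strings.NewReader(body)).Decode(&n)
	return n, err
}

func main() {
	// The mistyped field silently stays at zero; the other fields are filled.
	fmt.Printf("ignoring error: %+v\n", decodeIgnoringError(sample))

	// The same input now reports the type mismatch to the caller.
	_, err := decodeReturningError(sample)
	fmt.Println("returning error:", err)
}
```

Both functions see the same payload; the only difference is whether the caller ever learns about the mismatch.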
@srebhan sure! /api/federation-links This data was included in v.1.18.1 output as rabbitmq_node without apparent errors, but v1.19.0 is giving the unmarshal error. /api/nodes |
I don't know if this helps, as it's not the matching example for the above, but this is the output from Telegraf v.1.18.1 for the node line
|
@aslgithub, can you please attach a file with the JSON next time? :-) I will try to fix the problem. The errors were there before, but they were silently dropped rendering all incorrectly typed values zero. 8-(
Well this is actually the reason for getting this error @aslgithub Stay tuned for testing... ;-P |
@srebhan thanks for working on this! Also staying tuned for testing :) I was in the same situation as @aslgithub with no object found for federation-links. |
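The first error in the report, "cannot unmarshal object into Go value of type []rabbitmq.FederationLink", is what `encoding/json` returns when the response body is a JSON object but the target is a slice. The following is only a sketch of one way to tolerate that case, not necessarily what the actual fix does: `federationLink`, `decodeLinks`, and both sample payloads are hypothetical, and treating an object as "no links" is an assumption.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// federationLink is a hypothetical stand-in for the plugin's link type.
type federationLink struct {
	Type  string `json:"type"`
	Queue string `json:"queue"`
}

// decodeLinks decodes a JSON array of links. If the body is a JSON object
// instead (assumed here to mean "nothing to report"), it returns no links
// rather than an unmarshal error.
func decodeLinks(body []byte) ([]federationLink, error) {
	trimmed := bytes.TrimSpace(body)
	if len(trimmed) > 0 && trimmed[0] == '{' {
		return nil, nil
	}
	var links []federationLink
	if err := json.Unmarshal(trimmed, &links); err != nil {
		return nil, err
	}
	return links, nil
}

func main() {
	// Made-up payloads: a list with one link, and a placeholder object.
	list := []byte(`[{"type":"queue","queue":"q1"}]`)
	object := []byte(`{"error":"placeholder"}`)

	links, err := decodeLinks(list)
	fmt.Println(len(links), err)

	links, err = decodeLinks(object)
	fmt.Println(len(links), err)

	// Decoding the object straight into a slice fails.
	var direct []federationLink
	fmt.Println(json.Unmarshal(object, &direct))
}
```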
@helenosheaa did you find a document stating the datatypes of the returned values (e.g. |
@srebhan I can't find useful documentation outside of https://www.rabbitmq.com/monitoring.html#node-metrics which isn't enlightening on int vs float. However, if it's JSON that it's returning and the numeric type is "number", which covers both int and float, we may have to explicitly set it to float (if it can ever be a float). I was seeing the same issue for |
The API returned fields are listed as tables on this document, but not the field types. I cannot find the field types documented anywhere, so I posted the question on the RabbitMQ GitHub, but there has been no response |
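To illustrate the point about JSON's single "number" type, here is a small sketch showing that a `float64` field accepts both integer-looking and fractional values, whereas an `int64` field rejects `0.0`. The `nodeTimes` struct and `parseAvgTime` helper are hypothetical names for this example, and the payloads are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeTimes is a hypothetical struct holding only the field under discussion.
type nodeTimes struct {
	IoReadAvgTime float64 `json:"io_read_avg_time"`
}

// parseAvgTime decodes the field as float64, which covers every JSON number.
func parseAvgTime(body string) (float64, error) {
	var n nodeTimes
	if err := json.Unmarshal([]byte(body), &n); err != nil {
		return 0, err
	}
	return n.IoReadAvgTime, nil
}

func main() {
	bodies := []string{
		`{"io_read_avg_time":0}`,
		`{"io_read_avg_time":0.0}`,
		`{"io_read_avg_time":1.25}`,
	}
	for _, body := range bodies {
		v, err := parseAvgTime(body)
		fmt.Println(body, "->", v, err)
	}
}
```

The trade-off is that very large integers lose precision once they exceed what a `float64` can represent exactly, so this only suits fields such as averages and rates.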
@aslgithub can you please attach a file with the result of |
@srebhan Please find attached..
|
@srebhan I think this is a complete set of an example json of every API call from one of my servers rabbitmq_api_overview.txt |
@aslgithub is it ok to add those files to the repository as test-cases? |
@srebhan Let me check.. |
Oh and strangely you have
in memory where the old test-case has only a single value. So is the total memory the sum of the three values in your case? |
@srebhan, I'm not too familiar with this, but it seems that they are not the sum of the values. From https://www.rabbitmq.com/memory-use.html The screenshot in the article that appears to show this is titled But the next screenshot shows the curl example to JSON with only the single Total field ... Might need someone else here to give a comparison |
Point taken. I'm now cascading and trying the estimations in the following order "rss", "allocated", "erlang". Is it ok to use the data as test-case? If so, I'm ready to commit. :-) |
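The cascade described above could look roughly like the sketch below. It assumes the total may arrive either as a single number (the old test-case) or as an object with "rss", "allocated", and "erlang" entries; the `memoryTotal` type, the `totalMemory` helper, and the sample values are hypothetical and are not taken from the plugin or from #9443.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// memoryTotal is a hypothetical view of the multi-value shape. Pointers
// distinguish "missing" from "present with value 0".
type memoryTotal struct {
	Rss       *int64 `json:"rss"`
	Allocated *int64 `json:"allocated"`
	Erlang    *int64 `json:"erlang"`
}

// totalMemory accepts either a plain number or an object of estimates and
// picks the first available estimate in the order rss, allocated, erlang.
func totalMemory(raw []byte) (int64, error) {
	// Single-value shape: the total is just a number.
	var single int64
	if err := json.Unmarshal(raw, &single); err == nil {
		return single, nil
	}

	// Multi-value shape: try each estimate in order of preference.
	var m memoryTotal
	if err := json.Unmarshal(raw, &m); err != nil {
		return 0, err
	}
	for _, v := range []*int64{m.Rss, m.Allocated, m.Erlang} {
		if v != nil {
			return *v, nil
		}
	}
	return 0, errors.New("no memory estimate found")
}

func main() {
	// Made-up inputs covering both shapes.
	for _, raw := range []string{
		`12345`,
		`{"erlang":300,"rss":100,"allocated":200}`,
		`{"erlang":300,"allocated":200}`,
		`{}`,
	} {
		total, err := totalMemory([]byte(raw))
		fmt.Println(raw, "->", total, err)
	}
}
```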
Sorry, I was trying to see if I could switch the calculation strategy to get a different result.. I've double checked, I think I have suitably redacted; go for it.. |
Once #9443 is built, you can use the binaries to test. At least it works with the test-case you provided. Regarding the |
Yeah, here you go. :-) Let me know if it fixes the regression. |
@srebhan Looks good, I've set the metric_exclude and #9443 runs without collection errors. The rest of the overview, exchange and queue output is the same. |
Relevant telegraf.conf:
System info:
Windows Server 2019
Telegraf v1.19.0
RabbitMQ 3.8.14
Erlang 23.2.7
Steps to reproduce:
telegraf.exe --config "C:\Program Files\Telegraf\telegraf.conf" --test --input-filter rabbitmq --debug
Expected behavior:
Collection should complete without errors.
Actual behavior:
There are unmarshal errors within the collection
2021-06-16T16:17:01Z E! [inputs.rabbitmq] Error in plugin: json: cannot unmarshal object into Go value of type []rabbitmq.FederationLink
2021-06-16T16:17:01Z E! [inputs.rabbitmq] Error in plugin: json: cannot unmarshal number 0.0 into Go struct field Node.io_read_avg_time of type int64
2021-06-16T16:17:01Z E! [telegraf] Error running agent: input plugins recorded 2 errors
Additional info:
Telegraf v1.18.1 on the same system does not show the unmarshal errors.