Prometheus Exporter - Gauge counter metrics dropped with error 'failed to translate metric' #6425
Comments
Hi, I am new to the project and I am trying to reproduce this. I have a few questions:
Does this mean setting the collector period to 30secs? Like this:
What do you mean by not have any values recorded every once in a while? I am recording the value for the gauge every 120s, like this:
I am unable to reproduce this issue with this setup though.
Hey Goutham, thanks for checking. I was able to identify the root cause of the log error, and it seems this is the expected behavior for the circumstance I described. Answers to your questions:
"Does this mean setting the collector period to 30secs?": Yes.
"What do you mean by not have any values recorded every once in a while?": I should have been clearer in my description about this. Before exporting instruments, you must clear/delete the values recorded for each instrument, so that the instruments are still defined but don't have a value. If there are no calls to record for an instrument in the next 30 seconds, and the code exports all instruments without any checks, then an attempt is made to export an instrument that has no data associated with it (the datatype ends up getting set to None ("\u0000")).
The second answer above is the identified root cause of the log error. Adding a check in the exporter, so that export is only called on instruments that have seen a value, solved the problem, since the exporter no longer attempts to write such an instrument to Prometheus. I believe this is how the design is and that this behavior (the log error I described) is expected. Feel free to reopen this issue if you believe it is a bug.
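For readers who want to see the shape of the check described above, here is a minimal sketch in Python. It is not the actual SDK adaptation used in this thread: the `Gauge` class and `collect_for_export` function are hypothetical stand-ins, and the sketch assumes an instrument with no recorded value holds `None`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Gauge:
    """Hypothetical stand-in for an SDK gauge instrument (not a real SDK class)."""
    name: str
    value: Optional[float] = None  # None means nothing was recorded this interval

    def record(self, value: float) -> None:
        self.value = value

    def clear(self) -> None:
        self.value = None


def collect_for_export(instruments: List[Gauge]) -> Dict[str, float]:
    """Return only the instruments that saw a value, clearing each one afterwards."""
    batch: Dict[str, float] = {}
    for inst in instruments:
        if inst.value is None:
            # Skip instruments without data so that no typeless metric is sent.
            continue
        batch[inst.name] = inst.value
        inst.clear()
    return batch


write_count = Gauge("elastic_successful_bulk_write_count")
write_time = Gauge("elastic_bulk_write_time")
write_count.record(12.0)  # only one of the two gauges is recorded in this interval
print(collect_for_export([write_count, write_time]))
```

The check uses `is None` rather than truthiness, so a legitimately recorded value of `0` is still exported.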
Which SDK are you using? I cannot see any method to do the same in the Golang SDK.
I used an adaptation of the Python SDK with an option to clear instruments after exporting to Prometheus.
Describe the bug
I'm using an opentelemetry collector setup to receive metrics from an app. From the opentelemetry collector, these metrics are being exported to Prometheus. The datatype of these metrics is gauge. Looking through the logs, I see a series of errors pertaining to a random subset of the metrics showing up periodically (presumably during export attempts for these specific metrics). Here is an example of one such error seen in the otelcollector logs:
2021-11-23T03:05:25.121Z error prometheusexporter@v0.39.0/accumulator.go:96 failed to translate metric {"kind": "exporter", "name": "prometheus", "data_type": "\u0000", "metric_name": "elastic_successful_bulk_write_count"}
In the above example, 'elastic_successful_bulk_write_count' is actually of datatype gauge. However, it appears as though this error shows up in the log for metrics when they don't have any value recorded.
It looks like the log is produced by the addMetric function in accumulator.go.
It is hard to say why these logs are showing up for metrics which are of gauge datatype. I have not been able to trace this issue back to the app->otelcollector pipe, however, it seems to be produced during instances where the instruments do not have any values recorded.
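One plausible reading of the `"data_type": "\u0000"` field is sketched below in Python. The sketch assumes that the exporter dispatches on a numeric data type, that the "unset" type has code 0, and that the log field holds the character for that code. The `DataType` enum and `add_metric` function are hypothetical models for illustration, not the collector's code.

```python
import json
from enum import IntEnum
from typing import List


class DataType(IntEnum):
    """Hypothetical type codes; only 'NONE is 0' matters for this sketch."""
    NONE = 0
    GAUGE = 1


def add_metric(name: str, data_type: DataType, log: List[str]) -> int:
    """Return 1 if the metric was accumulated, else log an error and return 0."""
    if data_type == DataType.GAUGE:
        return 1
    # Assumption: the numeric type code is logged as a character, so code 0
    # becomes the NUL character, which JSON serializes as "\u0000".
    log.append(json.dumps({"data_type": chr(data_type), "metric_name": name}))
    return 0


log: List[str] = []
add_metric("elastic_bulk_write_time", DataType.NONE, log)
print(log[0])  # {"data_type": "\u0000", "metric_name": "elastic_bulk_write_time"}
```

Under these assumptions, a metric that arrives with no data points carries no type at all, which would explain why a gauge instrument is reported with an empty-looking data type.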
Steps to reproduce
Use an application to send opentelemetry gauge type metrics to otelcollector once every 30 seconds. Export these metrics to Prometheus. For an exact replica, some of the instruments must not have any values recorded every once in a while.
Here is an example explaining how and when the error is generated:
Subsequently, the error shows up in the logs for this metric, as below.
2021-11-24T07:16:49.461Z error prometheusexporter@v0.39.0/accumulator.go:96 failed to translate metric {"kind": "exporter", "name": "prometheus", "data_type": "\u0000", "metric_name": "elastic_bulk_write_time"}
What did you expect to see?
The error log should not show up for unrecorded metrics. I am looking to get confirmation that this is not the expected behavior.
What did you see instead?
Log error reported for metrics.
What version did you use?
v0.38.0