Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus Stale NaN Values are Reported with 0 Value #26716

Closed
michaelli321 opened this issue Sep 15, 2023 · 9 comments
Closed

Prometheus Stale NaN Values are Reported with 0 Value #26716

michaelli321 opened this issue Sep 15, 2023 · 9 comments
Labels
bug Something isn't working receiver/prometheus Prometheus receiver

Comments

@michaelli321
Copy link

Component(s)

receiver/prometheus

What happened?

Description

If a Stale NaN value is returned from the Prometheus job, it is converted to a NumberDataPoint with value 0 in the prometheusreciever

Steps to Reproduce

  1. Create a prometheus server that doesn't always report a certain metric
  2. Have the prometheusreceiver scrape that server
  3. Observe that on scrapes where the metric disappears, a 0 value is exported for that metric

Expected Result

The metric should either not be present or that metric should have a value of NaN

Actual Result

A misleading 0 datapoint is exported for the metric

Collector version

v0.81.0

Environment information

Environment

Compiler(if manually compiled): go 1.20.6

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

When the prometheusreceiver is converting the prometheus values to OTLP, it never sets a value on the NumberDataPoint (ref).

@michaelli321 michaelli321 added bug Something isn't working needs triage New item requiring triage labels Sep 15, 2023
@github-actions github-actions bot added the receiver/prometheus Prometheus receiver label Sep 15, 2023
@github-actions
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dashpole
Copy link
Contributor

The receiver uses the DATA_POINT_FLAGS_NO_RECORDED_VALUE data point flag to indicate that the point has no value. https://github.com/open-telemetry/opentelemetry-proto/blob/ac3242b03157295e4ee9e616af53b81517b06559/opentelemetry/proto/metrics/v1/metrics.proto#L332-L335

It seems like correct behavior to not record a value when setting the NO_RECORDED_VALUE flag, right?

The metric should either not be present or that metric should have a value of NaN

We need the metric to be present to allow prometheus exporters to work correctly. Setting the value to NaN is a possibility, and we used to do that IIRC, but it was much more confusing for exporters that didn't check the data point flag. I remember at least one case where an exporter was converting the value to an Int, and got a large negative number, which messed up graphs and rates.

I'll see if @Aneurysm9 has any thoughts, but I'm leaning towards keeping the current behavior.

@sirianni
Copy link
Contributor

sirianni commented Sep 18, 2023

Thanks @dashpole!

to allow prometheus exporters to work correctly.

Right, this makes sense. I think the issue is that when using the prometheusreceiver with the otlpexporter a datapoint with a value of 0 is emitted. This is causing incorrect behavior in our backend when doing aggregations like average, etc.

@mx-psi
Copy link
Member

mx-psi commented Sep 18, 2023

Thanks @dashpole!

to allow prometheus exporters to work correctly.

Right, this makes sense. I think the issue is that when using the prometheusreceiver with the otlpexporter a datapoint with a value of 0 is emitted. This is causing incorrect behavior in our backend when doing aggregations like average, etc.

The OTLP exporter should also be exporting the NO_RECORDED_VALUE. Maybe your backend does not support NO_RECORDED_VALUE?

@sirianni
Copy link
Contributor

Maybe your backend does not support NO_RECORDED_VALUE?

Ah, that's likely what is happening. Still, would it be better for the OTLP exporter to be emitting NaN instead of 0?

@michaelli321
Copy link
Author

@mx-psi @dashpole Do you have any updates here? It may take sometime for the backends to implement the functionality to support NO_RECORDED_VALUE. Would the best option here be to write a custom processor for these metrics? OTTL doesn't work here since it doesn't support the bitwise & operator

@dashpole
Copy link
Contributor

You could certainly write a custom processor. I don't know if it would be accepted in the contrib repo, but you could build it into your own collector to work around the issue until the backend handles the flag.

@crobert-1
Copy link
Member

It seems like the conclusion of this issue is that everything collector-related is functioning as intended, but some backends do not yet support the NO_RECORDED_VALUE flag. Am I following this properly? If so, I believe we can close this issue.

@crobert-1 crobert-1 removed the needs triage New item requiring triage label Nov 1, 2023
@dashpole
Copy link
Contributor

dashpole commented Nov 1, 2023

Agreed

@dashpole dashpole closed this as completed Nov 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working receiver/prometheus Prometheus receiver
Projects
None yet
Development

No branches or pull requests

5 participants