OpenTelemetry stats reports histograms incorrectly #31016
Comments
This is the output from otel collector which includes the metric name.
|
@dashpole Thanks for raising this issue.
Count should be the number of observed events, and sum should be the sum of all observed values, correct? In the case of the Envoy output,
I see it is giving the correct output. Is the exporter expecting count to be the number of buckets? Isn't that incorrect? Did I miss anything? |
Hi @sundarms, OpenTelemetry (opentelemetry.io) differs from OpenMetrics (openmetrics.io) in this regard. OpenMetrics buckets are cumulative (each count includes all observations at or below the threshold), as you point out, but OpenTelemetry buckets are not: https://opentelemetry.io/docs/specs/otel/metrics/data-model/#histogram
The correct output should have been:
Timestamp: 2023-11-22 00:54:12.184643877 +0000 UTC
Count: 1
Sum: 375.000000
ExplicitBounds #0: 0.500000
ExplicitBounds #1: 1.000000
ExplicitBounds #2: 5.000000
ExplicitBounds #3: 10.000000
ExplicitBounds #4: 25.000000
ExplicitBounds #5: 50.000000
ExplicitBounds #6: 100.000000
ExplicitBounds #7: 250.000000
ExplicitBounds #8: 500.000000
ExplicitBounds #9: 1000.000000
ExplicitBounds #10: 2500.000000
ExplicitBounds #11: 5000.000000
ExplicitBounds #12: 10000.000000
ExplicitBounds #13: 30000.000000
ExplicitBounds #14: 60000.000000
ExplicitBounds #15: 300000.000000
ExplicitBounds #16: 600000.000000
ExplicitBounds #17: 1800000.000000
ExplicitBounds #18: 3600000.000000
Buckets #0, Count: 0
Buckets #1, Count: 0
Buckets #2, Count: 0
Buckets #3, Count: 0
Buckets #4, Count: 0
Buckets #5, Count: 0
Buckets #6, Count: 0
Buckets #7, Count: 0
Buckets #8, Count: 1
- Buckets #9, Count: 1
+ Buckets #9, Count: 0
- Buckets #10, Count: 1
+ Buckets #10, Count: 0
- Buckets #11, Count: 1
+ Buckets #11, Count: 0
- Buckets #12, Count: 1
+ Buckets #12, Count: 0
- Buckets #13, Count: 1
+ Buckets #13, Count: 0
- Buckets #14, Count: 1
+ Buckets #14, Count: 0
- Buckets #15, Count: 1
+ Buckets #15, Count: 0
- Buckets #16, Count: 1
+ Buckets #16, Count: 0
- Buckets #17, Count: 1
+ Buckets #17, Count: 0
- Buckets #18, Count: 1
+ Buckets #18, Count: 0
I.e. a single observation, counted in bucket #8 only. See, for example, the conversion of Prometheus histogram bucket counts to OpenTelemetry histogram buckets here. It subtracts the previous bucket's count of a Prometheus histogram from the current bucket's count to get the OpenTelemetry histogram bucket's count. |
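The subtraction described above can be sketched in a few lines of Python. This is only an illustration, not the collector's or Envoy's actual code: the function name `cumulative_to_disjoint` is hypothetical, and the input assumes one cumulative count per explicit bound, as in the output shown in this comment.

```python
from typing import List


def cumulative_to_disjoint(cumulative: List[int]) -> List[int]:
    """Turn cumulative (OpenMetrics-style) counts into per-bucket counts.

    Hypothetical helper for illustration: each output entry is the current
    cumulative count minus the previous one.
    """
    disjoint = []
    previous = 0
    for count in cumulative:
        if count < previous:
            raise ValueError("cumulative counts must be non-decreasing")
        disjoint.append(count - previous)
        previous = count
    return disjoint


# One observation of 375 against the 19 bounds above: cumulative counts are
# 0 for bounds #0..#7 and 1 from bound #8 (500) onward.
cumulative = [0] * 8 + [1] * 11
print(cumulative_to_disjoint(cumulative))
```

With this input, only the entry for bucket #8 is non-zero, which matches the corrected diff.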
@dashpole thanks for the explanation. |
Hi @ohadvano, could I ask what the estimated timeline is for this issue to be resolved? |
Hi, I'll try to get to this soon. I am now noticing two problems, actually. The first is as described in this issue; the second concerns the count of elements that fall outside the explicit bounds. Fixing the first is trivial, but I'm still working out how to fix the second. I'm not sure there's an API to get the count of elements outside the explicit bounds, so I might need to get this another way. |
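One possible way to obtain the count of elements outside the explicit bounds, sketched here in Python rather than against Envoy's API, is to derive it from the total sample count. Both the function name `with_overflow_bucket` and the assumption that a total sample count is available alongside the per-bound counts are hypothetical.

```python
from typing import List, Sequence


def with_overflow_bucket(in_bounds_counts: Sequence[int], total_count: int) -> List[int]:
    """Append the overflow bucket to disjoint per-bound counts.

    Hypothetical helper: the overflow bucket holds whatever part of
    total_count is not already accounted for by the in-bounds buckets.
    """
    overflow = total_count - sum(in_bounds_counts)
    if overflow < 0:
        raise ValueError("bucket counts exceed the total sample count")
    return list(in_bounds_counts) + [overflow]


# Three explicit bounds, five samples in total, two of them above the last bound.
print(with_overflow_bucket([2, 1, 0], total_count=5))  # [2, 1, 0, 2]
```

The result has one more entry than there are bounds, which is the shape the issue description says OTLP expects.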
Thanks @ohadvano!
Yes, that's correct. |
@ohadvano I'm not sure what level of API you are looking at, but you can get all of the data out of the histogram structures. E.g. see the implementation of the 'detailed' histogram admin endpoint used for rendering, here:
|
@ohadvano thanks for the fix PR. Is there a specific release version or date in which this will ship? |
You can either wait for the next official release, which is next month, or cherry-pick the commit.
Title: OpenTelemetry stats reports histograms incorrectly
Description:
Sending envoy OpenTelemetry metrics to an OpenTelemetry collector, and using the logging exporter, I observed a histogram where the Count did not match the count of the buckets (see below). From the OTLP proto definition:
The number of bucket_counts also appears to be the same as the number of explicit bounds, rather than one greater.
Reading through the implementation, it looks like we are using computedBuckets():
envoy/source/extensions/stat_sinks/open_telemetry/open_telemetry_impl.cc
Line 136 in 2de016d
... which appears to return, for each threshold, the cumulative count of samples below it:
envoy/envoy/stats/histogram.h
Lines 65 to 70 in 2de016d
computeDisjointBuckets() seems like it potentially does what we are looking for.
envoy/envoy/stats/histogram.h
Lines 72 to 76 in 2de016d
Collector logging exporter output:
The sum of buckets is 10, but the count is 1.
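The two mismatches noted in this description (count versus the sum of the bucket counts, and the number of bucket counts versus the number of bounds) can be checked mechanically. The Python sketch below is a hypothetical helper, not part of Envoy or the collector, and the sample values are made up for illustration rather than taken from the collector output.

```python
from typing import List, Sequence


def check_histogram_point(
    count: int,
    explicit_bounds: Sequence[float],
    bucket_counts: Sequence[int],
) -> List[str]:
    """Return a list of consistency problems for one histogram data point."""
    problems = []
    # OTLP expects one more bucket count than explicit bounds (the overflow bucket).
    if len(bucket_counts) != len(explicit_bounds) + 1:
        problems.append(
            f"expected {len(explicit_bounds) + 1} bucket counts, got {len(bucket_counts)}"
        )
    # The per-bucket counts are disjoint, so they must add up to the total count.
    if sum(bucket_counts) != count:
        problems.append(
            f"bucket counts sum to {sum(bucket_counts)}, but count is {count}"
        )
    return problems


# Illustrative values: cumulative-style counts with no overflow bucket.
for problem in check_histogram_point(1, [0.5, 1.0, 5.0], [0, 1, 1]):
    print(problem)
```

A data point with disjoint counts and an overflow bucket, such as `check_histogram_point(1, [0.5, 1.0, 5.0], [0, 1, 0, 0])`, yields an empty list.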
Repro steps:
Run Envoy configured with the OpenTelemetry stats sink and send to an OpenTelemetry collector with the logging exporter, with logLevel: debug to print out the OTLP.
Admin and Stats Output:
Config:
Logs:
Call Stack:
cc @ohadvano