OpenTelemetry stats reports histograms incorrectly

*Title*: OpenTelemetry stats reports histograms incorrectly

*Description*:

While sending Envoy OpenTelemetry metrics to an OpenTelemetry collector and inspecting them with the logging exporter, I observed a histogram whose Count did not match the sum of its bucket counts (see below). From the [OTLP proto definition](https://github.com/open-telemetry/opentelemetry-proto/blob/ea449ae0e9b282f96ec12a09e796dbb3d390ed4f/opentelemetry/proto/metrics/v1/metrics.proto#L430):

>   // bucket_counts is an optional field contains the count values of histogram
  // for each bucket.
  //
  // **The sum of the bucket_counts must equal the value in the count field.**
  //
  // The number of elements in bucket_counts array must be by one greater than
  // the number of elements in explicit_bounds array.
  repeated fixed64 bucket_counts = 6;

The number of bucket_counts also appears to be the same as the number of explicit bounds (19 of each in the output below), rather than one greater.
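The two constraints quoted from the proto can be written down as a small check. This is only a sketch: `HistogramPoint` and `satisfiesOtlpBucketInvariants` are hypothetical names used for illustration and are not part of Envoy or the OTLP generated code.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the fields of an OTLP histogram data point
// that matter for this issue.
struct HistogramPoint {
  uint64_t count;
  std::vector<double> explicit_bounds;
  std::vector<uint64_t> bucket_counts;
};

// Returns true only when both constraints from metrics.proto hold:
//  1. bucket_counts has exactly one more element than explicit_bounds.
//  2. The sum of bucket_counts equals count.
bool satisfiesOtlpBucketInvariants(const HistogramPoint& p) {
  const bool size_ok = p.bucket_counts.size() == p.explicit_bounds.size() + 1;
  const uint64_t total =
      std::accumulate(p.bucket_counts.begin(), p.bucket_counts.end(), uint64_t{0});
  return size_ok && total == p.count;
}
```

A data point shaped like the one in the collector output (19 bounds, 19 bucket counts, Count 1) would fail both parts of this check.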

Reading through the implementation, it looks like we are using `computedBuckets()`:
https://github.com/envoyproxy/envoy/blob/2de016d1007aabff202220b8177167c9ab3e8c6a/source/extensions/stat_sinks/open_telemetry/open_telemetry_impl.cc#L136

... which appears to return, for each bucket, the cumulative number of samples below that bucket's threshold:
https://github.com/envoyproxy/envoy/blob/2de016d1007aabff202220b8177167c9ab3e8c6a/envoy/stats/histogram.h#L65-L70

`computeDisjointBuckets()` seems like it potentially does what we are looking for.
https://github.com/envoyproxy/envoy/blob/2de016d1007aabff202220b8177167c9ab3e8c6a/envoy/stats/histogram.h#L72-L76
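To make the difference concrete, here is a minimal sketch of turning cumulative counts into the per-bucket layout OTLP expects. It assumes the input is cumulative per bound and that the total sample count is available separately. `toOtlpBucketCounts` is a hypothetical helper, not an Envoy function, and whether `computeDisjointBuckets()` behaves exactly this way is not confirmed here. Since the header says that method returns a vector the same length as `supportedBuckets()`, a final overflow bucket would presumably still need to be appended.

```cpp
#include <cstdint>
#include <vector>

// Assumption: cumulative[i] is the number of samples at or below bound i,
// and total_count is the histogram's overall sample count.
// Returns cumulative.size() + 1 disjoint counts; the last entry is the
// overflow bucket for samples above the highest bound.
std::vector<uint64_t> toOtlpBucketCounts(const std::vector<uint64_t>& cumulative,
                                         uint64_t total_count) {
  std::vector<uint64_t> out;
  out.reserve(cumulative.size() + 1);
  uint64_t previous = 0;  // Largest cumulative value seen so far.
  for (const uint64_t c : cumulative) {
    // The cumulative values are approximations, so guard against a dip
    // that would otherwise underflow the subtraction.
    out.push_back(c >= previous ? c - previous : 0);
    if (c > previous) {
      previous = c;
    }
  }
  // Whatever is not covered by the highest bound lands in the overflow bucket.
  out.push_back(total_count >= previous ? total_count - previous : 0);
  return out;
}
```

With this layout the result has one more element than the bounds, and its entries sum to `total_count` whenever `total_count` is at least the largest cumulative value.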

Collector logging exporter output:

```
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2023-11-22 00:54:12.184643877 +0000 UTC
Count: 1
Sum: 375.000000
ExplicitBounds #0: 0.500000
ExplicitBounds #1: 1.000000
ExplicitBounds #2: 5.000000
ExplicitBounds #3: 10.000000
ExplicitBounds #4: 25.000000
ExplicitBounds #5: 50.000000
ExplicitBounds #6: 100.000000
ExplicitBounds #7: 250.000000
ExplicitBounds #8: 500.000000
ExplicitBounds #9: 1000.000000
ExplicitBounds #10: 2500.000000
ExplicitBounds #11: 5000.000000
ExplicitBounds #12: 10000.000000
ExplicitBounds #13: 30000.000000
ExplicitBounds #14: 60000.000000
ExplicitBounds #15: 300000.000000
ExplicitBounds #16: 600000.000000
ExplicitBounds #17: 1800000.000000
ExplicitBounds #18: 3600000.000000
Buckets #0, Count: 0
Buckets #1, Count: 0
Buckets #2, Count: 0
Buckets #3, Count: 0
Buckets #4, Count: 0
Buckets #5, Count: 0
Buckets #6, Count: 0
Buckets #7, Count: 0
Buckets #8, Count: 1
Buckets #9, Count: 1
Buckets #10, Count: 1
Buckets #11, Count: 1
Buckets #12, Count: 1
Buckets #13, Count: 1
Buckets #14, Count: 1
Buckets #15, Count: 1
Buckets #16, Count: 1
Buckets #17, Count: 1
Buckets #18, Count: 1
```

The sum of the bucket counts is 11 (buckets #8 through #18 each report 1), but the Count is 1.
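The logged pattern is what cumulative counting would produce for a single sample. The sketch below assumes the one recorded sample was 375 (inferred from Count 1 and Sum 375) and that a bucket counts every sample at or below its bound. `kBounds`, `cumulativeCounts` and `sumOf` are hypothetical names for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Explicit bounds copied from the collector output.
const std::vector<double> kBounds = {
    0.5,   1,     5,     10,     25,     50,      100,
    250,   500,   1000,  2500,   5000,   10000,   30000,
    60000, 300000, 600000, 1800000, 3600000};

// Cumulative counting: counts[i] is the number of samples <= bounds[i].
std::vector<uint64_t> cumulativeCounts(const std::vector<double>& bounds,
                                       const std::vector<double>& samples) {
  std::vector<uint64_t> counts(bounds.size(), 0);
  for (std::size_t i = 0; i < bounds.size(); ++i) {
    for (const double s : samples) {
      if (s <= bounds[i]) {
        ++counts[i];
      }
    }
  }
  return counts;
}

// Sum of all entries in a bucket vector.
uint64_t sumOf(const std::vector<uint64_t>& v) {
  return std::accumulate(v.begin(), v.end(), uint64_t{0});
}
```

Calling `cumulativeCounts(kBounds, {375.0})` gives 0 for indices 0 through 7 and 1 for indices 8 through 18, matching the logged buckets, so the single sample is counted once in each of the eleven buckets whose bound is 500 or higher.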

*Repro steps*:
Run Envoy configured with the [OpenTelemetry stats sink](https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/stat_sinks/open_telemetry/v3/open_telemetry.proto.html) and send the metrics to an OpenTelemetry collector that uses the logging exporter with `logLevel: debug`, so that the received OTLP data is printed.

cc @ohadvano


For reference, the two declarations from `envoy/stats/histogram.h` linked above:

	/**
	 * Returns computed bucket values during the period. The vector contains an approximation
	 * of samples below each quantile bucket defined in supportedBuckets(). This vector is
	 * guaranteed to be the same length as supportedBuckets().
	 */
	virtual const std::vector<uint64_t>& computedBuckets() const PURE;

	/**
	 * Returns version of computedBuckets() with disjoint buckets. This vector is
	 * guaranteed to be the same length as supportedBuckets().
	 */
	virtual std::vector<uint64_t> computeDisjointBuckets() const PURE;

OpenTelemetry stats reports histograms incorrectly #31016
