Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update spec to permit negative observations of native/exponential histograms? #4155

Open
callumj opened this issue Jul 22, 2024 · 6 comments
Labels
spec:metrics Related to the specification/metrics directory triage:accepted:needs-sponsor Ready to be implemented, but does not yet have a specification sponsor

Comments

@callumj
Copy link

callumj commented Jul 22, 2024

What are you trying to achieve?

The current spec language prevents observations of negative values of Histograms:

The value is expected to be non-negative. This API SHOULD be documented in a way to communicate to users that this value is expected to be non-negative. This API SHOULD NOT validate this value, that is left to implementations of the API.

However, with the introduction of exponential histograms, negative values are supported thanks to the separation of negative and positive buckets and the removal of _sum counters being directly emitted from the application (leaving the backend to deal with these observations).

I am wondering when the spec will update to permit such a behavior?

Additional context.

Original discussion in opentelemetry-java.

@callumj callumj added the spec:metrics Related to the specification/metrics directory label Jul 22, 2024
@austinlparker austinlparker added the triage:deciding:tc-inbox Needs attention from the TC in order to move forward label Jul 23, 2024
@heyams
Copy link

heyams commented Aug 16, 2024

it looks like there was the same ask from opentelemetry-python open-telemetry/opentelemetry-python#3392

@callumj
Copy link
Author

callumj commented Aug 16, 2024

Any update to this? It seems a little heavy handed that the libraries fully support negative histograms but they're restricted by what is written in the spec - I believe there should be some leeway given to the libraries.

@tigrannajaryan tigrannajaryan added triage:accepted:needs-sponsor Ready to be implemented, but does not yet have a specification sponsor and removed triage:deciding:tc-inbox Needs attention from the TC in order to move forward labels Aug 21, 2024
@jack-berg
Copy link
Member

Discussed in the 8/21/24 TC SIG. This is something we want to see happen, but it needs a sponsor.

  • API for creating histogram needs to be updated with a new parameter indicating whether the user intends to record negative measurements
  • Need to update the histogram data model / proto representation to properly communicate that the histogram has negative measurements while preserving the monotonicity of the sum field
  • Decide how to handle histogram metrics with negative measurements in exporters like prometheus which (currently) do not support them

@ArthurSens
Copy link
Member

ArthurSens commented Sep 5, 2024

👋 -- Happy to work on this.

I'm quite new to the otel community, so I just wanted to clarify what's the expected outcome.

API for creating histogram needs to be updated with a new parameter indicating whether the user intends to record negative measurements.

Do I understand it correctly that we want to update this part of the specification? If that's correct, I wonder how safe this is for most SDKs. For Go for example, adding a new argument to a function might be considered a breaking change since one cannot simply upgrade to a new version of the SDK without compilation errors. Is that ok?

Need to update the histogram data model / proto representation to properly communicate that the histogram has negative measurements while preserving the monotonicity of the sum field

Could you elaborate a bit on what you mean with "monotonicity of the sum field"?

I was also taking a look at the proto definition and noticed how explicit it is about not allowing negative observations interfere with the sum field. Mostly to be compatible with OpenMetrics histograms. The mentioned part of the definition was written 3 years ago and a lot happened since then, Exponential Histograms finally have an equivalent in the Prometheus ecosystem (it's called Native Histograms in prometheus). Although not officialised in the OpenMetrics specification, there is work in progress in the area[1].

Should we still not allow negative observations to interfere with the sum field?

Decide how to handle histogram metrics with negative measurements in exporters like prometheus which (currently) do not support them.

Prometheus now have two types of histograms:

  • The one that has existed for years, now often called Classic Histograms, which indeed doesn't support negative measurements and there's no intention to extend this format.
  • The already mentioned Native Histograms, that follows a similar format to OTel's Exponential Histograms.

Prometheus is able to ingest Native Histograms in the Protobuf format, as long as a feature-flag is enabled during Prometheus' startup, with text format being work-in-progress at the moment. Would it make sense to tell SDK authors to always use Prometheus Native Histograms when negative measurements are allowed?

@jack-berg
Copy link
Member

👋 -- Happy to work on this.

You'll need to have a spec sponsor volunteer to sponsor you. Are any @open-telemetry/spec-sponsors interested?

If that's correct, I wonder how safe this is for most SDKs.

We have prior art to for extending APIs, and advice in the spec text regarding how to do so. Each language API would need to figure out the best way support this additional parameter in a backwards compatible manner.

Could you elaborate a bit on what you mean with "monotonicity of the sum field"?

We have competing concerns. On one hand, we aim for complete compatibility with prometheus which does not (currently) support non-monotonic sums on histograms. On the other hand, we have a recurring request to record negative measurements to histograms. If we add support for negative measurements, we need to figure out if / how to retain compatibility with prometheus. For example, we could update the prometheus compatibility to say that histograms recorded with negative measurements are skipped during translation to prometheus format.

Should we still not allow negative observations to interfere with the sum field?

If a histogram records negative measurements, but the sum field only contains the positive measurements, the result will be non-sensical. I think its probably better to omit the sum field and maybe add a new non-monotonic sum field.

Would it make sense to tell SDK authors to always use Prometheus Native Histograms when negative measurements are allowed?

Not sure. The prometheus compatibility story is probably one of the key things that needs to be worked out with this issue.

@tristan-lanfrey
Copy link

Not sure. The prometheus compatibility story is probably one of the key things that needs to be worked out with this issue.

Prometheus have some details about the not-unreasonable need for negative observations:

https://prometheus.io/docs/practices/histograms/

In those rare cases where you need to apply rate() and cannot avoid negative observations, you can use two separate summaries, one for positive and one for negative observations (the latter with inverted sign), and combine the results later with suitable PromQL expressions.

Perhaps a simple advice paraphrasing the above would assuage/tame most requests/users. It was not immediately obvious to me that the non-negative observation requirement was in fact easy to work around.

Alternatively, perhaps an automatic split of negative observations/buckets into a separate histogram with some well designed convention/naming could work if implemented by the various prometheus exporters?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:metrics Related to the specification/metrics directory triage:accepted:needs-sponsor Ready to be implemented, but does not yet have a specification sponsor
Projects
Status: Spec - Accepted
Development

No branches or pull requests

7 participants