fix(om2): histograms and negative observed values #2627
base: main
Branch updated from 7880f71 to bd3c521.
This is only true for native histograms, but not for classic histograms. (FTR: I proposed to improve the counter reset handling for summaries and classic histograms at KubeCon Berlin in 2017. My proposal was ultimately rejected, so I guess we should not change course now and should instead encourage native histograms, including NHCB.)
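To illustrate the classic-histogram case: a classic histogram's `_sum` is scraped as an ordinary counter series, and counter-reset handling treats any decrease in it as a reset. The sketch below is a simplified approximation of that heuristic (no extrapolation to window boundaries, no staleness handling); the function name `counter_increase` is hypothetical. It shows how a negative observation inflates the computed increase of `_sum`.

```python
def counter_increase(samples: list[float]) -> float:
    """Approximate increase of a counter series over a window.

    Any sample lower than its predecessor is assumed to be a counter reset,
    i.e. the series restarted from zero, so the value seen just before the
    drop is added back. Extrapolation is deliberately omitted.
    """
    if len(samples) < 2:
        return 0.0
    result = samples[-1] - samples[0]
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            # Assumed reset: compensate for the value "lost" at the drop.
            result += prev
    return result


# _sum of a classic histogram observing +5, then -3, then +2.
sum_series = [0.0, 5.0, 2.0, 4.0]
print(counter_increase(sum_series))  # 9.0, although the real net change is 4.0
```

The drop from 5.0 to 2.0 was caused by a negative observation, not by a restart, yet the reset heuristic cannot tell the difference.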
I've reworded the PR description and I'll copy the final text into the commit message once we agree on it.
OM1.0 required that the Sum of Histograms is not represented when there are negative observations in a histogram. This PR is removing this requirement in OM2.0. Due to:

- The requirement was never implemented by the Go and Java instrumentation libraries. Enforcing it now would be breaking.
- The requirement makes it impossible to implement the use case where the user wants to measure the Sum anyway.
- We already warned users in the documentation about the possibility of Sum decreasing and not being usable for rate() 10 years ago: #43. And native histograms will not take Sum into account when calculating counter resets during rate(), thus this problem won't come up.

Note: this PR does not make Sum mandatory, that is a different question.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Branch updated from bd3c521 to 3b7d783.
I think the only way of solving this problem properly (beyond getting rid of classic histograms and summaries altogether) is to require PromQL to detect a counter reset in the sum via different means (historically by looking at the count, but nowadays we could also look at the created timestamp, CT). I don't know how to solve this given that the Prometheus community has decided not to do that. Maybe just leaving it as is in practice (which is arguably what this PR proposes) is the least bad way, but I don't feel I should make this call about OMv2.
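A minimal sketch of the alternative described above, assuming a hypothetical evaluator that sees the count, sum, and (optionally) created timestamp of a histogram together: a reset is inferred only from a decreasing count or a changed CT, so a decreasing sum is accepted as a genuine negative change. This is not how PromQL works today; the `HistogramSample` type and `sum_increase_count_based` function are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HistogramSample:
    count: float
    sum: float
    created: Optional[float] = None  # created timestamp (CT), if exposed


def sum_increase_count_based(samples: list[HistogramSample]) -> float:
    """Hypothetical: increase of the sum, with resets detected via count/CT."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        ct_changed = (
            prev.created is not None
            and cur.created is not None
            and cur.created != prev.created
        )
        if cur.count < prev.count or ct_changed:
            # Reset: the series restarted from zero, so cur.sum is all new.
            total += cur.sum
        else:
            # No reset: accept the raw delta, even when it is negative.
            total += cur.sum - prev.sum
    return total


samples = [
    HistogramSample(count=0, sum=0.0),
    HistogramSample(count=1, sum=5.0),
    HistogramSample(count=2, sum=2.0),  # a -3 was observed
    HistogramSample(count=3, sum=4.0),
]
print(sum_increase_count_based(samples))  # 4.0
```

Because the count is monotonic regardless of the sign of observations, it is a safe reset signal, which is why the sum alone never needs to be.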
I agree that the solution is native histograms, and this PR does not attempt to solve the problem of negative values in Sum. This PR is just about getting rid of a requirement that is not implemented by anyone and only makes things more complicated.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Generally ok with this as the above paragraph only has the sum as a SHOULD. I didn't realize it was a MUST NOT for OM 1.0, I guess that means Java is not OpenMetrics compliant today.
Also, just to note the above comment: the requirement to not expose
Noted.
Related issue about Sum allowing NaN or not: prometheus/client_golang#1275 (comment)
We agreed to just have good PR descriptions.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
OM1.0 required that the Sum of a Histogram is not exposed when there are negative observations in the histogram.
This PR removes this requirement in OM2.0, because:

- The requirement was never implemented by the Go and Java instrumentation libraries, so enforcing it now would be a breaking change.
- The requirement makes it impossible to support users who want to measure the Sum anyway. For example, they would not be able to calculate the average observed value as Sum/Count.
- The PromQL engine does not take the Sum into account when doing counter reset detection, so it does not matter that the Sum can decrease. We already warned users in the documentation about the possibility of the Sum decreasing and not being usable for rate() 10 years ago in #43. And native histograms do not take the Sum into account when calculating counter resets during rate(), so this problem won't come up.

Note 1: the Python reference implementation did follow the requirement.
Note 2: this PR does not make Sum mandatory; that is a different question.
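As an illustration of the use case above, here is a minimal, hypothetical classic histogram that always tracks the Sum, including negative observations, and renders OpenMetrics-style text lines (simplified: no `# EOF`, no created timestamp, no labels). Under OM1.0 this histogram would not have been allowed to expose `_sum` after observing -3.0; with this PR it may, which keeps Sum/Count usable for the average. The class `SketchHistogram` is an assumption for illustration, not the API of any client library.

```python
import math
from bisect import bisect_left


class SketchHistogram:
    """Hypothetical classic histogram that always tracks the Sum."""

    def __init__(self, name: str, bounds: list[float]):
        self.name = name
        self.bounds = sorted(bounds) + [math.inf]
        self.counts = [0] * len(self.bounds)  # per-bucket, non-cumulative
        self.count = 0
        self.sum = 0.0

    def observe(self, value: float) -> None:
        # bisect_left picks the first bucket whose upper bound is >= value,
        # matching the inclusive "le" semantics of classic buckets.
        self.counts[bisect_left(self.bounds, value)] += 1
        self.count += 1
        self.sum += value  # negative observations lower the Sum

    def expose(self) -> str:
        lines = [f"# TYPE {self.name} histogram"]
        cumulative = 0
        for bound, n in zip(self.bounds, self.counts):
            cumulative += n
            le = "+Inf" if math.isinf(bound) else repr(bound)
            lines.append(f'{self.name}_bucket{{le="{le}"}} {cumulative}')
        lines.append(f"{self.name}_count {self.count}")
        lines.append(f"{self.name}_sum {self.sum}")
        return "\n".join(lines)


h = SketchHistogram("temperature_celsius", [-10.0, 0.0, 10.0])
for v in (5.0, -3.0, 2.0):
    h.observe(v)
print(h.expose())
print("average:", h.sum / h.count)
```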