Base-2 exponential histogram protocol support #322
This defines the value at base^index as the upper bound of a bucket. I think the lower bound is more intuitive, and computationally more convenient: "round down" is generally more convenient in computing. For example, at positive scales, we can compute the index by right-shifting the exponent field of the double representation.
The lower bound is also more intuitive for humans. For example, bucket 0 would be [1, base) under lower-bound semantics, rather than (1/base, 1] under upper-bound semantics. With lower bounds, we can also say that buckets 0 and above map to values 1 and above, while negative buckets map to values less than 1: the neutral point of index space (0) and of log space (1) coincide perfectly.
I know that traditionally Prometheus uses "le" upper-bound buckets. But do we have to keep such compatibility? And from the current OM protocol (https://github.com/prometheus/client_model/blob/61b6c1aac0648ff14fae91ab976b8f2b30c5c5d3/io/prometheus/client/metrics.proto#L63-L73), it is not obvious whether it still sticks to the upper-bound semantic. Maybe @beorn7 can clarify?
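To make the "round down" convenience concrete, here is a minimal Go sketch (not from this PR) of the shift-based index computation alluded to above. It assumes lower-bound-inclusive buckets with base 2^(2^shift) and positive, normal (non-subnormal) double values; the helper name and the sign convention of the shift amount are assumptions, since the scale convention was still being discussed at this point.

```go
package main

import (
	"fmt"
	"math"
)

// bucketIndex is a hypothetical helper: it returns the index i of the
// bucket [base^i, base^(i+1)) containing v, for base = 2^(2^shift).
// Valid for positive, normal doubles only.
func bucketIndex(v float64, shift uint) int {
	bits := math.Float64bits(v)
	// Unbias the 11-bit IEEE 754 exponent field; for normal doubles
	// this is exactly floor(log2(v)).
	exp := int((bits>>52)&0x7FF) - 1023
	// With lower-bound-inclusive buckets, the index is just the exponent
	// shifted right; Go's arithmetic shift rounds toward negative
	// infinity, which is the desired "round down".
	return exp >> shift
}

func main() {
	fmt.Println(bucketIndex(1.0, 0)) // 0: bucket [1, 2) at base 2
	fmt.Println(bucketIndex(7.9, 1)) // 1: bucket [4, 16) at base 4
	fmt.Println(bucketIndex(0.5, 1)) // -1: bucket [1/4, 1) at base 4
}
```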
I don't follow the argument of "more intuitive for humans". In fact, I see it the other way around.
The computational argument is minor and shouldn't tip the balance. Positive scales will be the exception, and even then, the actual difference in computational or algorithmic complexity doesn't really make a dent.
More generally (and along the lines I have explained before), this is at a level of detail that I don't feel qualified to discuss while we are still experimenting to get the basics right. Little knobs like this can easily be tweaked later in the process, based on the insights gained.
Having said that, if I had to take a guess, I would say it's likely Prometheus will stick to the "less or equal" upper bounds because that is consistent with Prometheus's existing histograms, so that creating compatibility mappings is easier. There would need to be stronger arguments than "I save a single `if frac == 0.5 { sparseKey-- }` line in the code in the rare case of very-low-resolution histograms" to tip the balance.
Finally, to avoid misunderstandings: everything you see in a "sparsehistogram" branch in a Prometheus repo is part of our experiment. There is no direct link to OpenMetrics, i.e. this is not part of some kind of "OpenMetrics 2.0 draft". First these experiments have to be done and insights have to be gained. Only then can we think about the actual implementation in a future Prometheus release, which will in turn inform how OpenMetrics will embrace sparse high-resolution histograms.
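For context, the quoted line comes from an index computation along these lines. This is a sketch, not the actual Prometheus code, assuming scale 0 (base 2), positive inputs, and "less-or-equal" buckets (2^(i-1), 2^i]; the `frac == 0.5` test detects exact powers of two, which belong to the bucket below under those semantics.

```go
package sketch

import "math"

// leIndex sketches "less-or-equal" indexing at scale 0, where a value v
// belongs to bucket (2^(i-1), 2^i].
func leIndex(v float64) int {
	frac, exp := math.Frexp(v) // v = frac * 2^exp, with frac in [0.5, 1)
	sparseKey := exp
	if frac == 0.5 {
		// v is exactly 2^(exp-1): the inclusive upper bound of the
		// bucket below, so the key must be decremented.
		sparseKey--
	}
	return sparseKey
}
```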
@yzhuge I was on the fence about this issue -- whether the upper or lower bound should be inclusive -- until the comment from @oertl here: https://github.com/open-telemetry/opentelemetry-proto/pull/322/files/e680b5938be129ac8410f7ca66f0a50bcc4c4b52#r681484304
You have both recommended against the zero-tolerance field described here, and the remark by @oertl shows how it can be added later in a way that is backwards compatible with exclusive lower-bound buckets. I recommend that we avoid unnecessary conflict with the Prometheus group on this and keep the "less-than-or-equal" upper bound.
First: "inclusiveness/exclusiveness is irrelevant in practice". See previous discussion at open-telemetry/oteps#149 (review) (click on "outdated" to show the comment "CharlesMasson reviewed on Mar 17"). We should avoid inclusiveness/exclusiveness constraint in OTel standard.
I don't see why bucket index (base^index as upper vs lower bound) is relevant to the zero region width. The indexed buckets are "regular" buckets that meet the bound = base ^ i formula. The zero region bucket is a special bucket for zero and values near zero. https://github.com/open-telemetry/opentelemetry-proto/pull/322/files/e680b5938be129ac8410f7ca66f0a50bcc4c4b52#r681484304 seems to be introducing a third category of bucket [zero_tolerance, lowest_regular_bound]. This will add complexity. There was a proposal of linear buckets near zero, followed by exponential buckets further out. For now, I'd like to just stick to a single zero-region bucket and a sequence of regular exponential buckets.
Previous version says: "
// zero_tolerance may be optionally set to convey the width of the
// zero region. When this is not set, implementations may assume
// the zero tolerance equals the smallest representable positive
// IEEE 754 double-precision floating point number, which is (2^−1074)."
This only "suggests" that default is 2^-1074 (smallest positive number "double" can represent). It does not define the field as "the smallest nonnegative value that maps to a regular bucket". When it is not set, most producers mean "I don't know" or "I don't care". The consumer using 2^-1074 is assuming the narrowest zero region, which may not may not be what the producer meant. The key issue with default and backward compatibility is that Protobuf default value is 0, but zero_tolerance default is "unknown". If we remap 0 to unknown, we need another value for logical 0. There are plenty of choices, like -1. But all are confusing when physical 0 is not logical 0.
A better solution is to use an addition boolean field for "zero_tolerance_is_set". Booleans default to false in protobuf. So we know it defaults to "not set" (ie. "unknown"). When it is set, we use zero_tolerance value as is (including the protobuf default of 0) for the real zero_tolerance. Logically, this achieves a default of "Null".
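As a sketch only, that proposal could look like the following proto3 fields (the names and field numbers are hypothetical, not part of this PR):

```proto
// Hypothetical sketch of the explicit-presence proposal above.
// zero_tolerance carries a meaningful value only when
// zero_tolerance_is_set is true. Because proto3 booleans default to
// false, leaving both fields unset reads as "unknown" rather than as a
// real zero tolerance of 0.
double zero_tolerance = 10;
bool zero_tolerance_is_set = 11;
```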
If we need to carry the explicit flag (see flags below) for "zero_tolerance_enabled", it's somewhat akin to what we proposed for handling monotonicity of sums in histograms. I'd suggest a similar approach if we need it.
From our (Google) perspective, we have an explicit zero bucket that ends where the first exponential bucket starts, so we don't really need an explicit tolerance here.
Assuming `zero_tolerance` isn't being re-introduced anytime soon, it might be confusing to reference it from this comment.
I am not convinced of the value of the zero_tolerance field. From open-telemetry/opentelemetry-specification#1776 (comment):
I lean against recording the rounding threshold, even as an optional field in the OTel protocol. The first question is how you are going to use it if you have it in the protocol. Besides showing the threshold to the user in some particular UI, there is not much use, and it actually creates confusion. For example, if it is larger than the lowest bucket, what does that mean (overlapping bucket boundaries?)? And since protobuf optional fields default to 0, when it is not set it looks like the threshold is 0, but in fact it may not be (the producer neglected to set it, or the producer does not know what it is). Then we may need a value for "unknown" (use NaN?). Tracking the threshold opens a can of worms.
I am also not sure if the `zero_tolerance` field is really needed. However, if it is needed, I would rather define it as the largest nonnegative double-precision value that maps to the zero bucket. (Currently, it is defined as the smallest nonnegative value that maps to a regular bucket.) As a consequence, the default value would be 0 and not the "magic number" `2^-1074`.
This definition would also be more consistent, as we would have exclusive lower bounds for all buckets. Consider the first regular bucket to which values may be mapped. With the proposed definition of `zero_tolerance`, this bucket would contain all values from the interval `(zero_tolerance, base^i]`, where `i` is the bucket index. Hence, we would again have an exclusive lower bound, as for all other buckets, which contain values from `(base^(i-1), base^i]`.
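A toy Go illustration of that definition (the helper and its simplifications are assumptions, not code from this PR): with `zero_tolerance` as the inclusive upper edge of the zero bucket, every bucket, including the first regular one, has an exclusive lower bound.

```go
package sketch

import "math"

// mapValue illustrates the definition above: values up to and including
// zeroTolerance fall into the zero bucket [0, zero_tolerance]; any larger
// value falls into the regular bucket i with v in (base^(i-1), base^i],
// so every lower bound is exclusive.
func mapValue(v, base, zeroTolerance float64) (zeroBucket bool, index int) {
	if v <= zeroTolerance {
		return true, 0
	}
	// A production implementation must guard against floating-point error
	// when v sits exactly on a bucket boundary; Ceil is only illustrative.
	return false, int(math.Ceil(math.Log(v) / math.Log(base)))
}
```

With the proposed default of zero_tolerance = 0, the zero bucket degenerates to the single value 0 and every positive value maps to a regular bucket, which matches the protobuf default behavior described above.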
I like the suggestion by @oertl that zero_tolerance could be defined so that the zero value (default) is an inclusive upper bound.
I've removed this field from the protocol because, by this logic, it can be added back later, at which time the can of worms described by @yzhuge will have to be addressed. To add this field, we would need to pin down more detail about how to merge histograms with different zero tolerances, for example.