Base-2 exponential histogram protocol support #322

Merged · 16 commits · Sep 17, 2021
Changes from 1 commit
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -16,7 +16,7 @@ Full list of differences found in [this compare.](https://github.com/open-teleme

### Added

- * Remove if no changes for this section before release.
+ * ExponentialHistogram is a base-2 exponential histogram described in [OTEP 149](https://github.com/open-telemetry/oteps/pull/149).

### Removed

138 changes: 138 additions & 0 deletions opentelemetry/proto/metrics/v1/metrics.proto
@@ -221,6 +221,16 @@ message Histogram {
AggregationTemporality aggregation_temporality = 2;
}

// ExponentialHistogram represents the type of a metric that is calculated by aggregating
// as an ExponentialHistogram of all reported double measurements over a time interval.
message ExponentialHistogram {
repeated ExponentialHistogramDataPoint data_points = 1;

// aggregation_temporality describes if the aggregator reports delta changes
// since last report time, or cumulative changes since a fixed start time.
AggregationTemporality aggregation_temporality = 2;
}

// Summary metric data are used to convey quantile summaries,
// a Prometheus (see: https://prometheus.io/docs/concepts/metric_types/#summary)
// and OpenMetrics (see: https://github.com/OpenObservability/OpenMetrics/blob/4dbf6075567ab43296eed941037c12951faafb92/protos/prometheus.proto#L45)
@@ -431,6 +441,134 @@ message HistogramDataPoint {
repeated Exemplar exemplars = 8;
}

// ExponentialHistogramDataPoint is a single data point in a timeseries that describes the
// time-varying values of an ExponentialHistogram of double values. An ExponentialHistogram contains
// summary statistics for a population of values; it may optionally contain the
// distribution of those values across a set of buckets.
//
message ExponentialHistogramDataPoint {
// The set of key/value pairs that uniquely identify the timeseries to
// which this point belongs. The list may be empty (may contain 0 elements).
repeated opentelemetry.proto.common.v1.KeyValue attributes = 1;

// StartTimeUnixNano is optional but strongly encouraged, see the
// detailed comments above Metric.
//
// Value is UNIX Epoch time in nanoseconds since 00:00:00 UTC on 1 January
// 1970.
fixed64 start_time_unix_nano = 2;

// TimeUnixNano is required, see the detailed comments above Metric.
//
// Value is UNIX Epoch time in nanoseconds since 00:00:00 UTC on 1 January
// 1970.
fixed64 time_unix_nano = 3;

// count is the number of values in the population. Must be non-negative. This
// value must be equal to the sum of the "count" fields in buckets if a
// histogram is provided.
fixed64 count = 4;

// sum of the values in the population. If count is zero then this field
// must be zero. This value must be equal to the sum of the "sum" fields in
// buckets if a histogram is provided.
//
// Note: Sum should only be filled out when measuring non-negative discrete
// events, and is assumed to be monotonic over the values of these events.
// Negative events *can* be recorded, but sum should not be filled out when
// doing so. This is specifically to enforce compatibility w/ OpenMetrics,
// see: https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#histogram
double sum = 5;

// scale describes the resolution of the histogram. Boundaries are
// located at powers of the base, where:
//
// base = (2^(2^-scale))
//
// The histogram bucket identified by `index`, a signed integer,
// contains values that are less than or equal to (base^index) and
Contributor:

This defines the value at base^index as the upper bound of a bucket. I think lower bound is more intuitive, and computationally more convenient. "Round down" is generally more convenient in computing. For example, at positive scales, we can compute index by right shifting the exponent field in double representation.

Lower bound is more intuitive for humans. For example, bucket 0 will be [1, base) in lower bound, rather than (1/base, 1] in upper bound. In lower bound, we can also say 0 and above buckets map to 1 and above values; and negative buckets map to values less than 1. The mathematical neutral point of index space (0) and log space (1) coincide perfectly.

I know that traditionally Prometheus uses "le" upper-bound buckets. But do we have to keep such compatibility? And from the current OM protocol (https://github.com/prometheus/client_model/blob/61b6c1aac0648ff14fae91ab976b8f2b30c5c5d3/io/prometheus/client/metrics.proto#L63-L73), it is not obvious whether it still sticks to the upper-bound semantic. Maybe @beorn7 can clarify?
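
For illustration, a minimal Go sketch (not part of the PR) of the shift-based indexing this comment alludes to, under the lower-bound convention. It assumes non-positive scale and normal (non-subnormal) positive values; positive scales would additionally require mantissa bits:

```go
package main

import (
	"fmt"
	"math"
)

// lowerBoundIndex returns the bucket index of a positive, normal float64
// under lower-bound buckets [base^i, base^(i+1)) with base = 2^(2^-scale),
// for scale <= 0. The index is the unbiased IEEE 754 exponent,
// arithmetic-right-shifted by -scale; no call to math.Log2 is needed.
func lowerBoundIndex(value float64, scale int32) int64 {
	bits := math.Float64bits(value)
	exp := int64((bits>>52)&0x7FF) - 1023 // unbiased exponent
	return exp >> uint(-scale)
}

func main() {
	fmt.Println(lowerBoundIndex(9, 0))    // 3: 9 lies in [2^3, 2^4)
	fmt.Println(lowerBoundIndex(8, -1))   // 1: base is 4, and 8 lies in [4, 16)
	fmt.Println(lowerBoundIndex(0.5, -1)) // -1: 0.5 lies in [0.25, 1)
}
```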

Comment:

I don't follow the argument of "more intuitive for humans". In fact, I see it the other way around.

The computational argument is minor and shouldn't tip the balance. Positive scales will be the exception, and even then, the actual difference in computational or algorithmic complexity doesn't really make a dent.

More generally (and along the lines I have explained before), this is at a level of detail that I don't feel qualified to discuss while we are still experimenting to get the basics right. Little knobs like this can easily be tweaked later in the process, based in the insights gained.

Having said that, if I had to take a guess, I would say it's likely Prometheus will stick to the "less or equal" upper bounds because that is consistent with Prometheus's existing histograms, so creating compatibility mappings is easier. There would need to be stronger arguments than "I save a single if frac == 0.5 { sparseKey-- } line in the code in the rare case of very low resolution histograms" to tip the balance.

Finally, to avoid misunderstandings: Everything you see in a "sparsehistogram" branch in a Prometheus repo is part of our experiment. There is no direct link to OpenMetrics, i.e. this is not part of some kind of "OpenMetrics 2.0 draft". First these experiments have to be done and insights have to be gained. Only then can we think about the actual implementation in a future Prometheus release, which will in turn inform how OpenMetrics will embrace sparse high-resolution histograms.

Contributor Author:

@yzhuge I was on the fence about this issue -- whether the upper or lower bound should be inclusive -- until the comment from @oertl here: https://github.com/open-telemetry/opentelemetry-proto/pull/322/files/e680b5938be129ac8410f7ca66f0a50bcc4c4b52#r681484304

You have both recommended against the zero-tolerance field described here, and the remark by @oertl shows how it can be added later in a way that is backwards compatible with exclusive lower-bound buckets. I recommend that we avoid unnecessary conflict with the Prometheus group on this and keep the "less-than-or-equal" upper bound.

Contributor:

First: "inclusiveness/exclusiveness is irrelevant in practice". See the previous discussion at open-telemetry/oteps#149 (review) (click on "outdated" to show the comment "CharlesMasson reviewed on Mar 17"). We should avoid an inclusiveness/exclusiveness constraint in the OTel standard.

I don't see why bucket index (base^index as upper vs lower bound) is relevant to the zero region width. The indexed buckets are "regular" buckets that meet the bound = base ^ i formula. The zero region bucket is a special bucket for zero and values near zero. https://github.com/open-telemetry/opentelemetry-proto/pull/322/files/e680b5938be129ac8410f7ca66f0a50bcc4c4b52#r681484304 seems to be introducing a third category of bucket [zero_tolerance, lowest_regular_bound]. This will add complexity. There was a proposal of linear buckets near zero, followed by exponential buckets further out. For now, I'd like to just stick to a single zero-region bucket and a sequence of regular exponential buckets.

Previous version says: "
// zero_tolerance may be optionally set to convey the width of the
// zero region. When this is not set, implementations may assume
// the zero tolerance equals the smallest representable positive
// IEEE 754 double-precision floating point number, which is (2^−1074)."

This only "suggests" that the default is 2^-1074 (the smallest positive number a "double" can represent). It does not define the field as "the smallest nonnegative value that maps to a regular bucket". When it is not set, most producers mean "I don't know" or "I don't care". A consumer using 2^-1074 is assuming the narrowest zero region, which may or may not be what the producer meant. The key issue with defaults and backward compatibility is that the protobuf default value is 0, but the zero_tolerance default is "unknown". If we remap 0 to unknown, we need another value for logical 0. There are plenty of choices, like -1, but all are confusing when physical 0 is not logical 0.

A better solution is to use an additional boolean field, "zero_tolerance_is_set". Booleans default to false in protobuf, so we know the field defaults to "not set" (i.e., "unknown"). When it is set, we use the zero_tolerance value as is (including the protobuf default of 0) as the real zero_tolerance. Logically, this achieves a default of "Null".

Contributor:

If we need to carry the explicit flag (see flags below) for "zero_tolerance_enabled", it's somewhat akin to what we proposed for handling monotonicity of sums in histograms. I'd suggest a similar approach if we need it.

From our (Google) perspective, we have an explicit zero bucket that ends where the first exponential bucket starts, so we don't really need an explicit tolerance here.

// greater than (base^(index-1)).
//
// The positive and negative ranges of the histogram are expressed
// separately but both use the same scale to locate their boundaries.
//
// scale is valid in the range -4 through +8 inclusive.
sint32 scale = 6;
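
To make the boundary formula concrete, here is a minimal Go sketch (not part of the PR; the `bucketIndex` helper is hypothetical) that computes the bucket index of a positive value under the less-than-or-equal upper-bound convention described above:

```go
package main

import (
	"fmt"
	"math"
)

// bucketIndex returns the index of the bucket containing a positive value,
// following the convention above: bucket `index` holds values in
// (base^(index-1), base^index], where base = 2^(2^-scale).
// Sketch only: values that are exact powers of the base may need extra care
// with floating-point rounding in math.Log2.
func bucketIndex(value float64, scale int32) int64 {
	// log_base(value) = log2(value) * 2^scale
	return int64(math.Ceil(math.Log2(value) * math.Exp2(float64(scale))))
}

func main() {
	// At scale 0 the base is 2: 8 falls in bucket 3 ((4, 8]), 9 in bucket 4 ((8, 16]).
	fmt.Println(bucketIndex(8, 0), bucketIndex(9, 0)) // 3 4
	// At scale 1 the base is sqrt(2) ≈ 1.4142: 2 falls in bucket 2 ((sqrt(2), 2]).
	fmt.Println(bucketIndex(2, 1)) // 2
}
```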

// zero_count is the count of values that are either exactly zero or
// within the region considered zero by the instrumentation at the
// tolerated degree of precision. This bucket stores values that
// cannot be expressed using the standard exponential formula as
// well as values that have been rounded to zero. Users have the
// option to set zero_tolerance to convey additional information about

Comment:

Assuming zero_tolerance isn't being re-introduced anytime soon, it might be confusing to reference it from this comment.

// the width of the zero region.
//
// Implementations MAY consider the zero bucket to have probability
// mass equal to (zero_count / sum).
fixed64 zero_count = 7;

// zero_tolerance may be optionally set to convey the width of the
// zero region. When this is not set, implementations may assume
// the zero tolerance equals the smallest representable positive
// IEEE 754 double-precision floating point number, which is (2^−1074).
//
// Producers can be configured to round measurements to zero that fall
// below the zero tolerance. Users should avoid producing measurements
// that have meaningless precision near zero, as exponential buckets
// become increasingly dense in this region.
//
// Implementations MAY consider the zero bucket to have probability
// density equal to (zero_count / 2*zero_tolerance), if set.
double zero_tolerance = 8;
Contributor:

I am not convinced of the value of the zero_tolerance field. From open-telemetry/opentelemetry-specification#1776 (comment):
I lean against recording the rounding threshold, even as an optional field in the OTel protocol. The first question is how you would use it if you had it in the protocol. Besides showing the threshold to the user in some particular UI, there is not much use, and it actually creates confusion. For example, if it is larger than the lowest bucket, what does that mean (overlapping bucket boundaries?)? And since protobuf optional fields default to 0, when it is not set it looks like the threshold is 0, but in fact it may not be (the producer neglected to set it, or the producer does not know what it is). Then we may need a value for "unknown" (use "NaN"?). Tracking the threshold opens a can of worms.

Comment:

I am also not sure if the zero_tolerance field is really needed. However, if it is needed, I would rather define it as the largest nonnegative double-precision value that maps to the zero bucket. (Currently, it is defined as the smallest nonnegative value that maps to a regular bucket. ) As a consequence, the default value would be 0 and not the "magic number" 2^-1074.

This definition would also be more consistent, as we would have exclusive lower bounds for all buckets. Consider the first regular bucket to which values may be mapped. With the proposed definition of zero_tolerance, this bucket would contain all values from the interval (zero_tolerance, base^i], where i is the bucket index. Hence, we would again have an exclusive lower bound, as for all other buckets, which contain values from (base^(i-1), base^i].

Contributor Author:

I like the suggestion by @oertl that zero_tolerance could be defined so that the zero value (default) is an inclusive upper bound.

I've removed this field from the protocol because it can be added later, by this logic, at which time the can of worms described by @yzhuge will have to be addressed. To add this field we will need to pin down more detail about how to merge histograms with different zero tolerances, for example.


// positive carries the positive range of exponential bucket counts.
SparseBuckets positive = 9;

// negative carries the negative range of exponential bucket counts.
SparseBuckets negative = 10;

// SparseBuckets are a sparse set of bucket counts, compactly encoded.
message SparseBuckets {
// Span encodes a run of contiguous histogram buckets.
message Span {
sint32 offset = 1; // Gap to previous span, or starting point for 1st span (which can be negative).
uint32 length = 2; // Length of consecutive buckets.
}

// span is a repeated list of Spans that encodes a range of
// exponential buckets, positive or negative. The offset of the
// first Span indicates the smallest index in the Histogram with
// non-zero count. Subsequent contiguous buckets will be included
// in the same Span, and subsequent Spans encode additional
// buckets separated by gaps where the histogram has zero count.
//
// The starting index of a Span equals its own offset plus the
// starting index of the previous Span plus the previous Span's
// length. As a recursive formula:
//
// starting_index(span[x+1]) =
// starting_index(span[x]) +
// span[x+1].offset +
// span[x].length
//
// For example, the two Spans [ {offset: 10, length: 3}, {offset:
// 5, length: 8} ] indicate a total of 11 buckets with indices at
// [10, 11, 12] and [18, 19, 20, 21, 22, 23, 24, 25].
repeated Span span = 1;

// delta is an array of counts corresponding to the buckets
// described by `span`, encoded as deltas. The indicated count
// for bucket[x] is defined as a recursive formula:
//
// bucket_count(x+1) = bucket_count(x) + delta[x+1]
repeated sint64 delta = 2;
}
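
As an illustration of the two recursive formulas above, here is a minimal Go sketch (not part of the PR; the `span` struct and `decode` helper are hypothetical stand-ins for the Span message) that expands spans and delta-encoded counts into absolute bucket indices and counts:

```go
package main

import "fmt"

// span mirrors the Span message above (hypothetical Go struct for illustration).
type span struct {
	offset int32  // gap to previous span, or starting index for the first span
	length uint32 // number of consecutive buckets
}

// decode expands spans and delta-encoded counts into a map from absolute
// bucket index to count, applying the starting-index and bucket-count
// formulas documented above.
func decode(spans []span, deltas []int64) map[int32]int64 {
	counts := make(map[int32]int64)
	var index int32 // absolute index of the next bucket to fill
	var count int64 // running count; each delta accumulates into it
	di := 0
	for si, s := range spans {
		if si == 0 {
			index = s.offset // the first offset is absolute
		} else {
			index += s.offset // later offsets are gaps after the previous span
		}
		for i := uint32(0); i < s.length; i++ {
			count += deltas[di]
			counts[index] = count
			index++
			di++
		}
	}
	return counts
}

func main() {
	// The example from the comment: spans {10,3} and {5,8} cover indices
	// [10, 11, 12] and [18 .. 25], 11 buckets in total.
	spans := []span{{offset: 10, length: 3}, {offset: 5, length: 8}}
	deltas := []int64{3, 2, 0, -1, 0, 0, 0, 0, 1, 0, 0}
	fmt.Println(decode(spans, deltas)) // e.g. bucket 10 -> 3, bucket 11 -> 5, ...
}
```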

// (Optional) List of exemplars collected from
// measurements that were used to form the data point
repeated Exemplar exemplars = 11;
}

// SummaryDataPoint is a single data point in a timeseries that describes the
// time-varying values of a Summary metric.
message SummaryDataPoint {