
Metrics Histogram instrument default: any histogram sketch #982

Closed
jmacd opened this issue Sep 22, 2020 · 8 comments
Labels
area:sdk Related to the SDK priority:p1 Highest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA spec:metrics Related to the specification/metrics directory

Comments

@jmacd
Contributor

jmacd commented Sep 22, 2020

What are you trying to achieve?

Accepting this issue means changing the metric API and SDK specification to state that "any histogram sketch" library may be chosen as the default aggregation for ValueRecorder instruments, subject to the community decision in each OpenTelemetry implementation. Note that the classic fixed-boundary histogram is a viable solution, although there will usually be a better approach that works automatically without fixed boundaries.

As stated in #636, there has been considerable debate over the default for ValueRecorder. This case was restated in #919, where the proposal was for DDSketch. From the discussion and from surveying this space a little bit more, it's clear that there are many existing libraries and algorithms that produce variable-boundary histograms. The (current) v0.5 OTLP protocol has this to say (e.g., from DoubleHistogramDataPoint):

  // explicit_bounds is the only supported bucket option currently.
  // TODO: Add more bucket options.

  // explicit_bounds specifies buckets with explicitly defined bounds for values.
  // The bucket boundaries are described by "bounds" field.
  //
  // This defines size(bounds) + 1 (= N) buckets. The boundaries for bucket
  // at index i are:
  //
  // (-infinity, bounds[i]) for i == 0
  // [bounds[i-1], bounds[i]) for 0 < i < N-1
  // [bounds[i], +infinity) for i == N-1
  // The values in bounds array must be strictly increasing.
  //
  // Note: only [a, b) intervals are currently supported for each bucket except the first one.
  // If we decide to also support (a, b] intervals we should add support for these by defining
  // a boolean value which decides what type of intervals to use.
  repeated double explicit_bounds = 7;
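To make these rules concrete, here is a minimal sketch (in Python; illustrative only, not from any OTLP implementation) of how a consumer maps a recorded value to a bucket index under the explicit_bounds semantics quoted above:

import bisect

def bucket_index(value, bounds):
    # bisect_right counts the boundaries <= value, which is exactly the index
    # of the half-open bucket holding it: (-inf, bounds[0]) gets index 0,
    # [bounds[i-1], bounds[i]) gets index i, and [bounds[-1], +inf) gets
    # index len(bounds).
    return bisect.bisect_right(bounds, value)

bounds = [0.0, 5.0, 10.0]                # defines 4 buckets
assert bucket_index(-1.0, bounds) == 0   # (-inf, 0)
assert bucket_index(5.0, bounds) == 2    # [5, 10)
assert bucket_index(10.0, bounds) == 3   # [10, +inf)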

The next logical step is to propose that each OpenTelemetry default SDK choose the best available sketch library that produces a histogram representation. Examples include:

HdrHistogram
DDSketch
CircllHist
UDDSketch
DynaHist

This definition, "best available sketch library", allows an SDK to fall back in two ways: (1) if there is no sketch library, a fixed-boundary histogram implementation may be a good solution, (2) if the SDK is stateless, it will be better to encode a raw value than to encode a single-value histogram.

The OpenMetrics Histogram BucketOptions message struct, which includes Linear, Exponential, and Explicit options, is a good baseline set of supported options. The discussion in #919 goes into depth about ways that the DDSketch representation can be compressed. This compression is nice to have, especially where we can implement it once, as in the OTel Collector. However, as pointed out here, there are a number of pieces of code that would have to be written in each language SDK for clients to achieve this compression. This will be a lot less work if any available sketch library can be used.

Note that dealing with both positive and negative values can require special support in many of these representations. Note also that exponential schemes commonly need a special case for zero: the consequence is a bucket that has zero width. This would require changes in the definition of explicit_bounds, which does not permit empty buckets (the values in the bounds array must be strictly increasing, as seen above).
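To illustrate, here is a hypothetical sketch of an exponential bucketing function that routes zero to a dedicated zero-width bucket and negative values to a mirrored histogram; the names are illustrative and not taken from any of the libraries above:

import math

def exponential_index(value, base=2.0):
    # Positive bucket i covers [base**i, base**(i+1)).
    return math.floor(math.log(value, base))

def classify(value, base=2.0):
    if value == 0.0:
        return ("zero", 0)          # the zero-width [0, 0] bucket
    if value < 0.0:
        return ("negative", exponential_index(-value, base))
    return ("positive", exponential_index(value, base))

print(classify(6.0))    # ('positive', 2): 6 falls in [4, 8)
print(classify(0.0))    # ('zero', 0)
print(classify(-0.3))   # ('negative', -2): 0.3 falls in [0.25, 0.5)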

Also discussed in #919, there is sometimes a question of whether to represent histograms with floating-point or with integer counts. There are two known reasons for wanting floating-point counts:

  1. Statsd "sample-rate" parameters lead to non-integer count values
  2. Sketch data structures often produce non-integer counts.

For example, the T-Digest paper (§2.9) details how to compute a cumulative distribution function from a T-Digest, which naturally converts into a histogram representation with N-1 buckets derived from N T-Digest "centroids". When converting T-Digest sketches into histograms, we end up with non-integer counts.
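As a rough sketch of that conversion (assuming the simple interpolation described in the paper; this is illustrative, not code from any T-Digest library): each of the N-1 buckets spans two adjacent centroid means and receives half the weight of each endpoint, so counts are non-integer by construction.

def centroids_to_histogram(centroids):
    # centroids: sorted list of (mean, weight) pairs.
    bounds, counts = [], []
    for (m0, w0), (m1, w1) in zip(centroids, centroids[1:]):
        bounds.append((m0, m1))
        counts.append(w0 / 2 + w1 / 2)   # fractional by construction
    return bounds, counts

bounds, counts = centroids_to_histogram([(1.0, 3), (2.5, 5), (9.0, 1)])
print(bounds)   # [(1.0, 2.5), (2.5, 9.0)]
print(counts)   # [4.0, 3.0]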

Proposed OTLP data point

To make this proposal concrete, it calls for a third histogram data point message type. Whereas OTLP v0.5 has DoubleHistogramDataPoint and IntHistogramDataPoint, this proposal would add SketchHistogramDataPoint which has double-valued counts to support both cases mentioned above. Something like this:

message SketchHistogramDataPoint {
  // [labels & two timestamps]

  double count = 4;

  double sum = 5;

  oneof buckets {
    Explicit    explicit = 7;
    Linear      linear = 8;
    Exponential exponential = 9;
    DDSketch    ddsketch = 10;
    CircllHist  circllhist = 11;
    // ...
  }

  // [min, max, exemplars]
}

Additional context.

Although the issue of how to treat raw points is separate, this issue otherwise appears to address many of the concerns raised in #636 as well as to allow for DDSketch where available. There are several known situations where this does not necessarily give the user everything they want:

  1. When long collection intervals are used, it is sometimes more desirable for the aggregation to be LastValue instead of a histogram-like sketch: it may be more interesting to know the last value than to know the distribution of values. This case can potentially be remedied by the user configuring a non-default aggregation, or by using an asynchronous instrument (i.e., ValueObserver), because its default aggregation is LastValue (Metrics: specify LastValue as default aggregation for ValueObserver instrument #834).
  2. When a Prometheus backend is used to process this data, one timeseries is recorded per histogram bucket. In this representation, variation over time can lead to a large number of buckets. Histogram-merging logic when Prometheus is the target may want to merge (interpolate) variable-boundary histogram data points into fixed-boundary histogram data points with configurable boundaries (see the sketch after this list).
  3. Exact min and max values are still in demand for histogram data points. These should be made available as independent fields.
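Here is a minimal sketch of the merging mentioned in item 2, assuming values are uniformly distributed within each source bucket; rebucket is a hypothetical helper, not an OTel Collector API. It redistributes each finite source bucket's count across the target boundaries in proportion to overlap:

def rebucket(src_buckets, src_counts, dst_bounds):
    # src_buckets: [(lo, hi), ...] finite source buckets.
    # dst_bounds: sorted boundaries defining len(dst_bounds) - 1 target buckets.
    dst_counts = [0.0] * (len(dst_bounds) - 1)
    for (lo, hi), count in zip(src_buckets, src_counts):
        for i, (dlo, dhi) in enumerate(zip(dst_bounds, dst_bounds[1:])):
            overlap = max(0.0, min(hi, dhi) - max(lo, dlo))
            if overlap > 0.0:
                dst_counts[i] += count * overlap / (hi - lo)
    return dst_counts

# One source bucket [0, 1) with 10 observations, split across [0, .5) and [.5, 1):
print(rebucket([(0.0, 1.0)], [10], [0.0, 0.5, 1.0]))  # [5.0, 5.0]

Note that the resulting counts are generally non-integer, which is another argument for double-valued counts in the proposed data point.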

A minor note about Summary data points (i.e., quantile-value listings): these were removed in OTLP v0.5 while we focused on that release, but this type of data point can be represented in the proposed SketchHistogramDataPoint described above, with one histogram bucket (and floating-point count) per quantile.

@jmacd jmacd added the spec:metrics Related to the specification/metrics directory label Sep 22, 2020
@andrewhsu andrewhsu added priority:p1 Highest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA area:sdk Related to the SDK priority:p2 Medium priority level and removed priority:p1 Highest priority level labels Sep 22, 2020
@yzhuge

yzhuge commented Oct 1, 2020

I suggest that the Exponential message be extended to support the “log-linear” format, which basically allows an exponential bucket to be split into equal-width linear sub-buckets. The log-linear format is used by HdrHistogram and CircllHist. The CircllHist paper https://arxiv.org/pdf/2001.06561.pdf has a detailed description of it.

The change to the exponential format is very simple. From https://github.com/OpenObservability/OpenMetrics/blob/7dfd2087ed3c62ede886646c222ad15112556704/proto/openmetrics_data_model.proto#L173 , we currently have:

  message Exponential {
    // Required.
    // Must be greater than 0.
    uint32 num_exponential_buckets = 1;

    // Required.
    // Must be greater than 1.
    double growth_factor = 2;

    // Required.
    // Must be greater than 0.
    double scale = 3;
  }

All we need is a new number_of_linear_subbuckets field. It can be optional, defaulting to 1 (a standard exponential histogram). Because the change is so simple, I am suggesting extending the current exponential format rather than adding a new log-linear format. With linear sub-bucket support, HdrHistogram, CircllHist, and the like can compress the bounds field down to just a few parameters. The large number of HdrHistogram users will welcome such a feature and may adopt OpenTelemetry more quickly.
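A minimal sketch of the proposed bucketing (assuming boundaries are generated as in the OpenMetrics Exponential message, scale * growth_factor**i, with each exponential bucket split into equal-width linear sub-buckets; number_of_linear_subbuckets is the proposed field, not yet in any proto):

def log_linear_bounds(scale, growth_factor, num_exponential_buckets,
                      number_of_linear_subbuckets=1):
    bounds = []
    for i in range(num_exponential_buckets):
        lo = scale * growth_factor ** i
        hi = scale * growth_factor ** (i + 1)
        step = (hi - lo) / number_of_linear_subbuckets
        bounds.extend(lo + j * step for j in range(number_of_linear_subbuckets))
    bounds.append(scale * growth_factor ** num_exponential_buckets)
    return bounds

# B=10, N=4 over one decade starting at 1: [1.0, 3.25, 5.5, 7.75, 10.0]
print(log_linear_bounds(1.0, 10.0, 1, 4))

With number_of_linear_subbuckets left at its default of 1, this reduces to the standard exponential boundaries, which is why the field can be optional.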

A related question: “dealing with both positive and negative values can require special support in many of these representations”. May I suggest that an optional second histogram be allowed for negative numbers when the format is exponential? This is a common solution for negative numbers when using exponential histograms.

@yzhuge

yzhuge commented Oct 2, 2020

Looking at #919 again, it appears that DDSketch has several value-to-bucket mapping methods. The canonical one uses the standard logarithm; the others use approximations, for lower CPU cost. The various mapping methods can be found at https://github.com/DataDog/sketches-java/tree/master/src/main/java/com/datadoghq/sketch/ddsketch/mapping (the link from issue #919 did not work, but I managed to find the classes in the repo). The main sketch class has constructors to choose a method.

The canonical method should be able to produce a standard exponential histogram. The only sticky point may be that DDSketch allows bucket collapsing, where multiple buckets (typically at the lower end) may be merged to save space. However, if the format defines a starting (0, minTrackedValue) bucket and an ending (maxTrackedValue, +infinity) bucket, the consumer of the message can determine whether collapsing happened. In the no-collapsing case, the counters at both ends should be 0. With collapsing, the bucket at one or both ends will have a non-zero count (some data fell into the open-ended catch-all buckets). This format supports collapsing on one or both ends and communicates collapsing information from the histogram producer to the consumer.
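A small sketch of that detection rule (illustrative only; the names are hypothetical): with explicit (0, minTrackedValue) and (maxTrackedValue, +infinity) catch-all buckets, the consumer inspects the two end counts.

def collapsed_ends(low_catchall_count, high_catchall_count):
    # Non-zero counts in the open-ended end buckets mean the producer merged
    # (collapsed) buckets beyond the tracked range at that end.
    return {
        "low_end_collapsed": low_catchall_count > 0,
        "high_end_collapsed": high_catchall_count > 0,
    }

print(collapsed_ends(0, 0))   # no collapsing: both catch-all counts are zero
print(collapsed_ends(7, 0))   # low-end buckets were merged into the catch-all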

@jmacd
Contributor Author

jmacd commented Oct 15, 2020

@yzhuge I am following your suggestion above about a number_of_linear_subbuckets field. The CircllHist paper's diagrams are great at explaining this 2-parameter histogram with B=logarithm-base and N=number-of-linear-subbuckets, but what's appealing to me about this is the potential for compatibility with Prometheus, which uses

DefBuckets = []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}

This is approximately a B=10/N=4 encoding (matching the CircllHist diagram), except it's really an N=3-out-of-4 encoding scheme in which the 1..10 interval is divided into 4 linear buckets and then the last two are collapsed.

Note that both the B=10/N=4 and B=10/N=90 boundaries can be correctly collapsed into the default Prometheus N=3-out-of-4 scheme. For example, a B=10/N=90 histogram has 15 buckets covering .1 to .25, 25 buckets covering .25 to .50, and 50 buckets covering 0.50 to 1.0. We can convert either of these into Prometheus histograms without loss of accuracy or precision.

This leads to thinking about either the B=10/N=4 or B=10/N=90 boundaries by default, but I'm not interested in support for general parameterized histograms (e.g., a B=3/N=2 encoding has boundaries at 1, 2, 3, 6, 9, 18, 27, 54, 81, ... and does not interest me). Note that we could specify this encoding without specifying how these histograms are computed.

@jmacd
Contributor Author

jmacd commented Oct 15, 2020

I made a logic error in the analysis above, but it's minor. The Prometheus scheme can be generated from a B=10/N=18 bucketing scheme, where the sub-buckets are collapsed in groups of 3, 5, and 10 to synthesize the 0.1/0.25/0.5/1 boundaries.
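The arithmetic checks out: a B=10/N=18 decade such as [0.1, 1) has sub-bucket width 0.05, so collapsing groups of 3, 5, and 10 sub-buckets reproduces the Prometheus 0.1/0.25/0.5/1 boundaries exactly. A quick verification in Python:

width = (1.0 - 0.1) / 18                 # 0.05 per sub-bucket
b = [round(0.1 + i * width, 2) for i in range(19)]
groups = [b[0], b[3], b[8], b[18]]       # collapse 3, then 5, then 10 sub-buckets
print(groups)                            # [0.1, 0.25, 0.5, 1.0]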

@githomin

> Looking at #919 again, it appears that DDSketch has several value-to-bucket mapping methods. The canonical one uses the standard logarithm; the others use approximations, for lower CPU cost.

We'd like to note that this proposal would cut off OTel users from the faster and more memory-efficient mappings, such as DDSketch's CubicallyInterpolatedMapping (about 3 times less costly than a basic exponential mapping, with only 1% space overhead) or DynaHist's LogQuadraticLayout.

> The canonical method should be able to produce a standard exponential histogram. The only sticky point may be that DDSketch allows bucket collapsing,

This won't be a problem. Our protocol does not take any special handling of collapsing into account either. It should also be noted that collapsing is purely a safeguard and should almost never happen, as shown in our paper.

@yzhuge

yzhuge commented Oct 22, 2020

See histogram update PR prometheus/OpenMetrics#152.
The PR went to the wrong repo; it is being reworked on the correct repo.

@andrewhsu andrewhsu added priority:p1 Highest priority level and removed priority:p2 Medium priority level labels Oct 23, 2020
@yzhuge

yzhuge commented Oct 26, 2020

@jmacd jmacd changed the title Metrics ValueRecorder instrument default: any histogram sketch Metrics Histogram instrument default: any histogram sketch Jul 23, 2021
@jmacd
Contributor Author

jmacd commented Sep 14, 2021

The metrics SDK specification reads "select the best Histogram Aggregation available":

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#histogram-aggregation

@jmacd jmacd closed this as completed Sep 14, 2021