
Referendum on Histogram format #1776

Closed
jmacd opened this issue Jun 24, 2021 · 49 comments
Labels: spec:metrics (Related to the specification/metrics directory), spec:protocol (Related to the specification/protocol directory)

jmacd commented Jun 24, 2021

What are you trying to achieve?

This issue aims to achieve consensus from OpenTelemetry maintainers and approvers over a choice of histogram formats.

There has been a lengthy process to standardize an exponential histogram protocol for OpenTelemetry, culminating in OTEP 149 which arrived at a consensus on exponential histograms.

Contemporaneously, the OpenHistogram code base was re-licensed (it was formerly the Circonus log-linear histogram), making it an appealing option for both OpenTelemetry and OpenMetrics to use, but it is not the same as an exponential histogram. The goal of this issue is to summarize these options to allow a wider audience to participate in the selection process.

It is important that we choose only one of these options for several reasons: (1) because there is a loss of fidelity whenever we translate between incompatible histogram types, (2) because there will be a large amount of code dedicated to handling each of these types in many downstream vendor and OSS systems.

OTEP 149: Exponential histograms

OTEP 149, discussed and debated in open-telemetry/oteps#149, lists considerations around specifying an exponential histogram for OTLP. In an exponential histogram, boundaries are located at integer powers of a specific base value. If the base value is 1.1, then there are boundaries at 1.1, 1.1^2, 1.1^3, 1.1^4, and so on.

Merging exponential histograms

While this form of histogram is almost trivial in concept, there was an early concern about loss of fidelity when merging arbitrary histograms, due to residual "artifacts". This consideration leads in two directions, both of which have been explored.

The first avenue is to fix the histogram parameters completely: when there are no free parameters, histograms can always be merged without loss of fidelity. The second avenue is to choose a parameterization scheme that has "perfect subsetting", which is when buckets of a high-resolution histogram map perfectly into buckets of a lower-resolution histogram.

Perfect subsetting for exponential histograms

Following UDDSketch and unpublished work at Google, OTEP 149 lands on the idea of a powers-of-2 exponential histogram, one that ensures perfect subsetting.

| scale | boundaries, e.g. | number of buckets spanning 1..100 |
| --- | --- | --- |
| 0 | 1, 2, 4, 8 | 7 |
| 1 | 1, 1.4, 2, 2.8, 4 | 14 |
| 2 | 1, 1.2, 1.4, 1.7, 2, ... | 27 |
| 3 | 1, 1.09, 1.19, ..., 1.7, 1.83, 2 | 54 |
| 4 | 1, 1.04, 1.09, ..., 1.83, 1.92, 2 | 107 |
| 5 | 1, 1.02, 1.04, ..., 1.92, 1.96, ... | 213 |
| 6 | 1, 1.01, 1.02, ..., 1.92, 1.96, ... | 426 |
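To make the perfect-subsetting property concrete, here is a minimal sketch (illustrative only, not taken from the OTEP) of merging a scale-s histogram down to scale s-1 by pairing adjacent buckets; it assumes the counts array starts at an even bucket index.

```java
// Illustrative sketch: with power-of-2 scales, bucket i at scale s maps exactly
// onto bucket i >> 1 at scale s-1, so lowering the scale by one merges each pair
// of adjacent buckets without any interpolation.
static long[] downscaleByOne(long[] counts) {
    long[] merged = new long[(counts.length + 1) / 2];
    for (int i = 0; i < counts.length; i++) {
        merged[i >> 1] += counts[i];
    }
    return merged;
}
```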

OpenHistogram: Log-linear histograms

OpenHistogram uses a base-10 exponential scheme with 90 buckets linearly dividing each "decade", so for example there are 90 buckets between 0.1 and 1, 90 buckets between 1 and 10, 90 buckets between 10 and 100, and so on. This approach maps well to human intuition about logarithmic scale. This approach is also tailored for environments without floating point processors.
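As a rough illustration of the layout (my own sketch, not the libcircllhist implementation), a bucket can be identified by a power of ten and the first two significant digits of the value:

```java
// Illustrative sketch: an OpenHistogram-style bucket is identified by a power of
// ten and the first two significant digits of the value, i.e. the value lies in
// [twoDigits * 10^exponent, (twoDigits + 1) * 10^exponent). Assumes value > 0 and
// ignores floating-point rounding at decade boundaries.
static int[] logLinearBucket(double value) {
    int exponent = 0;
    while (value >= 100) { value /= 10; exponent++; }  // normalize into [10, 100)
    while (value < 10)   { value *= 10; exponent--; }
    int twoDigits = (int) value;                        // 10..99
    return new int[] { exponent, twoDigits - 10 };      // {decade, sub-bucket 0..89}
}
```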

Merging OpenHistogram histograms

OpenHistogram has a fixed parameterization and thus avoids loss of fidelity when merging.

Perfect subsetting for OpenHistogram-like histograms

The goal of perfect subsetting is to select parameters that support merging of high- and low-resolution histograms. OpenHistogram makes a strong case for the use of 90 buckets per decade, which is relatively high resolution, because any factor of 9 ensures integer boundaries >= 1 in a base-10 scheme.

Resolution factors that are compatible with OpenHistogram and have perfect subsetting:

| resolution | boundaries, e.g. | number of buckets spanning 1..100 |
| --- | --- | --- |
| 1 | 1, 10 | 2 |
| 3 | 1, 4, 7, 10 | 6 |
| 9 | 1, 2, 3, ..., 8, 9, 10 | 18 |
| 18 | 1, 1.5, 2, ..., 9, 9.5, 10 | 36 |
| 90 | 1, 1.1, 1.2, ..., 9.8, 9.9, 10 | 180 |
| 180 | 1, 1.05, 1.1, ..., 9.9, 9.95, 10 | 360 |

Note that while OpenHistogram fixes resolution at 90, OpenTelemetry could adopt a protocol with support for other resolutions and still adopt OpenHistogram libraries for its SDKs, allowing metrics processing pipelines to automatically lower resolution for collected metrics.
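For example (a sketch under the assumption that a decade's 90 linear sub-buckets are stored as a plain array), lowering a decade from resolution 90 to the compatible resolution 9 just sums groups of ten adjacent sub-buckets:

```java
// Illustrative sketch: sub-bucket i of a resolution-90 decade falls entirely
// inside bucket i / 10 of the resolution-9 decade, so merging is a group sum.
static long[] lowerDecadeResolution(long[] countsAtResolution90) {
    long[] merged = new long[9];
    for (int i = 0; i < 90; i++) {
        merged[i / 10] += countsAtResolution90[i];
    }
    return merged;
}
```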

Protocol considerations

Regardless of which of the options above is chosen, there are several choices remaining.

Sparse vs. Dense encoding

A dense encoding is optimized for the case where non-empty buckets will be clustered together, making it efficient to encode a single offset and one (if exponential) or two (if log-linear) arrays of counts.

A sparse encoding is optimized for the case where non-empty buckets are not clustered together, making it efficient to encode every bucket index separately.

Sparse histograms can be converted into denser, lower-resolution histograms by perfect subsetting; bucket indexes are sometimes compressed using delta-encoding techniques.
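As a small illustration of the delta-encoding idea (a sketch, not any particular wire format): because sparse bucket indexes are stored in increasing order, each index can be replaced by its gap to the previous one, which keeps the encoded integers small.

```java
// Illustrative sketch: delta-encode a sorted list of non-empty bucket indexes.
// The first entry is the absolute index (gap from zero); each subsequent entry
// is the gap to the previous index.
static int[] deltaEncode(int[] sortedIndexes) {
    int[] deltas = new int[sortedIndexes.length];
    int previous = 0;
    for (int i = 0; i < sortedIndexes.length; i++) {
        deltas[i] = sortedIndexes[i] - previous;
        previous = sortedIndexes[i];
    }
    return deltas;
}
```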

Zero bucket handling

The zero value must be handled specially in an exponential histogram. There is a question about how to treat values that are close to zero, for example whether they "fall into" the zero bucket.

Converting from other histogram formats

Both the exponential and the log-linear histogram families are expected to improve the resolution per byte of encoded data relative to the explicit-boundary histogram currently included in OTLP. Metrics processors that operate on this data will require helper methods to assist in translating from other histogram formats into the exponential histogram that we choose.

To translate from another histogram format, often we use interpolation. Rules for interpolating between histogram buckets should specify how to handle the buckets that span zero and buckets with boundaries at infinity, since both are valid configurations. To interpolate from arbitrary-boundary buckets, we have to calculate bucket indexes for each boundary in the input.

Calculating bucket indexes: Exponential case

Calculating the bucket index for an arbitrary exponential base generally means taking a logarithm of the value.

For the special case of base-2 exponential histograms, the IEEE 754 floating point representation already stores the data in the required form, so the index can be extracted directly using bitwise operations.
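For illustration (a sketch of the scale-0 case only; the helper name base2IndexOf is not from any existing library), the bucket index is just the unbiased exponent field of the IEEE 754 representation:

```java
// Illustrative sketch: base-2 (scale = 0) bucket index of a positive, normal
// double, read directly from the IEEE 754 bit layout without calling log().
static int base2IndexOf(double value) {
    long bits = Double.doubleToRawLongBits(value);
    int biasedExponent = (int) ((bits >>> 52) & 0x7FF); // 11-bit exponent field
    return biasedExponent - 1023;                       // remove the IEEE 754 bias
}
```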

To calculate bucket indexes without floating point hardware, a recursion relationship can be used.

Calculating bucket indexes: Log-linear case

OpenHistogram defines a recursion relation for calculating the bucket index.

Summary

Thank you to the experts that have guided us to this point. @yzhuge, @postwait, @oertl, @githomin, @CharlesMasson, @HeinrichHartmann, and @jdmontana.

This is now a question for the community. There are two major options presented here, a base-2 exponential histogram and a base-10 log-linear histogram. Both have technical merits.

There are also non-technical merits to consider. OpenHistogram is readily available and has already been adopted in a number of OSS systems, such as Envoy. An out-of-the-box Prometheus client uses 12 buckets with default boundaries that map exactly onto OpenHistogram boundaries, which cannot be said for the binary-exponential approach.

My personal opinion (@jmacd): I am in favor of adopting a protocol with support for OpenHistogram's histogram as long as the protocol supports the lower resolution factors listed above, 3, 9, 18, which appears to be a trivial extension to the OpenHistogram model. I am assuming this approach will be legally acceptable as far as the OpenHistogram project is concerned.

jmacd added the spec:metrics (Related to the specification/metrics directory) and spec:protocol (Related to the specification/protocol directory) labels on Jun 24, 2021

oertl commented Jun 25, 2021

@jmacd thanks for your effort and your comparison. One thing I find missing is a discussion of space efficiency:

One of the main reasons to use histogram-like sketches for the summarization of distributions is that they provide strict error guarantees when estimating quantiles. (This is in contrast to other sketches like KLL or t-digest). If you want to estimate a certain quantile, histograms allow you to determine the bucket which contains that quantile value. The corresponding bucket boundaries will obviously bracket that quantile and can be used for estimation. The maximum relative estimation error is therefore dependent on the maximum relative bucket width (relative distance between subsequent boundaries).

Assume you want to estimate quantiles with a relative accuracy of 2.5%. This would roughly require a maximum relative bucket width of 5%, if the mid-point of the bucket is used as estimate. When using the base-2 solution, you would therefore have to choose scale=4 with a relative bucket width of 4.4%. According to the first table above, the number of buckets needed to cover the range from 1 to 100 is 107.

To have the same maximum relative error when using OpenHistogram, you would have to choose resolution=180. This configuration results in the desired maximum relative bucket width of 5% (given by the first 2 bucket boundaries listed in the table). The second table above shows that 360 buckets are needed to cover the range from 1 to 100, which is more than a factor of 3!!! more compared to the base-2 solution. Since space-efficiency and the estimation of quantiles is important for us, we favor the base-2 solution.
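To spell out the arithmetic behind these bucket counts (a rough check of the numbers above): at scale $s$ the base is $b = 2^{2^{-s}}$, so covering $1..100$ takes $\lceil \log_b 100 \rceil = \lceil 2^s \log_2 100 \rceil$ buckets. For $s = 4$: $b = 2^{1/16} \approx 1.044$, a relative bucket width of about $4.4\%$, and $\lceil 16 \cdot 6.64 \rceil = 107$ buckets. For OpenHistogram at resolution 180, the worst-case bucket $[1, 1.05)$ is $5\%$ wide and each decade contributes 180 buckets, so covering $1..100$ takes $2 \times 180 = 360$.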

@CharlesMasson

We also think that computation cost and space efficiency should be driving the choice of the bucket index calculation, and in this regard, a base-2 solution makes more sense to us.

It is also not very clear to us why a base-10 approach is more tailored for environments without floating point processors. Base-2 log-linear bucket indexes can also be computed without floating point operations (using the number of leading zeros and the trailing part of the binary representation of integers, as we use the exponent and the mantissa of the binary representation of floating-point values). Unless I'm mistaken, that is essentially what HDR histograms do.

In addition, while a base 10 might be more intuitive, we do not think that end users, especially non-technical people, who see the output of such histogram-like sketches through the quantiles that they produce, or the visualizations that are built from them, should have to reason about it. Most visualizations (histograms, heatmaps, etc.) generally require bucket remapping anyway (e.g., into constant-width buckets), and this can be achieved satisfactorily using an appropriate remapping method (e.g., allocating counts based on bucket overlapping), independently of the internal bucket index calculation. Even when interfacing with the bucket index mapping is necessary, we found out that a proper bucket mapping abstraction (which properly exposes the bucket index a value belongs to, the boundaries of a bucket of a given index, etc.) allows abstracting away the internal index calculation method.


HeinrichHartmann commented Jun 25, 2021

Space Efficiency

The second table above shows that 360 buckets are needed to cover the range from 1 to 100, which is more than a factor of 3!!! more compared to the base-2 solution -- @oertl

This is a good observation. I just re-ran the math and came to the same conclusion (cf. tables below). In principle the relative accuracy is inversely proportional to the bucket count in 1..100, irrespective of the exact bucket layout (more buckets => more accuracy). The phenomenon we are running into is that the relative width of the bucket at the low end of the linear scale is much larger than at the high end: the bucket (1.0 .. 1.1) is ~10% of its lower boundary wide, whereas the bucket (9.9 .. 10) is only ~1% of its lower boundary wide. Hence OpenHistogram needs more buckets to bound the worst-case error.

OpenHistogram and Floating Point

It is also not very clear to us why a base-10 approach is more tailored for environments without floating point processors. -- @CharlesMasson

There are two properties that make inserting data into OpenHist work well without floating-point math:

  • The linear part of the scale requires only integer divisions to determine the bucket, no logarithms.
  • Base-10 allows trivial conversion between SI units (n, u, m, k, M, G), which is relevant if you are inserting, e.g., nanoseconds, microseconds, seconds, etc. into your histogram.

Here is a pointer to the C implementation: https://github.com/openhistogram/libcircllhist/blob/master/src/circllhist.c#L793
This code compiles to BPF and can be used for in-kernel latency measurements (-> blog), where you don't have floats.

Base-10 and Human Readability

In addition, while a base 10 might be more intuitive, we do not think that end users, ..., should have to reason about it. -- @CharlesMasson

The most significant advantage for end-users of using base-10 (and linear sub-binning) is that you get exact upper/lower counts for human-readable thresholds. This is relevant, e.g., for latency SLOs, where you want to know which percentage of requests was faster than a human-set threshold (e.g. 100ms) over the last month. With resolution=9 OpenHistograms you can cover all practically relevant SLO thresholds (100ms, 200ms, ...) and get exact results.
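A minimal sketch of the exact-count point (the boundaries and counts arrays are assumptions about how a decoded histogram might be exposed, not an OpenHistogram API):

```java
// Illustrative sketch: when the SLO threshold coincides with a bucket boundary,
// the fraction of requests below it is an exact sum of bucket counts, with no
// interpolation error. boundaries[i] is the upper bound of bucket i.
static double fractionBelow(double[] boundaries, long[] counts, double threshold) {
    long below = 0, total = 0;
    for (int i = 0; i < counts.length; i++) {
        total += counts[i];
        if (boundaries[i] <= threshold) {
            below += counts[i];
        }
    }
    return (double) below / total;
}
```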

Accuracy Tables

@jmacd -- If you add those error columns to the tables in the description, I'll remove these tables here.

OTEP-149

| scale | base | buckets in 1..100 | Max relative error |
| --- | --- | --- | --- |
| 0 | 2.000 | 7 | 33.33% |
| 1 | 1.414 | 14 | 17.15% |
| 2 | 1.189 | 27 | 8.63% |
| 3 | 1.091 | 54 | 4.35% |
| 4 | 1.044 | 107 | 2.15% |
| 5 | 1.022 | 213 | 1.09% |
| 6 | 1.011 | 426 | 0.55% |

OpenHistogram

| resolution | bucket-boundary* | buckets in 1..100 | Max relative error |
| --- | --- | --- | --- |
| 1 | 10 | 2 | 81.82% |
| 3 | 4 | 6 | 60.00% |
| 9 | 2 | 18 | 33.33% |
| 18 | 1.5 | 36 | 20.00% |
| 90 | 1.1 | 180 | 4.76% |
| 180 | 1.05 | 360 | 2.44% |

*) for the worst-case bucket with boundary [1, x)


giltene commented Jun 27, 2021

As consideration for histogram formats is evolving here, both for new formats and existing ones, it's probably worth looking at HdrHistogram and its established data structure and wire form as one possible format to use.

HdrHistogram has been around for roughly 10 years now, as a pure (and permissive) OSS thing, and is currently supported with community-created implementations in Java, JavaScript, Python, Go, C/C++, C#/.NET, Rust, and Erlang. It has evolved from a super-fast log-linear Java implementation to a common data structure and compact (compressed) wire format that is shared across multiple language implementations. It has support for both integer and floating point histogram forms, as well as sparse or packed in-memory representations in various implementations. The wire format is specifically decoupled from the in-memory representation, allowing end points to make local tradeoffs of e.g. speed vs. memory footprint (e.g. a packed histogram and a non-packed one have the same wire format), or recording time vs. iteration/quantile calculation time (packed representations are slower to record into, but faster to iterate and compute on). The wire form itself has proven to be quite compact for real-world applications, and its BLOB is commonly shared/stored in a single (Base64 or binary) string when not being actively recorded into (see the example in e.g. a jHiccup log, where each line represents a fully recorded histogram capturing 1 second of (~1000) recorded samples).

It would be interesting to see what the footprint (in bytes in a wire form used in e.g. a database or data transmittal) looks like for the various histogram format choice alternatives, as well as what basic metrics like recording speed/cost, memory footprint, and integration time look like.


oertl commented Jun 28, 2021

I did a recording speed benchmark a month ago. The results show that the proposed base-2 solution (RecordingSpeedBenchmark.recordDynaHistDynamicOpenTelemetryExponentialBuckets) can be as fast as HdrHistogram.DoubleHistogram. If a static array is used (RecordingSpeedBenchmark.recordDynaHistStaticOpenTelemetryExponentialBuckets) it is even faster. As always, benchmark results need to be used with a lot of caution. @giltene, please help in improving this benchmark in case you think I should configure DoubleHistogram in a different way to have a more fair comparison.

Regarding space efficiency, log-linear bucketing has an inherent space overhead and needs roughly 44% more buckets to limit the maximum relative error over a certain value range compared to an ideal bucketing as we have already discussed 6 years ago (see HdrHistogram/HdrHistogram#54). Reducing this space overhead was actually one of the main motivations to develop DynaHist (https://github.com/dynatrace-oss/dynahist).


giltene commented Jun 28, 2021

I’ll take a look at the benchmark. But regarding space efficiency, I think that space spent in the wire form (serialized) and in various in-memory implementations (packed vs. flat) would lead to very different comparisons. The wire forms of HdrHistogram, for example, take advantage of the sparse nature of real-world data to dramatically reduce footprint while still maintaining the same bounded value error characteristics. Using a combination of ZigZag, run-length encoding, and compression, the actual BLOBs transmitted and stored are extremely compact, to the point where they can (literally) be tweeted. The same concept is used for the in-memory packed forms (see e.g. PackedDoubleHistogram). These techniques are obviously not limited to HdrHistogram, and I assume that similar packing and compression techniques can be used for other forms of log-linear and exponential (small base, as in 1.01) histograms, but the post-packing footprint differences between the base layouts will likely be insignificant, and the work on this in HdrHistogram is already done.


giltene commented Jun 28, 2021

Some thoughts on comparing possible encodings and formats:

  • We should probably start with some baseline data sets to use. Stuff recorded from actual systems is much preferred to synthetically generated sets. E.g. get our hands on some actual response time or latency (per operation) data and use those sets of data for comparison.

  • Comparing speed: there are multiple operations for which speed can/should be compared:

    • Recording speed. This is the fundamental metric the data gatherers will care about, as this will be their "performance tax". "How great it needs to be" can vary widely by application. E.g. endpoints that record 100s of operations per second will care less about this than ones that record 100s of thousands of operations per second.
    • Data extraction speed (e.g. extracting quantiles): this matters more for the backend than the front ends.
    • Encoding and decoding speed: measuring the cost of serializing/deserializing data to/from wire and storage formats.
  • Comparing footprint: There are multiple forms of footprint, and they should all be compared using actual data sets.

    • "wire form" (aka serialized form) footprint: The size of the data "at rest", in either binary or text-friendly (e.g. Base64 encoded json BLOBs) forms.
    • "in memory" footprint: there are potentially multiple forms for this:
      • Speed-optimized footprint (e.g. flat array HdrHistogram)
      • Space-optimized in-memory footprint (e.g. Packed HdrHistograms)


jmacd commented Jun 28, 2021

Thank you for joining this discussion, @giltene!

The HDR Histogram documentation focuses a lot on the Java API, so it's not easy to find a specification for the wire format.
Would you please help us by providing a link to a summary of the HDR Histogram wire format?

Because OTLP uses Google protobufs, OpenTelemetry will (I assume, speaking for myself) prefer to interpret histogram data in a .proto representation, and "HDR Histogram" compatibility would mean being able to convert to and from a traditional HDR Histogram representation without loss and without requiring significant computation.


giltene commented Jun 28, 2021

@jmacd a specification of the integer histogram encoding can currently be found here as part of the .NET implementation. DoubleHistograms are not specified there, but since DoubleHistograms use integer histograms internally, they differ from the integer variant only in the cookie and header parts [an implementation can be found here, and yes, a common spec would be better].

As to interaction with protobuf and/or other storage/transmission formats, I'd expect histograms to show up as a "histogram BLOB" in any such protocol, with a specified internal format, and for there to be histogram encoders/decoders in various languages for whatever format is chosen. There are many advantages to doing so (detached from the encompassing protocol), including a dramatic footprint impact, and portability across tools and such.


yzhuge commented Jun 29, 2021

Space cost

Log linear costs more than exponential at the same quantile relative error, and the larger the base, the higher the cost. As pointed out in other comments, at a relative error of a few percent, base-2 log linear costs 44% more than base-2 scaled exponential, and base-10 log linear costs 3x more than base-2 scaled exponential.

On the client side (an app being monitored), 1kB vs. 3kB might not look like much in today's systems of Gigabytes of memory (even on a cell phone). But on the server side (the backend that stores and queries the data), with billions of metric data points, 3x memory is a lot of $$$.

Cpu cost

What is most important is the value to bucket mapping, which is in the critical path of inserting a value to a histogram. Other parts, like bucket management, are largely portable across different mapping methods. For dense arrays, bucket addressing can be as simple as array indexing.

Base10 log linear

For base10 (OpenHistogram) value-to-bucket mapping, there is a significant cost of binary-to-decimal conversion, because the input is always binary, whether integer or floating point. An example can be found at int_scale_to_hist_bucket(), where a loop of integer division is used (https://github.com/openhistogram/libcircllhist/blob/master/src/circllhist.c#L807), and at double_to_hist_bucket(), where a log10() call is used (https://github.com/openhistogram/libcircllhist/blob/master/src/circllhist.c#L831).

Base2 log linear

Base2 log linear is very fast, when using exponent and mantissa from the native “double” representation. An example is https://github.com/DataDog/sketches-java/blob/master/src/main/java/com/datadoghq/sketch/ddsketch/mapping/BitwiseLinearlyInterpolatedMapping.java#L60, which is essentially just a few bit wise operations.

Base2 scaled exponential

Base2 scaled exponential can be made nearly as fast as base2 log linear, as shown in this lookup-table example from @oertl:

int i = indices[(int) (mantissa >>> (52 - precision))];    // coarse lookup keyed on the top mantissa bits
int k = i + ((mantissa >= boundaries[i]) ? 1 : 0)          // correct the coarse index by at most two steps
          + ((mantissa >= boundaries[i + 1]) ? 1 : 0);
return (exponent << precision) + k + indexOffset;          // combine exponent and sub-bucket index

Here “indices” and “boundaries” are precomputed arrays of 2^scale entries each. The core computation involves a few bitwise operations, plus 3 array lookups. The only method that is faster is base2 log linear. But that costs 44% more buckets at the same relative error. Other methods in the log or log linear family won’t even get close.

I had previously thought about this and came up with a similar method, where the arrays have 2^(scale+1) entries, so that each linear bucket represented by the arrays spans at most 2, instead of 3 log scale buckets, so that it is not necessary to look at boundaries[i + 1]. This further reduces cpu cost, at the expense of static array size. An example of the enhanced method (equivalent to computing "k" in the example above):

public long getSubBucketIndex(final long mantissa) {
    final int linearArrayIndex = (int) (mantissa >>> linearMantissaShift);
    final int logBucketIndex = logBucketIndexArray[linearArrayIndex];
    return mantissa >= logBucketEndArray[linearArrayIndex] ? logBucketIndex + 1 : logBucketIndex;
}

The lookup methods essentially use precomputed lookup tables to quickly map a value from a linear bucket into a log bucket. At the scale of interest, where relative error is a few percent, lookup array size is totally manageable (the arrays are static arrays shared across all instances in a process).

| scale | base | buckets in 1..100 | Max relative error | lookup array size | enhanced lookup array size |
| --- | --- | --- | --- | --- | --- |
| 3 | 1.091 | 54 | 4.35% | 8 | 16 |
| 4 | 1.044 | 107 | 2.15% | 16 | 32 |
| 5 | 1.022 | 213 | 1.09% | 32 | 64 |
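For the curious, here is one way such tables could be precomputed for mantissa values in [1, 2), using floating-point math at setup time only. The names mirror the snippet above, but this is my own sketch rather than code from an existing implementation, and it ignores rounding at exact bucket boundaries.

```java
// Illustrative sketch: build the lookup tables used by getSubBucketIndex above.
// Linear cell j covers values [1 + j/2^(scale+1), 1 + (j+1)/2^(scale+1)); each
// cell overlaps at most two log-scale buckets, so one comparison suffices.
final int scale = 4;
final int cells = 1 << (scale + 1);
final int linearMantissaShift = 52 - (scale + 1);
final int[] logBucketIndexArray = new int[cells];
final long[] logBucketEndArray = new long[cells];
for (int j = 0; j < cells; j++) {
    double cellStart = 1.0 + (double) j / cells;
    int logIndex = (int) Math.floor(Math.scalb(Math.log(cellStart) / Math.log(2), scale));
    logBucketIndexArray[j] = logIndex;
    // 52-bit mantissa value at which log bucket (logIndex + 1) begins.
    double nextBoundary = Math.pow(2.0, (logIndex + 1) / (double) (1 << scale));
    logBucketEndArray[j] = (long) Math.ceil((nextBoundary - 1.0) * (1L << 52));
}
```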

Log linear has long been used as a fast approximation of pure log scale buckets. But now with lookup tables, pure log scale can be computed nearly as fast as base2 log linear, yet cost 44% less memory. And it is easy to understand and implement. Everybody understands log scale, and when performance is not critical, a simple implementation can call standard log() function to avoid the lookup table logic.

Use cases

Let's consider the pros and cons of the options in various use cases:

Histogram charting

Histogram charting in linear scale

In this case, base10 log linear does have the advantage of bucket bounds lining up with decimal numbers. However, at a relative error of a few percent, hundreds of buckets are needed even for a small range of 1..100. With this many buckets, individual buckets are hard to see on a curve. Unless a user is “bucket peeking” (as in pixel peeking on an image), they will hardly notice the underlying buckets’ bounds.

This brings us to the most significant advantage of a base10 histogram: it is more human friendly when a human is actually eyeballing the individual buckets. But this use case is limited more to scenarios like debugging than to a common end-user use case (can you imagine a business person scrolling through hundreds of histogram buckets?).

Histogram charting in log scale

If the chart is rendered in log scale, a pure log scale histogram such as scaled base2 actually has an advantage over a log-linear histogram, because a pure log scale histogram’s buckets are of equal width on a log scale, while log-linear buckets are not.

SLO measurement

A typical SLO is something like: 99% of requests must be processed in less than 100ms.
There are two ways to check whether the SLI meets the SLO:

  1. Check that the 99th percentile of response time is less than 100ms.
  2. Check that the percentage of requests shorter than 100ms is higher than 99%.

For method 2, base10 log linear can give the exact answer when the target ms falls on a bucket boundary, which is expected to be common because people set the target. But this method is less intuitive in practice. Suppose the SLI says 97% when the SLO is 99%. You know 2% of requests are longer than desired, but not how much longer. And it is hard to explain the precision of the system to the user. Business people are not going to like a statement like “if the target ms is on the series of 100, 105, 110 …, 1000, 1050, 1100 …, the result percentage has zero error; otherwise, the error is variable (data dependent)”.

Mathematically, “variable (data dependent)” is the accurate statement. If the requested ms falls into a bucket holding 5% of the total population, the error is +/- 2.5% (absolute error on the percentage, assuming the query returns the mid point of the bucket). If it’s in a bucket holding 10% of the population, the error is +/- 5%. For this reason, you might as well just say there is no error guarantee. Alternatively, you can return two numbers, both with zero error, like “the percentage of requests faster than 103ms is between 91.62% and 97.34%”. But the user friendliness of such a system is not going to be good.

Method 1, the percentile method, is more user friendly. For example, the SLI says the 99th percentile is 127ms, when the SLO is 100ms. You know you are 27ms away. In this use case, it is important to give the user the upper bound of relative error. For example, if the histogram has an error bound of 2%, the user knows that 127ms is meaningful against the 100ms target.
For method 1, base10 has no advantage over base2. Users are used to measurements that do not fall on “whole numbers”.

Closing thoughts

Survey questions are full of pitfalls. If you ask users “would you like your histogram bounds to be on decimal numbers?”, they are going to say yes. But if you add “for 3x more money”, many will say otherwise. And even when money is no object, decimal comes with the log-linear scale, which has its own pros and cons.

Back to the beginning:
In open-telemetry/opentelemetry-proto#226, I actually started by adding both log and log linear format, with log for space optimization and log linear for cpu optimization. But now with lookup tables for scaled base2 histograms (a pure log histogram), we no longer have to make a choice between cpu and space. We can have both.

Another lesson I learned from that PR is that log linear is harder to explain. You would think “linear subbuckets in log buckets” is easy to understand, but I had to explain it to people again and again, while everybody instantly understood log scale histograms. Log scale (exponential) histograms are mathematically well understood, with many implementations. As long as an implementation can set the base, it can produce scaled base2 histograms (just set the base to one from the scaled series).


giltene commented Jun 29, 2021

On space cost: I think one should compare the "packed form" space costs rather than the "faster flat array" forms. By "packed form", I am referring to in-memory or wire formats that take advantage of the sparse nature of the data and of the tendency to not need the full fidelity of e.g. 64-bit counts for each element.
A PackedDoubleHistogram will be significantly more compact in space than the unpacked forms of both base2 log linear and exponential. A similar packed approach can probably be applied to exponential as well, and the wire form for exponential can obviously be compressed in a similar way. In both wire formats and in in-memory packed forms, I would expect to see little difference between packed base2 log linear and packed exponential with a base that results in a similar value error level. In reality, the space taken up by the packed representations will be dominated by the actual unique values recorded (unique to within the allowed value error levels), and that number will be roughly the same for the same error level.

On base2 vs. base10: I see no real benefit for base 10 when a value error level is expected. The values captured by either will be the same to within the value error level (of e.g. 1%), and for every case where someone is "happy" with a recorded 10.0 showing up as exactly 10.0, there will be someone who is "upset" with some other number not exhibiting that quality. Since real-world recorded numbers don't tend to align to precise powers of 10, the nicety of base10 is not likely to be relevant IMO.


giltene commented Jun 29, 2021

BTW, for details about the packed arrays I've designed to hold in-memory representations of log linear histograms, see e.g. the external API here and detailed comments about the internal structure here.

While these packed arrays (usable for both concurrent and single-threaded operations) were designed in the context of HdrHistogram, I believe they can be generically used for other things that need an "...array context [that] is optimized for tracking sparsely set (as in mostly zeros) values that tend to not make use of the full 64 bit value range even when they are non-zero." As such, an exponential histogram implementation that normally maps its internal math to a flat physical array can probably be enhanced to use the same packed array implementation to provide a virtual array of the same size, achieving nearly identical packing levels and footprints.


yzhuge commented Jun 29, 2021

One more thing to compare is the scale interval. For scaled base 2, it is regular, with base[scale + 1] = sqrt(base[scale]), where "scale" ranges over the integers.
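Spelled out with the OTEP 149 parameterization: $b_s = 2^{2^{-s}}$, hence $b_{s+1} = 2^{2^{-(s+1)}} = \left(2^{2^{-s}}\right)^{1/2} = \sqrt{b_s}$.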

For base10 log linear, the "precision" series is uneven, with the step from one precision to the next ranging from 2x to 3x to 5x. In general the series alternates between 2x and 5x to keep the 10x decimal ratio, 9, 18, 90, 180, 900, 1800, ..., in order for any two precisions to be mergeable. This gives 2 uneven stops between each 10x of precision. The resulting error series is roughly:
50%, 25%, 5%, 2.5%, 0.5%, 0.25%, 0.05%, ...

@CharlesMasson

On space cost: I think one should compare the "packed form" space costs rather than the "faster flat array" forms.

@giltene Shouldn't the bucket mapping on the one hand, and the internal/wire representation and tracking of bucket counts on the other hand be two distinct problems that we can independently optimize for? Correct me if I'm wrong, but as is the case with DDSketch or DynaHist, it seems HDR histograms also involve mapping the input value to a bucket index, then updating the count at the given index. I agree that there is often more to gain by optimizing the bucket count representation rather than the index mapping itself (as we experienced in DDSketch), but in any case, it seems to me that any implementation would benefit from a non-negligible 44% space efficiency uplift thanks to a more optimal index mapping that would minimize the number of distinct indexes to keep track of.

@yzhuge For exhaustiveness, even though we may not want to favor them if simplicity matters, we could also mention polynomially interpolating mappings (log-linear is a special case). They offer better space efficiency than log-linear at a slightly higher CPU cost. Their CPU cost may be higher than the logarithmic method with precomputed sub-indexes though.

Base2 log linear is very fast, when using exponent and mantissa from the native “double” representation.

I believe that any method that involves bitwise operations using the exponent and the mantissa from the binary representation of a floating-point value should also work on integer values without conversion by using the number of leading zeros (many architectures natively support computing it) and the first non-zero bits of their binary representations.

@gartaylor

I think there are two separate questions here bundled into the choice as presented.

  1. Do we want log or log-linear histograms?
  2. Do we want these aligned to base 10 or base 2?

Independent of performance considerations I believe in general customers would

  1. Want Log histograms as they are simpler to reason about and provide uniform measurement uncertainty
  2. Have a slight preference for base 10 as they would get more precision at boundaries they would naturally use

In terms of performance, the conversion from user space to histogram space happens on the end client. This may matter to some users, but only really if they are recording around a million measurements per second (at least on mainstream architectures). As discussed above, at least for base 2, there are optimization techniques that can be used to make log about as performant as log linear in this regard.

On the storage and query side I think the number of bins that have to be stored and queried will matter to end users, as they will in the end have to pay for it. It is worth bearing in mind that current time series storage formats use a couple of bits per datapoint for counters and gauges; storing 40% more bins per histogram datapoint is going to be significant.

Between the two choices presented I believe customers would prefer log histograms with base-2.


giltene commented Jun 29, 2021

@CharlesMasson about bucket mapping being a separate consideration from internal/wire representation: “yes and no”:

  • Yes because as long as the “bucket mapping” translates a value to an index in a physical array, we can seek to compare the number of buckets in each scheme and compare their needs and behavior,
  • No because a packed array representation changes the fundamental considerations of what matters for space consumption. With a packed array representation, the index that the mapping creates is a virtual index that then gets translated to a physical index in an array optimized for both compact and fast access. For data that exhibits the qualities that packed arrays were designed for (sparsely populated virtual indexes that tend to not use the full 64 bit fidelity even for non-zero values), the physical footprint requirements are likely to be very similar for the various schemes (when compared for similar value error levels on the same data sets) even when their virtual footprint requirements differ. The virtual size difference between the mappings is mostly in the “amount of wasted sparse space” in the various schemes, and since wasted sparse space is compressed away by packed representations, its impact on physical footprint greatly diminishes.
    (note that this is an “in theory” statement. The way to tell for sure is to compare packed array based implementations on identical “real world” data sets. But since right now only HdrHistogram has packed array implementations, we’ll have to wait for an exponential implementation that is expanded to use PackedArray to actually measure).

Packed representations do come at a cost. But for context (in some trivial one-shot u-bench testing), the current (HdrHistogram) PackedDoubleHistogram seems to clock in at ~18M recordings/sec on a single thread on a 2.4GHz x86 cpu, on the same data test that DoubleHistogram clocks in at 90M/sec for. So a 5x hit in CPU, but still super-fast. In tests that include a single writer Recorder pattern (which adds overhead for coordinating double buffering, but is the most likely pattern to be actually used by performant recorders) this packed representation overhead ratio drops to ~2:1. So “comes with some very real overhead, but still darn fast” is how I’d describe it. And unless you actually record ~1M value per second per thread, they will probably all seem the same. With a memory footprint savings of 1-3 orders of magnitude (data set dependent, precision dependent), packed in memory representations are likely to be the ones more commonly used over time IMO, and this is probably true regardless of the index mapping scheme chosen.

Assuming this (packed representations dominating over time) is true, and assuming my assertion that in packed representations of the different index mapping schemes at similar value error levels the physical footprint becomes very similar is also true, memory footprint is not going to be the main consideration in choosing the format, because the (packed, physical) memory footprint will end up being very similar for the various alternatives.


jmacd commented Jul 7, 2021

@oertl and @pirgeo I've been looking at the prototype mentioned here: open-telemetry/oteps#149 (comment)

I would like your assistance with a bit more discovery into the direct use of the IEEE floating point representation to compute a histogram bucket. Suppose I want to create a histogram aggregator that uses fixed precision (a size parameter, say 64 buckets) and adapts the value of scale dynamically to cover the input range, keeping bucket index values that can be represented in a machine word. My goal is to auto-configure low-resolution histograms based on their allowed space, and let the scale be chosen automatically.

Suppose I give you a single 64-bit floating point value, the first to be inserted into the histogram. I would like to see (in pseudo-code, at least) the algorithm for determining:

  1. What is the maximum scale possible such that the index + precision does not overflow a 64-bit signed integer?
  2. What is the bucket index of the first value at the chosen scale?

The goal is to show that this can be done using the IEEE representation plus primitives to compute the number of leading and/or trailing zeros in a value.

The next step (again, asking for pseudo-code) would be to handle new values:

  1. If a new value falls outside the range covered by [index, index+precision) at the current scale, select the best possible scale to cover the new range and collapse buckets appropriately
  2. Locate the correct bucket index for the new value and increment the bucket count.

I feel that the above pseudo-code will demonstrate that the proposed base-2 exponential scheme is flexible enough to support both high resolution and low resolution and make use of the IEEE representation to its advantage. Would you help flesh this out?
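As a starting point (my own rough sketch, using a plain logarithm rather than the IEEE-bit and leading/trailing-zero primitives asked for above), the index computation at a given scale and the bucket-collapsing step could look like this:

```java
// Sketch only: with boundaries at (2^(2^-scale))^index, the bucket index is
// floor(2^scale * log2(value)). Rounding at exact bucket boundaries is ignored.
static long indexAtScale(double value, int scale) {
    return (long) Math.floor(Math.scalb(Math.log(value) / Math.log(2), scale));
}

// Perfect subsetting: lowering the scale by d merges 2^d adjacent buckets, so an
// existing index maps onto the coarser grid with an arithmetic shift (which
// rounds toward negative infinity, as required for negative indexes).
static long downscaleIndex(long index, int d) {
    return index >> d;
}
```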


jmacd commented Jul 7, 2021

@open-telemetry/specs-approvers

To summarize the discussion above, we have an OTEP and strong support for exponential histograms with base-2 bucketing. There is ongoing consideration into two log-linear alternatives, one base-10 (OpenHistogram) and one base-2 (HDRHistogram). We are also trying to end a debate about sparse vs dense encodings.

This was discussed in the Tuesday July 6 Metrics SIG meeting and we concluded with a call to see prototypes. The DynaHist prototype (mentioned in my preceding comment, above) will be presented next Tuesday, and I've asked for pseudo-code to help demonstrate the benefit of choosing base-2 for low-resolution histograms.

Our goal is to choose one "family" of histogram encodings, and community support is what matters most. We have several community-supported histograms we could choose from, but the community that matters most in this decision is Prometheus. We are not ignoring the work done on sparse, exponential histograms in Prometheus here, but the communities have been mostly separate.

I would like to invite @beorn7, who has been working on a branch that is compatible with the base-2 exponential histogram in OTEP 149. @beorn7 would you please post a copy of the current .proto that you would like to see?


beorn7 commented Jul 8, 2021

Most people involved here probably already know, but just to be sure: The approach of adding sparse dynamic high-resolution histograms to Prometheus is described in a design doc, which also explains why Prometheus (with its model of stateless scrapes) cannot just use an existing histogram implementation as a drop in. (But general principles of implementing histograms still apply, of course.)

The current implementation progress (which is still early PoC state) can be followed in the sparsehistogram branch of the relevant repositories:

Now the latter is a bit delicate: Prometheus 2.x dropped the traditional protobuf scrape format, even though I had vague plans of better histograms for Prometheus using protobuf already back then. (If you want to have a laugh listening to a grumpy old man complaining about the course of history, watch my FOSDEM talk about the subject.) For the PoC, we decided to resurrect the old protobuf format and extend it for the new sparse high-res histograms. That shall not imply that this old protobuf format is coming back. More likely, a future or extended version of OpenMetrics will accommodate the new histograms, maybe just in its protobuf version, or we will find a good way of representing them in the text-oriented format, too.

Long story short: The semantics of the current PoC exposition format is the intended one, but the details are definitely not the final form. Having said that, these are the relevant new lines in metrics.proto:

As explained in the design doc, this attempts to represent a (moderately) sparse histogram in a form that is both compact and easy to en-/decode. It is not meant to be an explicit representation that would be easy to parse for humans. So far, the experience with the prototype is quite good. I understand, however, that the trade-offs for OTel might be quite different. What I have seen so far in the OTEP looks like it will be easy to convert back and forth. Also note that it's our intention to keep Prometheus open for other bucket layouts (marked by currently unused values of the sb_schema field and possibly additional fields in the protobuf). However, we really want a default "one-size-fits-all" mode, following the same line of argument as here and in the OTEP: It should always be mergeable to the common lowest resolution, and most users shouldn't need to configure anything.


jmacd commented Jul 9, 2021

@yzhuge I am interested in your thoughts on #1776 (comment).

It looks like the protocol buffer messages written by @beorn7 are compatible with the ones you wrote, with a new detail about the width of the zero bucket. Having worked on code that converts from explicit-boundary to exponential-boundary histograms, I support the notion that the zero bucket has finite width, being the bucket between the smallest positive and negative boundaries.

A little bit of text will have to be written about how to merge histograms with different zero bucket widths. One reason for my earlier question (#1776 (comment)) is that I want to understand what the limits (on scale and bucket index) imply for the zero bucket.


yzhuge commented Jul 9, 2021

Scaled base2 exponential histograms can handle any small number (or large number) encoded as a double. When the number of buckets exceeds the configured max, it is preferable to downscale to preserve constant relative error across the full range, rather than to "cut off" at the high end (overflow) or at the low end (underflow). A zero bucket with a width like [0, x] is effectively an underflow bucket. I prefer not to allow such buckets in the OTel standard. Overflow/underflow buckets open a can of worms. They are often put in as a means to limit the number of buckets in producers that do not support downscaling. In scaled producers, scaling is preferred over overflow buckets.

Another reason to favor scaling over cut off is that it is hard to decide the overflow and underflow threshold. In one app, .001 may be a good underflow threshold, yet in another, 10E-9 would be. So the threshold runs counter to our goal of no/minimal config. In contrast, the scaling approach is totally automagic.

To convert histograms with overflow buckets into pure log scale (no overflow buckets), we will have to twist bucket semantics. A bucket like (0, x] or [x, +Inf] can be treated as [x, x] (all values exactly at x). This changes the semantics of the bucket. But quantile calculation is goofy (with unbounded error) in the overflow/underflow range anyway, whether it is computed from the original histogram or the converted one. An example of such a calculation can be found at https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile

@beorn7 what do you think of not using an underflow bucket (of course, we will always have a bucket for values exactly at 0)?


giltene commented Jul 9, 2021

@jmacd @yzhuge I think that kicking around the question of whether or not the notion of a zero bucket width is actually needed may be useful before settling on things.

Specifically, in my previous experience, I found that auto-ranging eliminates the need for specifying either the covered range or a zero-bucket width, allowing for truly zero-config (other than precision, which could default to e.g. +/-1% or better) histograms. The notion basically auto-ranges (as values are recorded) e.g. a 1 : 2^55 dynamic range within the wide 2^-1023 to 2^1023 range possible through a double. It is also very performant, as the number of auto-ranging transitions is capped (to no more than e.g. 55 in a 1:2^55 range) and each transition is fairly cheap (it only needs to actually touch a precision-capped set of buckets). In effect, the width of the zero bucket auto-adjusts according to the lowest non-zero value actually experienced.

Auto-ranging can probably be logically separated from the choice of log-linear vs. exponential, and implemented for either. In my log-linear (powers of 2 for both log and linear) implementation in HdrHistogram, auto-ranging is achieved by logically shifting the underlying integer array value representation. Logical shifting is effectively free for all values except those recorded in the lowest logarithmic half-bucket in a left shift operation. The power-of-two log-linear representation made the scaling (logical shifting) operation needed for auto-ranging both easy and fast, but there is probably a logical equivalent that can be used in the other representations.

For an example of how auto-ranging logic works, you can look at the autoAdjustRangeForValueSlowPath logic in HdrHistogram, which uses shiftCoveredRangeToTheRight and shiftCoveredRangeToTheLeft to adjust the covered range. You can drill down from there for details, but the "interesting" specific logic for handling for the lowest logarithmic half bucket in a left shift can be found here.


beorn7 commented Jul 9, 2021

whether non-zero values that are around zero are "noise" or "signal" is in the eye of the beholder.

Yes, that's why you can configure it if need be.


beorn7 commented Jul 9, 2021

I am sorry to say that I don't have the capacity for a detailed discussion here about what Prometheus should or shouldn't do now. I also don't think that's the purpose of this issue.

We (Prometheus folks) are merely implementing a PoC right now, based on plenty of discussions we had earlier. At some point, we just had to draw a line and start trying things out. Based on the experiences from the PoC and on whatever people might bring up in the meantime, things can be revisited extra carefully before anything is released into production with stability guarantees. But right now, we would just like to implement the agreed design and see how it works.

Personally, I think a standard should be built upon something that has already proven to work well in real-world production scenarios (rather than creating a standard on the drawing board, for which implementations are created after the fact). From the Prometheus side, we aren't even there yet (when it comes to sparse high-res histograms), so I would anyway recommend to take anything we do right now with a grain of salt. I'm happy to provide pointers to what we are doing, in case it helps OTel to sketch out their path, but I cannot reboot the design process at this point.


yzhuge commented Jul 10, 2021

When we are talking about numbers in the range of E-308 (double min) or E-39 (Prometheus noise reduction threshold), we are in hair-splitting land. I propose a no-cost fuzzy solution:
OTel will still have a bucket for zero, but it states that producers are allowed to round near-zero numbers to zero and put them into this bucket. For simplicity, the OTel protocol will not record the rounding threshold. In other words, whether rounding happens in the clock measuring time or in a histogram producer that does noise reduction, OTel does not differentiate between them. OTel does not record or transport info such as clock accuracy or noise reduction thresholds. If such info is important to an app, it will have to be handled outside of OTel. Of course, if there is enough demand, such info could be added to OTel in the future.


jmacd commented Jul 13, 2021

I am trying to find a middle ground on this issue about the zero bucket, and I think there is one.

@beorn7 wrote

For any special cases, you can configure the width to something else.

@yzhuge wrote

OTel will still have a bucket for zero. But states that producers are allowed to round near zero numbers to zero and throw them into this bucket. For simplicity, OTel protocol will not record the rounding threshold.

@giltene wrote

Specifically, in my previous experience, I found that auto-ranging eliminates the need for specifying either the covered range or a zero-bucket width

For me the middle ground is:

  1. Producers can decide what is their own threshold for zero compared with very-small values
  2. Producers do not need to encode their threshold in the protocol.

In other words, we can specify that the zero bucket should be used for values that the producer considers effectively zero subject to their own threshold.

@beorn7 it seems we all agree that producers should decide their zero threshold. You didn't explicitly mention why the protocol should encode that value for the consumer to see. The consumer will see the smallest value that is observed greater than the threshold, which conveys information about the tolerance anyway. Is there a great loss of information by not conveying the zero threshold?


jmacd commented Jul 13, 2021

Thanks to @pirgeo for presenting the DynaHist algorithm mentioned above (#1776 (comment)) in today's Metrics SIG meeting.

He presented a version of the same code written in Go, here.


jmacd commented Jul 13, 2021

@giltene I am interested in finding a way that OpenTelemetry SDKs can take advantage of HDRHistogram implementations and generate a format that is compatible with the exponential sparse histogram encoding by @beorn7 shown here.

I read over the HDR encoding description mentioned here and the Go implementation here. It seems that common HDRHistogram implementations can be configured to compute pure exponential histograms by selecting the appropriate precision for the maximum range. It takes a few reads through the New() method to understand the relationship between minimum and maximum representable values and precision.

The Java HDR implementation is more sophisticated and also supports auto-ranging. If we aim to support auto-ranging (which appears to be an extremely popular idea), then it appears reasonable to drop the linear component of the bucket index, since the auto-ranging is done using the exponential scale. To me, this suggests that HDRHistogram implementations could be adapted to output the proposed format with little effort.

@giltene let us know what you feel would be lost in producing the following bucket encoding (which is @beorn7's encoding without the zero bucket width):

message Buckets {
  sint32 scale = 1; // According to OTEP 149, boundaries at (2^(2^-scale))^(index)
  uint64 zero_count = 2; // Count in zero bucket.
  SparseBuckets negative = 3; // Negative sparse buckets.
  SparseBuckets positive = 4; // Positive sparse buckets.
}

message SparseBuckets {
  message Span {
    optional sint32 offset = 1; // Gap to previous span, or starting point for 1st span (which can be negative).
    optional uint32 length = 2; // Length of consecutive buckets.
  }
  repeated Span span    = 1;
  repeated sint64 delta = 2; // Count delta of each bucket compared to previous one (or to zero for 1st bucket).
} 
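To illustrate how I read this encoding (my interpretation of the comments above, not a normative example): a histogram with counts 3, 5, 6 in buckets 4, 5, 6 and count 2 in bucket 10 would be encoded as spans [{offset: 4, length: 3}, {offset: 3, length: 1}] and deltas [3, 2, 1, -4].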


giltene commented Jul 13, 2021

@jmacd
In two parts:

On being able to convert HdrHistogram to the proposed sparse high res (exponential) histogram:

Data from HdrHistogram can probably easily be converted to other histogram formats, and as long as the error-bound/resolution/precision parameters are in similar ranges, the additional error imposed in the conversion would be the only concern. So, e.g., an HdrHistogram configured with a "precision" (in HdrHistogram terms) of two decimal points (i.e. any two values that are more than 1% apart are distinguishable from each other) can probably be converted to a proposed sparse high resolution histogram with a "resolution" (in sparse hi-res histogram terms) of 100, with an additional loss of precision of no more than 1% (since both representations allow up to a 1% error in value, they can add up to 2%). [Since the proposed scheme would only support resolutions that are whole powers of two, you'd need a resolution of 128 to make that human-comforting statement using round % numbers correct.]

[Note: I've updated the math below with more explanation and specific examples]
BTW, for comparison, for a real-world 1% precision expectation (i.e. any two values that are >1% apart remain distinct, which is the same as a maximum relative error of 0.5%), the two encodings probably have very similar footprints:

  • In log-linear powers-of-2, A two decimal point precision is implemented using 128 linear sub buckets per logarithmic bucket (with the 128 sub buckets only covering the upper half of each bucket because of overlap). So e.g. the value range between 2.0 and 4.0 will be covered by 128 linear buckets that are each 0.015625 wide. This provides a maximum relative error of 0.390625% (half the bucket width), which is (being below 0.5%) is enough to guarantee "any two values that are >=-1% apart are distinct".
  • [to my understanding, which may be wrong] To achieve the same "resolution" (such that any two values that are more than 1% apart will be distinguishable from each other), the proposed sparse-bucket exponential encoding would (I think) need to use a scale of 7 (which would place index boundaries at 2^(1/128) ratios from each other, and achieve that 1% precision). So e.g. the range between 2.0 and 4.0 would be covered by 128 exponential buckets, each being 2^(1/128) times as wide as the one that precedes it, with the widest bucket (the one that ends at 4.0) being [4.0 - 4.0*2^(-1/128)] wide, which is ~0.0216023. This provides a maximum relative error of ~0.2715%, which (being below 0.5%) is enough to guarantee "any two values that are >1% apart are distinct". (The arithmetic for both bullets is sketched just below.)
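A small sketch (in Go; it only reproduces the arithmetic worked out in the two bullets above, taking the worst-case relative error as half the bucket width relative to the bucket's lower bound):

package main

import (
    "fmt"
    "math"
)

func main() {
    // Log-linear: 128 equal-width sub-buckets across [2, 4); the worst case is
    // the first bucket, whose lower bound (2.0) is the smallest.
    width := 2.0 / 128 // 0.015625
    fmt.Printf("log-linear:  width %.6f, max relative error %.6f%%\n",
        width, 100*(width/2)/2.0) // ~0.390625%

    // Exponential at scale 7: each bucket is 2^(1/128) times as wide as the
    // previous one; the relative error is the same in every bucket.
    ratio := math.Exp2(1.0 / 128)
    widest := 4.0 - 4.0/ratio // ~0.0216
    fmt.Printf("exponential: widest bucket %.7f, max relative error %.6f%%\n",
        widest, 100*(ratio-1)/2) // ~0.2715%
}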

On auto-ranging implications:

I don't follow your conclusion on auto-ranging and dropping of the linear component (in a log-linear representation). There is no need to drop the linear component or lose resolution/precision when auto-ranging.

In the log-linear representation, auto-ranging (in my implementation at least) involves a logical "shifting" of the meaning of the underlying values represented in all buckets and linear sub-buckets; e.g. a shift left means that all values now have a meaning of 2x the value they were previously recorded under. This shift is then coupled with a matching normalization change (of the factor between the underlying value and the represented value) to maintain the values while shifting where they are logically represented in the array of buckets.

As long as auto-sizing (a change in the array size) is not involved, the "shift" itself does not have to move any memory around, as it just changes the logical base offset in a circular array representation, so all we do is change where "array index 0" actually starts (and the array wraps around physical 0). But when auto-sizing is needed (which is likely to be the case if we seek zero config), it will involve the addition of a logarithmic bucket (with all of its linear sub-buckets) to the array, and some of the values in the array will likely need to be moved around, since the "inserted" set of linear sub-buckets could fall in the middle of the array. So it is actually auto-sizing, and not auto-ranging, that ends up copying memory around. The good news is that the number of auto-ranging or auto-sizing steps is capped and small (no more than one per power of 2 in the eventual dynamic range of the numbers recorded into the histograms, and actually less (by e.g. 7) due to the linear components).

If what you meant is that there is no strict need for linear sub-bucketing to perform auto-scaling (so the "drop" of the linear component refers to not needing it in an exponential representation, and not to the linear part of log-linear), you are right. A log-linear underlying value storage design that uses powers of 2 for both exponential bucket boundaries and for the linear sub-bucket boundaries seemed natural for this shift-based auto-ranging and auto-sizing logic. [I just use a linear values histogram for the underlying values, since a normalization factor is applied anyway to support auto-ranging.] But as mentioned before, I expect that auto-ranging and auto-sizing can be done in logically similar ways in other representations; I just haven't thought through the mechanical efficiencies of doing so. [I assume logic that involves some amount of contents copying and changes to normalization factors can achieve similar effects in an exponential high resolution histogram as well.]

@beorn7

beorn7 commented Jul 14, 2021

it seems we all agree that producers should decide their zero threshold. You didn't explicitly mention why the protocol should encode that value for the consumer to see. The consumer will see the smallest value that is observed greater than the threshold, which conveys information about the tolerance anyway. Is there a great loss of information by not conveying the zero threshold?

It's certainly true that the zero bucket will never contain any observations larger than the lower bound of the bucket closest to zero (and mirrored correspondingly for negative observations). However, with a sparse encoding, there might be many, many empty buckets in between the zero bucket and the populated bucket closest to zero (which is in fact the most common case in our PoC studies so far, if the zero bucket is used at all). So while your assumption is correct, it also overestimates the size of the zero bucket. That's probably fine in cases like DDSketch where (IIRC) expanding the zero bucket is used as a strategy to reduce the bucket count, but it hits the other end of the spectrum, with the most extreme case being that the zero bucket only takes observations of precisely zero (as also suggested here).

@giltene

giltene commented Jul 14, 2021

Continuing on my [slightly edited/updated comment above]

BTW, for comparison for real world 1% precision expectation (i.e. any two values that are >1% apart remain distinct, which is the same as a maximum relative error of 0.5%), the two encodings probably have very similar footprints:

One can debate the precision levels needed, and whether a +/-0.5% error rate is needed. I find that to be the most commonly stated level. It "feels high resolution" and meets an intuitive, seems-to-make-sense-to-humans level of "the value you get out will be no more than 1% away from the value you put in". Somehow people tend to accept that much better than "no more than 1.5625% away from each other"... That seems to be "a strange number" and "not as good as 1%" (their not-mathematically-sound, intuitive interpretation of "what decimal position will the error appear in" tends to jump to the next decimal order of magnitude, which would be 10%). Said differently, when you give people a choice of precision levels, they tend to choose numbers like 10%, 1%, 0.1%. [Hence the original (and debatable) choice to state precision in terms of decimal-point accuracy in the HdrHistogram API.] And when you ask what they mean by that, they generally mean that they are fine dropping (or rounding) the decimal positions that are beyond the level they stated.

Beyond just powers of ten, there are other "seem natural to humans" levels for "no more than X% apart" that tend to placate the "but the number I put in is not the same as what is reported" complaint; these include the "round numbers" of 2 and 5. E.g. people will often think in terms of 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, ... when looking for naturally stated exponentially increasing levels at a finer-than-10x stepping rate. These numbers seem to be easier for people to intuitively reason about and deal with (where 16, 64, or 27.1828 seem like "strange boundaries" that are "neither here nor there" to most people who are not bit-thinkers or mathematicians).

We can look at the number of buckets needed to provide a given "no more than X% apart" for various human-natural levels, and at the error rates provided at each:

(I'm hoping/assuming my math on these is actually correct...)

"no more than X% apart" log-linear buckets log-linear error exponential buckets exponential error
10% (i.e. +/- 5%) 16 per 2x range +/- 3.125% 8 per 2x range +/- 4.344 %
5% (i.e. +/- 2.5%) 32 per 2x range +/- 1.5625% 16 per 2x range +/- 2.172 %
2% (i.e. +/- 1%) 64 per 2x range +/- 0.78125% 64 per 2x range +/- 0.543%
1% (i.e. +/- 0.5%) 128 per 2x range +/- 0.390625% 128 per 2x range +/- 0.2715%
0.5% (i.e. +/- 0.25%) 256 per 2x range +/- 0.1953125% 256 per 2x range +/- 0.13575%
0.2% (i.e. +/- 0.1%) 512 per 2x range +/- 0.09765625% 512 per 2x range +/- 0.067875%
0.1% (i.e. +/- 0.05%) 1024 per 2x range +/- 0.048828125% 1024 per 2x range +/- 0.0339375%

As this table presumably shows, while exponential does provide a tighter maximum relative error at any bucket count (because its maximum relative error is uniformly distributed across the buckets), the number of buckets needed to meet a typical human expectation boundary is only better at the lower resolutions (10%, 5%) and ends up being the same for no more than {2%, 1%, 0.5%, 0.2%, and 0.1%} apart. [Exponential will likely win again at 0.01%, but there are few applications that go there.]

Note that these bucket numbers are all in a "virtual bucket count" sense. Sparse and/or packed representations (like the one being proposed above for OTEP 149, HdrHistogram's existing wire form and in-memory packed array representations, and lots of other wire and in-memory representations for various schemes) will mostly only "pay" for the physical non-zero-count buckets. But when the virtual bucket counts are the same, the number of non-zero buckets that result from recordings is likely to be "similar" (there will likely be differences in either direction because the bucket boundaries are not the same).
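For illustration, a small sketch (in Go; a reconstruction of the reasoning above, not anyone's shipping code) that reproduces the bucket-count columns of the table, assuming bucket counts are rounded up to powers of two and using the half-bucket-width error bounds described earlier:

package main

import (
    "fmt"
    "math"
)

// nextPow2 rounds up to the next power of two, since both schemes under
// discussion restrict resolution to power-of-two bucket counts.
func nextPow2(n float64) int {
    return 1 << int(math.Ceil(math.Log2(n)))
}

func main() {
    for _, apart := range []float64{0.10, 0.05, 0.02, 0.01, 0.005, 0.002, 0.001} {
        // Log-linear: worst-case relative half-width over a 2x range is 1/(2n),
        // so n >= 1/X keeps values more than X apart distinct.
        logLinear := nextPow2(1 / apart)
        // Exponential: relative half-width is (2^(1/n)-1)/2, so n >= 1/log2(1+X).
        exponential := nextPow2(1 / math.Log2(1+apart))
        fmt.Printf("%5.1f%% apart: log-linear %4d, exponential %4d buckets per 2x range\n",
            apart*100, logLinear, exponential)
    }
}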

On auto-ranging and zero-bucket width

In any case, regardless of exponential vs. log-linear (which both have merits, and likely end up with similar footprints when exponential is restricted to whole-powers-of-two resolutions), the notions of auto-ranging (or not) and of a protocol-stated zero-bucket width (or not) are probably orthogonal (they can both be applied to log-linear or to exponential). I found a need for a "lowestDiscernibleValue" in integer (values) histograms, which may seem logically equivalent to a zero-bucket width (it's not; it's actually a "what does an integer unit mean" setting). But I found that in floating point (values) histograms the equivalent need disappeared when auto-ranging was introduced. In my implementations it disappears as long as you are willing to live with the limitation that a single histogram will not hold values across a dynamic range larger than e.g. 1:2^55 (for 1% precision expectations), but even that limitation could probably be removed if the underlying representation were not an integer-values histogram coupled with an integer-to-floating-point normalization factor. [The 1:2^55 range limitation at 1% precision is derived from the signed 64-bit range limitation on the underlying integer values.]

A way to reason about how auto-ranging eliminates the need for a zero-bucket width is to consider "pure 0" as its own special bucket (with no width). For all non-zero values within a covered dynamic range (with a value ratio within the range of no more than e.g. 2^55), the normal bucket scheme (log-linear or exponential) is used with what is presumably a flat array of value counts. But 0 is special, and has its own count (which can actually sit in the logical index 0 location in the array, as it is not used for anything else). Auto-ranging is then a matter of shifting the logical position where the "0 index" of the array falls for all non-zero values. Auto-sizing (expanding the array range to fit a larger-than-previously-covered range) can involve moving values around in the array (in straight linear memcpy form). But since the number of times such moves might occur is limited (to the number of auto-sizing steps it takes to make the array large enough to cover the range), these auto-sizings are not a real concern in anything other than strictly-consistent, very-low-latency environments (and those can avoid auto-sizing by ranging correctly from the start).

@giltene

giltene commented Jul 15, 2021

On Jul 15, 2021, at 11:11 AM, Joshua MacDonald notifications@github.com wrote:

To help think about precision near zero, I wrote a small program to print the minimum value that can be resolved at scales from 8 to -4:
Scale 8 base 1.002711 min_value -2.775558e-17 [IEEE bits 0xbc80000000000000]
Scale 7 base 1.005430 min_value -1.387779e-17 [IEEE bits 0xbc70000000000000]
Scale 6 base 1.010889 min_value -6.938894e-18 [IEEE bits 0xbc60000000000000]
Scale 5 base 1.021897 min_value -3.469447e-18 [IEEE bits 0xbc50000000000000]
Scale 4 base 1.044274 min_value -1.734723e-18 [IEEE bits 0xbc40000000000000]
Scale 3 base 1.090508 min_value -8.673617e-19 [IEEE bits 0xbc30000000000000]
Scale 2 base 1.189207 min_value -4.336809e-19 [IEEE bits 0xbc20000000000000]
Scale 1 base 1.414214 min_value -2.168404e-19 [IEEE bits 0xbc10000000000000]
Scale 0 base 2.000000 min_value -1.084202e-19 [IEEE bits 0xbc00000000000000]
Scale -1 base 4 min_value -5.421011e-20 [IEEE bits 0xbbf0000000000000]
Scale -2 base 16 min_value -2.710505e-20 [IEEE bits 0xbbe0000000000000]
Scale -3 base 256 min_value -1.355253e-20 [IEEE bits 0xbbd0000000000000]
Scale -4 base 65536 min_value -6.776264e-21 [IEEE bits 0xbbc0000000000000]

While these minimum values do seem "small enough to not matter" when we think in the terms and units we commonly see, I don't see a necessity to impose a minimum larger than 2^-1023... Perhaps more importantly, avoiding this limitation probably comes hand-in-hand with avoiding wasted space when the entire value range being recorded is far away from the "base" we might statically choose.

This is a key point about auto-ranging, which I think we should consider before nailing down the meaning of scale: in the proposed format, scale (which controls the ratio between consecutive buckets) and base (an arbitrary point in the floating point range that is assumed to fall within the covered dynamic range) are implicitly related. But there is no need for them to be, and de-coupling them has real benefit.

The scale will determine the number of buckets needed to cover a given range of numbers. Various practical considerations can constrain the maximum size of a range (the dynamic range) in terms of the ratio between the smallest and largest numbers represented at the same time in the same data set. But where that range (constrained or not) falls within the [2^-1023 to 2^1023] range encode-able using double precision floating point numbers does not need to be limited in any way.

The reason to avoid an implied base (which would dictate an implied number range) is that we don't actually know what the units of the things being put into the histograms are. E.g. some application may report response times in nanoseconds, or milliseconds, or seconds (or days, or years). And since histograms might be used for arbitrary stuff using arbitrary units (e.g. size in picowidgets or terawidgets, wavelength in angstroms or lightyears, acceleration in nanometers per second per second, or furlongs per fortnight per fortnight), there is great benefit in allowing the same dynamic range to be used regardless of unit choice.

This was the original reason I built auto-ranging for DoubleHistogram: a group building a monitoring system wanted high-dynamic-range (HDR) histograms that could be used to track the value distribution of data that they did not understand or control, and did not know the meaning or units of. That reason and motivation would seem to apply directly here (OpenTelemetry) when considering what histograms can support.

Pre-determining the base [as e.g. 2^(2^-scale) in the example above] will limit the dynamic range. But auto-ranging basically makes the base a function of the smallest experienced non-zero number in the data set. The base can then move to accommodate the data as it is recorded, as long as the range of values recorded in the data set does not exceed the supported dynamic range.

Let's discuss this in practical terms for an exponential representation (the log-linear representation can be seen in existing code): in an exponential encoding (with a scale defining the ratio between consecutive buckets as 2^(2^-scale)) that can use an arbitrary base, supporting a dynamic range that would cover the entire 2^-1023 to 2^1023 possible value range would require (2048 * 2^scale) indexes. So e.g. at a scale of 7, a counts array that can grow to 2048 * 128 = 262144 (256K) entries can cover the entire possible dynamic value range. Auto-sizing the counts array allows the space used to be limited to the amount needed for the actual value data being recorded, and doing auto-sizing in chunks (e.g. 128 at a time) coupled with limiting the array size (to e.g. 7,040 max entries, to support a dynamic range of up to 1 : 2^55) will contain the worst case number of resize operations, along with the memcpy costs involved in that constrained number of resize operations, to the point where the implementation becomes very practical and performant.

[Note that when a sparse (or packed) representation is used, the size of the array doesn't matter as much (because it is a virtual array, and only non-zero-count entries need a physical manifestation), but it can still be good to limit the array size to contain worst case expansions on degenerate data sets.]
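A sketch of the sizing arithmetic above (in Go; assuming one index per bucket and a dynamic range expressed as a power of two):

package main

import "fmt"

// bucketsForRange returns the number of bucket indexes needed to cover a
// dynamic range of 2^log2Range at the given scale, with buckets at
// (2^(2^-scale))^index as discussed above.
func bucketsForRange(log2Range, scale uint) int {
    return int(log2Range) << scale
}

func main() {
    fmt.Println(bucketsForRange(2048, 7)) // 262144 buckets: the entire double range
    fmt.Println(bucketsForRange(55, 7))   // 7040 buckets: a 1 : 2^55 dynamic range
}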

@jmacd

jmacd commented Jul 15, 2021

(There was a grave error in the output from the program quoted above, so I deleted the comment. Please disregard.)

@giltene Thank you for continuing to patiently explain. I had a couple of misunderstandings about HDRHistogram's representation. I think I now understand there to be a connection between the scale=0 base-2 exponential histogram of OTEP 149 and the HDRHistogram log-linear representation. The algorithm shared by @pirgeo and @oertl appears to show how to convert between the linear portion of the HDRHistogram bucket index and the correct scale=0 base-2 exponential bucket in O(1) steps.

You raised a point about the relationship between scale and base. Earlier in this (lengthy!) discussion, various participants have proposed we avoid loss of precision that results from changing the reference base between arbitrary values, which led us here.

@jmacd

jmacd commented Jul 15, 2021

@beorn7 It feels like there is a consensus around the protocol you shared, except for this point about the zero bucket resolution. How would you feel if the zero bucket width were an optional part of the protocol? Users could specify that number to indicate the "smallest discernible value" if they please. When merging histograms with different smallest discernible values, what do you recommend? I would suggest we output a merged histogram with the minimum of smallest discernible values, otherwise sum their buckets. I would not make any effort to maintain the maximum smallest discernible value by collapsing some of the non-zero buckets into the zero bucket.

@giltene

giltene commented Jul 15, 2021

You raised a point about the relationship between scale and base. Earlier in this (lengthy!) discussion, various participants have proposed we avoid loss of precision that results from changing the reference base between arbitrary values, which led us here.

Can you elaborate on the precision loss concern?

A simple way to avoid concerns around conversions or moves between arbitrary bases, and to address precision questions around e.g. merging data from two histograms that used two different arbitrary bases, is to normalize the base choices by e.g. only using bases that are powers of 2 (but to freely move between them as part of auto-ranging). If/when you do that, are there any remaining precision loss concerns?

@jmacd

jmacd commented Jul 16, 2021

@giltene

is to normalize the base choices by e.g. only using bases that are powers of 2

This is what we are trying to do by limiting the choice of base. The argument was that converting from base=2 to base=10 introduces statistical artifacts because one is not a factor of the other.

@giltene

giltene commented Jul 16, 2021

@giltene

is to normalize the base choices by e.g. only using bases that are powers of 2

This is what we are trying to do by limiting the choice of base. The argument was that converting from base=2 to base=10 introduces statistical artifacts because one is not a factor of the other.

Right. So if the base can be an arbitrary power of 2, and you keep the base <= the lowest non-zero recorded value, you end up with no need for a zero bucket width.

Auto-ranging is then quite simple. You set the base to the power of 2 that is <= the first recorded value, and then move the base down on any "underflow" (a recorded value lower than the current base).

When the base is always a power of two, and the resolution is also limited to powers of two (which it is, since scale is an integer), there is a whole number of buckets between any old and new base level when you shift bases. Changing the base (e.g. when a recorded value "underflows" the current base) is then very cheap, and there is no "bucket splitting" issue. If there were no resizing of the underlying array storage involved, there would be no memory to move around at all (just a shift of the logical 0 position in a "circular" array). When there is resizing (which is likely in any application that doesn't want to pre-allocate a wide enough range, because why not auto-size?), some memcpy shifting of array elements is needed (but no math is needed on individual elements as they are copied). Auto-sizing can happen on either end: on an underflow (which will shift the base down and add enough buckets to cover the added value range), or on an overflow (no change to base, just adding buckets to cover the range).

Keeping the zero count "outside the array" by special-handling the actual 0 value can keep things simpler (no need to deal with its overlap with buckets). Or, as an alternative, you can "burn" (2^scale)-1 buckets (below the base) that would never get used and work the index computation math to make it naturally translate the 0 value to the logical 0 index in the array, while the non-zero values all end up in the right places (at or above the base), in order to keep the recording logic faster by avoiding a conditional branch. (I do the latter, but either one would work.)
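To make the underflow-driven base shift concrete, here is a toy sketch (in Go; not HdrHistogram's code, and it glosses over floating-point edge cases at exact bucket boundaries). It keeps zero as its own count outside the array, sets the base to the power of 2 at or below the first recorded value, and prepends a whole number of buckets on underflow:

package main

import (
    "fmt"
    "math"
)

type hist struct {
    scale    int
    baseExp2 int      // base = 2^baseExp2, kept <= the smallest non-zero recorded value
    counts   []uint64 // counts[i] covers [base*r^i, base*r^(i+1)), r = 2^(2^-scale)
    zero     uint64   // "pure 0" is special-cased outside the bucket array
}

func (h *hist) record(v float64) {
    if v == 0 {
        h.zero++
        return
    }
    exp := int(math.Floor(math.Log2(v)))
    if len(h.counts) == 0 {
        h.baseExp2 = exp // first non-zero value: base is the power of 2 at or below it
    } else if exp < h.baseExp2 {
        // Underflow: move the base down by a whole number of powers of two.
        // Because the base and the resolution are both powers of two, this
        // prepends a whole number of buckets -- no bucket splitting.
        grow := (h.baseExp2 - exp) << h.scale
        h.counts = append(make([]uint64, grow), h.counts...)
        h.baseExp2 = exp
    }
    idx := int(math.Floor(math.Log2(v)*math.Exp2(float64(h.scale)))) - h.baseExp2<<h.scale
    if idx >= len(h.counts) {
        // Overflow: extend at the top; the base does not change.
        h.counts = append(h.counts, make([]uint64, idx+1-len(h.counts))...)
    }
    h.counts[idx]++
}

func main() {
    h := &hist{scale: 2}
    for _, v := range []float64{3.5, 0.25, 0, 12.0} {
        h.record(v)
    }
    // The base ends at 2^-2; counts land in buckets 0 (0.25), 15 (3.5), and 22 (12.0).
    fmt.Println(h.baseExp2, h.zero, h.counts)
}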

@beorn7

beorn7 commented Jul 16, 2021

How would you feel if the zero bucket width were an optional part of the protocol?

In fact, it already is optional, at least in our current PoC (with the disclaimer given above: it's just a PoC to experiment, no guarantees anything will survive into a released version). It's more by the nature of protobuf v3 that a missing zero-bucket width would be seen as a width of 0.

Users could specify that number to indicate the "smallest discernible value" if they please. When merging histograms with different smallest discernible values, what do you recommend? I would suggest we output a merged histogram with the minimum of smallest discernible values, otherwise sum their buckets. I would not make any effort to maintain the maximum smallest discernible value by collapsing some of the non-zero buckets into the zero bucket.

Again with the disclaimer that nothing like this has been implemented yet in Prometheus land, and it is all open to experimentation: I believe that merging histograms should result in the widest zero bucket among the source histograms determining the zero bucket width in the aggregated histogram, with all buckets for smaller values being merged into it. IIRC this is a crucial idea behind DDSketch (and possibly other histogram implementations with zero buckets of finite width).
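For one source histogram's positive buckets, that rule could look roughly like this sketch (in Go; an illustration only, assuming lower-bound-inclusive buckets at (2^(2^-scale))^index and a merged zero threshold that aligns with a bucket boundary):

package main

import (
    "fmt"
    "math"
)

// foldIntoZero folds every bucket that lies entirely below the (widest)
// merged zero threshold into the zero count, and keeps the rest.
func foldIntoZero(buckets map[int]uint64, zeroCount uint64, scale int, zeroThreshold float64) (map[int]uint64, uint64) {
    base := math.Exp2(math.Exp2(-float64(scale)))
    kept := make(map[int]uint64, len(buckets))
    for idx, c := range buckets {
        upper := math.Pow(base, float64(idx+1)) // upper boundary of bucket idx
        if upper <= zeroThreshold {
            zeroCount += c
        } else {
            kept[idx] = c
        }
    }
    return kept, zeroCount
}

func main() {
    // At scale 3, bucket -40 ends at 2^(-39/8) ~= 0.034, so it folds into the
    // zero count when the merged zero threshold is 0.125.
    kept, zero := foldIntoZero(map[int]uint64{-40: 3, -2: 5, 10: 7}, 1, 3, 0.125)
    fmt.Println(kept, zero) // map[-2:5 10:7] 4
}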

@jmacd

jmacd commented Jul 16, 2021

If/when you do that, are there any remaining precision loss concerns?

No, that was it. The appeal of fixing a family of bases to repeated squares of each other was influenced by UDDSketch.

It seems like the HDR histogram techniques that you developed all apply to an exponential histogram, but to literally use HDRHistogram we would all have to adopt the log-linear encoding. I was looking for help defining the pseudo-code for an auto-ranging exponential histogram here.

@yzhuge

yzhuge commented Jul 16, 2021

I am working on a prototype for a "scaled base2 exponential histogram", as an implementation of https://github.com/open-telemetry/oteps/blob/main/text/0149-exponential-histogram.md. Basically it has two rules:

  • bucket_lower_bound = base ^ bucket_index
  • base = 2 ^ (2 ^ -scale)
    Because index can be negative, the bucket bound can be smaller than the base. The max and non-zero-min values in an incoming dataset define an index window. The prototype will auto downscale (go to the current scale minus 1, to reduce the index window size by a factor of 2) to fit data in a configured max number of buckets; the downscale step is sketched just below. I expect to publish the prototype sometime next week.
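Sketched concretely (in Go; a hypothetical helper, not NrSketch's actual code), the downscale step merges each pair of adjacent buckets at scale s into one bucket at scale s-1, which is exactly the perfect-subsetting property:

package main

import "fmt"

// downscaleBy1 merges sparse bucket counts at some scale into the buckets at
// scale-1: moving down one scale squares the base, so each pair of adjacent
// buckets collapses into exactly one coarser bucket and no count is split.
// Assumes lower-bound-inclusive buckets at (2^(2^-scale))^index.
func downscaleBy1(buckets map[int]uint64) map[int]uint64 {
    out := make(map[int]uint64, (len(buckets)+1)/2)
    for idx, c := range buckets {
        out[idx>>1] += c // arithmetic shift == floor division, also for negative indexes
    }
    return out
}

func main() {
    // Buckets 6 and 7 at the finer scale land in bucket 3 at the coarser
    // scale; buckets -4 and -3 land in bucket -2.
    fmt.Println(downscaleBy1(map[int]uint64{6: 1, 7: 2, -3: 4, -4: 8}))
}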

For simplicity, the prototype rounds subnormal numbers (5e-324 to 2e-308) to 0. So it is meaningful for OTel to state that the zero counter may include numbers rounded down to 0. But I lean against recording the rounding threshold, even as an optional field in the OTel protocol. First, the question is: how are you going to use it if you have it in the protocol? Besides showing the threshold to the user in some particular UI, there is not much use. And it actually creates confusion. For example, if it is larger than the lowest bucket, what does that mean (an overlapping bucket boundary?)? And since protobuf optional fields default to 0, when it is not set it looks like the threshold is 0, but in fact it may not be (the producer neglected to set it, or the producer does not know what it is). Then we may need a value for "unknown" (use "NaN"?). Tracking the threshold opens a can of worms.

For now, I think we should just treat the threshold as part of floating point processing rounding error on the producer system. The OTel protocol does not attempt to report such error margin.

@jmacd

jmacd commented Jul 16, 2021

Sounds great, @yzhuge!

@yzhuge

yzhuge commented Jul 23, 2021

"New Relic Sketch" (NrSketch) is now open source: https://github.com/newrelic-experimental/newrelic-sketch-java
In addition to implementing a scaled base2 exponential histogram, NrSketch has the following features:

  • Allow users to configure the maximum number of histogram buckets and the initial scale (i.e. histogram resolution).
  • Auto down scale to fit dataset into the configured number of buckets
  • ComboNrSketch supports high resolution bucketing for both positive and negative numbers
  • ConcurrentNrSketch supports multi-thread concurrency
  • Tracks min, max, count, and sum of the dataset
  • Fast insert performance via lookup tables, bypassing floating point operations and Math.log() call
  • Small memory footprint. It uses the smallest integer type (1, 2, 4, or 8 byte per counter) that can hold the bucket
    counts.
  • Histogram merge code included
  • Percentile calculation code included

@jmacd

jmacd commented Jul 23, 2021

I feel this issue has been open long enough to gather all the feedback we are going to gather. Having read through and sat with this debate for a while, I've come to the conclusion that my original premise was incorrect. OpenTelemetry might be better off in some ways for having a single Histogram in its protocol, and the vendors and the collector code base would appreciate that, but the users and the community are not better off for us forcing a single kind of Histogram to be used.

There is more than one good kind of Histogram, and we have definitely chosen one of them here. The base-2 exponential histogram is favored by many in this discussion for its conceptual and mathematical ease, compared with log-linear histograms. We have seen 3 prototypes of the base-2 exponential approach, written by @oertl, @beorn7, and @yzhuge.

On the other hand, there are two well-known log-linear histograms with open-source implementations that are already in use for telemetry applications. Neither OpenHistogram nor HDRHistogram appears to be dominant; both are great at what they do and are reasonable choices for use in a telemetry system.

I believe we should move forward with OTLP protocol support for base-2 exponential histograms, and I also think we should add support for a log-linear histogram protocol that can capture both OpenHistogram and HDRHistogram. I have opened a draft of the base-2 exponential approach here for review, copied directly from @beorn7's proposal.

I will close this issue and open new issues in the OTel-proto repository to capture work needed for log-linear histogram support, assuming others agree with this direction.

Speaking as a vendor (Lightstep), we are happy to accept any high-resolution histogram format included in OTLP. We see a lot of promise in the base-10 log-linear OpenHistogram approach, because standard unit conversions (e.g., milliseconds to seconds) and threshold queries (e.g., percent of latencies less than 100ms) are exact. We are not concerned with errors introduced by histogram conversion, because those errors are relatively easy to control. The real applications we have for this data are not impacted by single-digit percent errors in individual buckets.

From an SDK perspective, I believe we should return to an earlier proposal, #982, which is to accept any histogram as a viable default for OpenTelemetry SDKs. This allows OTel SDKs to release Metrics SDK support using OpenHistogram, HDRHistogram, or one of the new base-2 exponential offerings that we expect based on the prototypes we've seen.

@jmacd

jmacd commented Jul 23, 2021

@giltene To be a little more concrete, what I was thinking about for log-linear protocol support goes back to @yzhuge's original proposal for a variable-base variable-sub-buckets log-linear histogram. OpenHistogram can be described with base=10 and sub_buckets=90 and perfect-subsetting variations with sub_buckets in {3, 9, 18}. HDRHistogram can be described with base=2 and sub_buckets in the powers-of-2. The basic protocol appears simple to capture.
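As an illustration of that parameterization (hypothetical Go code, not OpenHistogram's or HDRHistogram's actual indexing), a positive value maps to a (log bucket, linear sub-bucket) pair roughly like this:

package main

import (
    "fmt"
    "math"
)

// logLinearIndex maps a positive value to a (log bucket, linear sub-bucket)
// pair for a generic log-linear histogram with the given base and number of
// linear sub-buckets per log bucket. Floating-point edge cases at log-bucket
// boundaries are ignored in this sketch.
func logLinearIndex(value, base float64, subBuckets int) (int, int) {
    logBucket := int(math.Floor(math.Log(value) / math.Log(base)))
    lower := math.Pow(base, float64(logBucket))
    width := lower * (base - 1) / float64(subBuckets)
    sub := int((value - lower) / width)
    if sub == subBuckets { // guard the top boundary against rounding
        sub = subBuckets - 1
    }
    return logBucket, sub
}

func main() {
    // OpenHistogram-style parameters: base 10 with 90 sub-buckets per decade,
    // so 2.37 lands in the decade starting at 1.0, sub-bucket [2.3, 2.4).
    fmt.Println(logLinearIndex(2.37, 10, 90)) // prints 0 13
}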

Speaking as a vendor, we are happy to contribute code that may be used in the OpenTelemetry collector to convert between any of the supported OTLP histogram formats. We already have such code in production and I will be glad to open-source it.

@giltene

giltene commented Jul 24, 2021

@jmacd Putting my HdrHistogram hat aside, I had been converging on an exponential representation "making perfect sense" for the subject matter here, and mostly looking to apply lessons learned from other implementations that can help avoid adding restrictions early, especially when it comes to configuration assumptions. E.g. I think that:

  • auto-ranging and auto-scaling should be considered from the start (both can be optional in implementations, but should not be prohibited or conflicted with in the initial specification).
    • By this I mean that the size (number of buckets) and the range (lower bound and upper bound) should not be required by configuration, and that when those things are [optionally] set, their implications do not impose different bucket boundaries than those that an auto-ranging / auto-scaling implementation may need to use.
    • E.g. if an auto-sizing / auto-ranging histogram reads in data that occupies a certain range and number of buckets, it does not have to restrict itself to that number of buckets or that range. [The existence of range and size information in the representation does not imply configuration.]
    • E.g. if an auto-ranged histogram covers recorded data that is contained in the range of 2^-292 to 2^-321, a bucket covering e.g. the value 2.0 does not have to be included in the represented data, either when serializing or deserializing.
  • the zero bucket-width concept (while a perfectly viable internal concern for compatible implementations) is likely an unnecessary configuration item when auto-ranging is available
  • sparse-in-memory representations are likely in implementations

On the question of "which scale"? I actually think of exponential auto-ranging as "scale-less", since the "bucket_lower_bound" and the "base" items are actually redundant concepts. There is a lower bound and a higher bound to the value range captured in the histogram, and both can be well below or well above some arbitrary fixed "base". The "base" itself has no real meaning, and the lower bound can be used as the "base". Or conversely, that "base" can be used as the lower bound, but [when auto-ranging is enabled or used] can mutate as values are recorded.

For consistency (i.e. no loss of precision when merging histograms under the same representation and resolution), we need bucket boundaries to align across histograms. All that is needed for this is some agreed-upon limitations/quantizations on available resolutions and lower bound boundaries. And there I tend towards using integer powers of 2. I.e. accepting the limitation that resolution is stated only with integer powers-of-2 levels, and that bucket lower bounds must be integer powers of two (e.g. 2^-103 is a valid lower bound, and so is 2^79).

So I would call the scheme "exponential powers-of-2-aligned", rather than "exponential with base=2".

Keeping configuration to a bare minimum is a good and already-stated goal. The only (unavoidable?) configuration item I see as a common denominator to all "hi res" histograms is the need to specify required precision (spelled "resolution" and implemented by specifying "scale" in the discussed proposal, and spelled otherwise in the various other implementations).

  • IMO a precision good enough to guarantee +/-0.5% relative error (such that any two values that are >1% away from each other are distinguishable) should be representable, and may be the common case preferred by users. But that just means that we want the supported resolution range to cover that choice.

When it comes to interacting with various existing histogram implementations (including various log-linear ones, exponentials with bases or resolutions that do not align with this one, etc.), we must accept some loss of precision when merging or importing data from such histograms. While individual histogram implementations can probably preserve values with no [additional] loss of precision when copying or merging histograms within the same implementation, some precision loss across unaligned implementations will be inherent due to the bucket boundaries differing. This is likely not a major concern (e.g. merging two non-same-implementation histograms which both provide +/- 0.5% relative error guarantees will have the expected behavior of increasing the relative error to "up to +/- 1%" due to the merge).

To be clear, HdrHistogram is not "log-linear" to its users. It just [currently] uses an underlying log-linear implementation. HdrHistogram is HDR (High Dynamic Range) and sets precision and ranging behavior expectations through its APIs. I can easily envision evolving HdrHistogram's internal implementations and representations to some future v3.0 that would shift from log-linear to "exponential powers-of-2-aligned" with no change to any of the exposed APIs or specified semantics. Such a future library version might support importing serialized data from the old V2.0 or v1.0 formats, by either supporting the various versions of underlying formats, or by setting an expectation that importing e.g. v2.0 formatted data by a v3.0 library may involve round-off value precision losses.

In fact, if this proposal ends up solidifying on only-resolution-is-required-in-configuration and only-resolution-is-dictated-by-wire-form, and settles on the only limiting assumptions being that both resolution and bucket boundaries are stated as integer powers of 2 and that resolution is limited to a reasonable range (e.g. allowed scale values in the specification cover the range of 0...17), I think that such a v3.0 HdrHistogram evolution is quite likely, which would allow converting between OpenTelemetry histograms and HdrHistogram to be done seamlessly and with no loss of precision, and allow users of HdrHistogram's useful features (like the double-buffered Recorder pattern, linear/log/percentile iterators, speed and concurrency) to easily integrate with OpenTelemetry. Supporting the OpenTelemetry wire form for histograms would also be likely then (it would be the first-ever dependency added in HdrHistogram). It would also provide a widely polyglot base for OpenTelemetry histogram producers and consumers (since HdrHistogram implementations currently exist for Java, Javascript, Python, .NET, Go, Rust, Erlang, and C/C++, and we can probably motivate them all to evolve to the new formats).

It would be much harder to envision such an evolution if the new format were not compatible with auto-ranging and auto-scaling, or required the representation of a zero-bucket-width concept, or did not allow for histograms with e.g. 3 decimal points of precision (+/-0.05% error value, scale = 10).

@jmacd

jmacd commented Jul 27, 2021

If I understand correctly, @giltene, you would like to see one new field that scales the values in a histogram? As far as I understand it, the HDR Histogram format uses an "Integer to double conversion ratio" as this field.

I can easily envision evolving HdrHistogram's internal implementations and representations to some future v3.0 that would shift from log-linear to "exponential powers-of-2-aligned" with no change to any of the exposed APIs or specified semantics.

Cool! 😀 🎉

All -- I feel it is time to close this issue. Please direct attention to open-telemetry/opentelemetry-proto#322 for the current protocol review and file issues in this repository with suggestions and/or proposed improvements. Thank you!
