Prometheus and OTel High Resolution Histograms incompatibilities #2611
During the review of #2633, I both discovered a mistake in the exposition for scales <= 0 and was surprised to learn something new about Prometheus' plans. The Prometheus (proposed) mapping function is off-by-one relative to what is contained in #2633. This is not an incompatibility; it just means that to convert between Prometheus and OpenTelemetry buckets one has to either add or subtract one. On the surface, this feels like an unnecessary defect. @beorn7 made it sound like artificial complexity, even:
This just illustrates how few people we have actually following this debate. I will state here that @beorn7 is confounding the issues. The implied change in this statement would be a breaking change relative to OTLP v0.11 (unlike #2633), but it bears analysis that (unfortunately) the Prometheus team has not done. When we look at scale values > 0, where we are able to approximate the mapping function using a built-in logarithm function, these approaches look more-or-less natural. There are three variations:
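To make the comparison concrete, here is a minimal Go sketch of the three mapping functions for scale > 0 as I read the proposals; the function and variable names are mine, not taken from any of the PRs, and the formulas assume a positive finite `value`:

```go
package mapping

import "math"

// indexVariants sketches the three candidate bucket-index functions for
// scale > 0, where base = 2^(2^-scale) and therefore
// log_base(value) = ln(value) * 2^scale / ln(2).
func indexVariants(value float64, scale int) (v011, pr2633, promProposed int64) {
	scaleFactor := math.Ldexp(math.Log2E, scale) // 2^scale / ln(2)

	// OTLP v0.11: lower-inclusive buckets [base^i, base^(i+1)).
	v011 = int64(math.Floor(math.Log(value) * scaleFactor))

	// PR #2633: upper-inclusive buckets (base^i, base^(i+1)]; note the -1.
	pr2633 = int64(math.Ceil(math.Log(value)*scaleFactor)) - 1

	// Prometheus (proposed): upper-inclusive, off by one from #2633.
	promProposed = int64(math.Ceil(math.Log(value) * scaleFactor))
	return
}
```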
At this point, if we had to choose what appears more or less natural, both OTLP v0.11 and Prometheus (proposed) look equally natural. PR #2633 looks unnatural, by some definition, because of the -1, which (although it is a breaking change) would appear to be an unnecessary complication relative to either alternative. Now, comparing these three approaches for scales <= 0, we will see differences that make OTLP v0.11 look the most "natural", as follows. In OTLP v0.11, for scale <= 0, the formula is simplest because IEEE floating point has a built-in (exactly lower-inclusive) base-2 logarithm function.
The
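A sketch of what the v0.11 rule for scale <= 0 might look like in Go (my illustration, not code from the spec; `math.Ilogb` stands in here for extracting the IEEE-754 exponent field):

```go
// OTLP v0.11, scale <= 0: the IEEE-754 exponent field is already an
// exactly lower-inclusive base-2 logarithm, so the bucket index is a
// single arithmetic right shift of that exponent.
func indexV011ScaleNonPositive(value float64, scale int) int {
	exponent := math.Ilogb(value) // normalized base-2 exponent of value
	return exponent >> uint(-scale)
}
```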
In PR #2633, to change from lower-inclusive to upper-inclusive for power-of-two values (not breaking otherwise), we need a correction for exact powers of two. The correction is based on the significand:
This assumes
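Continuing the same sketch, my own illustration of the correction described above (not the actual #2633 code), with a hypothetical `significandIsZero` helper:

```go
// PR #2633, scale <= 0: same shift as v0.11, but an exact power of two
// sits on a bucket boundary, and with upper-inclusive buckets it belongs
// to the bucket below, so the exponent is decremented first.
func index2633ScaleNonPositive(value float64, scale int) int {
	exponent := math.Ilogb(value)
	if significandIsZero(value) {
		exponent--
	}
	return exponent >> uint(-scale)
}

// significandIsZero reports whether value is an exact power of two by
// checking that all 52 significand bits of its IEEE-754 encoding are zero.
// (A hypothetical helper; it ignores zero, subnormal, and non-finite inputs.)
func significandIsZero(value float64) bool {
	const significandMask = 1<<52 - 1
	return math.Float64bits(value)&significandMask == 0
}
```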
For the Prometheus (proposed) mapping, the result is similar to that of PR #2633, with an additional complication. We cannot directly use the built-in shift-right operator to compute the index as we did above; it will not round correctly. We have to add one after the shift. One approach:
Note that the right-shift operation used above is applied "off-by-one"; that is, we have an expression
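One plausible reading of the add-one-after-the-shift approach, continuing the same sketch (my reconstruction, not the proposed Prometheus code):

```go
// Prometheus (proposed), scale <= 0: compute the #2633-style index with
// the shift, then add one for the off-by-one bucket numbering. A plain
// right shift of the exponent rounds the wrong way for this numbering,
// so the extra add cannot be folded into the shift itself.
func indexPromScaleNonPositive(value float64, scale int) int {
	exponent := math.Ilogb(value)
	if significandIsZero(value) {
		exponent--
	}
	return (exponent >> uint(-scale)) + 1
}
```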
Hopefully, this demonstrates why the OpenTelemetry project still considers the OTLP v0.11 design to be the simpler option: it allows direct use of the hardware base-2 exponent and right-shift instruction for downscaling. The compromise in #2633 is an acceptable middle ground where we add special cases for exact powers of two but preserve the ability to use right-shift for downscaling histogram buckets.
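For reference, the downscaling property at stake is that merging buckets when lowering the scale by `delta` stays a single shift; a sketch under the same assumptions as above:

```go
// Downscaling by delta merges 2^delta adjacent buckets into one; with the
// v0.11 and #2633 indexings this is just an arithmetic right shift of the
// old bucket index.
func downscaleIndex(oldIndex int, delta uint) int {
	return oldIndex >> delta
}
```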
I know this has likely been discussed, but I think it's important we push on some of these technical discussions from the perspective of requirements and tradeoffs. I'm going to list the MAJOR requirements that OpenTelemetry had when designing exponential histograms (and you can see these requirements reflected in the things @jmacd is fighting for).
OTel Exponential Histogram Requirements
This is a post-design list of all the considerations that went into OTel's exponential histogram. SOME of these requirements exist because it was proven that we have algorithms capable of meeting them.
Basic Behavior
Convenience
Efficiency
Accuracy
Issues with Prometheus plans for Sparse Histograms
Prometheus would like to preserve backwards compatibility with existing histogram semantics. This means having metric samples with a
The proposal here outlines a path forward for OTel + Prometheus that adds this new requirement in place and attempts to resolve the issues. However, I still see the following unresolved and unaddressed in the proposal:
TL;DR
I think the crux of the argument between Prometheus and OpenTelemetry revolves around which requirement is more important:
From my take: almost every monitoring/metrics user I've seen would prefer lower-overhead instrumentation, as cost is one of the most significant concerns when adding instrumentation to code. This is one of those decisions that would be hard to tune the instrumentation for later, so I'd like us to really understand the performance impact of the solutions being proposed and trade off these two priorities from an end-to-end user perspective.
Yes, from a consumer perspective I am more concerned about the resources (e.g. memory) that are needed when emitting metrics or when Prometheus consumes them. Regarding backward compatibility, I personally think this might cause more confusion than it gains. Currently in Prometheus you need to define the bucket sizes. The exponential histogram's main feature is that you don't need to do this; instead you define the level of precision you want. I think these two are a bit in conflict. I think it would be totally reasonable to introduce a new metric type in Prometheus.
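As a rough illustration of that configuration difference (a sketch only: the `Buckets`/`ExponentialBuckets` API is the classic prometheus/client_golang one, while `NativeHistogramBucketFactor` is the precision-style knob added in later client_golang releases and is shown here purely as an example of the idea):

```go
package example

import "github.com/prometheus/client_golang/prometheus"

// Classic histogram: every bucket boundary must be chosen up front.
var classic = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "request_duration_seconds_classic",
	Buckets: prometheus.ExponentialBuckets(0.001, 2, 12), // 1ms up to ~2s
})

// Exponential ("native") histogram: only a precision is chosen, here as a
// maximum growth factor between bucket boundaries; the buckets themselves
// follow from the observed data.
var native = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:                        "request_duration_seconds_native",
	NativeHistogramBucketFactor: 1.1,
})
```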
Hi, sorry for the delay in answering, busy week. I will do a benchmark early next week and see what changes need to be made and how much more memory/CPU the change in #2633 takes, especially when rescaling. I do not expect there to be more memory or CPU usage, but hopefully the benchmarks will make that clear.
The backward compatibility is to allow applications which are instrumented with the newer sparse histograms to be scraped by older versions of Prometheus. We also believe that we can make the
I do not believe that we should use "benchmarking" as the yardstick to evaluate these proposals. The number of CPU cycles spent on additional work here is truly negligible. The big concern being raised here, to me, is about code complexity. Of the three forms that were outlined in my post above, each has the same relative accuracy, but they are successively more difficult to implement. The time spent in update is small relative to the total cost of reporting the metric. What I don't want to see is a long comment explaining why you have to perform an operation like
@gouthamve Does Prometheus already allow flexible bucket boundaries per metric stream? Can an old Prometheus pull in an exponential histogram if bucket boundaries change during its lifecycle? Also, thanks for the response!
I have back-ported Lightstep's reviewed, tested, and running-in-production exponential histogram data structure to the OTel-Go repository here: open-telemetry/opentelemetry-go#3022
Gotcha. Do you think #2633 is a decent compromise? I do realise that the Prometheus (proposed) mapping caused more confusion than intended, and if it's too complicated it can be ignored. The Prometheus community is happy with #2633.
Yes, as long as the bucket boundaries don't change frequently. It'll create more timeseries, and the query will likely break when dealing with the time-range where the change happens, but everywhere else it'll work.
I think we likely want to see how much this breaks in practice. I have concerns here: I feel like spikes in latency are LIKELY to break timeseries the first time they happen in a process. Can we get a good demo of exponential-on-existing Prometheus so we can evaluate the tradeoffs?
Follow-up discussion from the Specification SIG:
We still need to make progress on both the technical issues around:
Yes, this is completely possible and one of the core ideas.
This is also okay.
Please consider approving #2633 @open-telemetry/specs-metrics-approvers |
In the recent SIG Spec call on 31st May, we discussed the incompatibilities between the specs of Prometheus Sparse High Resolution Histograms and OTel Exponential Histograms.
I've documented the differences and how things work in this doc here: https://docs.google.com/document/d/11z8hc_T4RHPHaXuWiETxGa81Ix89dBV5jh9byh9ySgE/edit
Please add your suggestions and comments to the doc; we will discuss this in the Prometheus WG meeting on June 22 (8am-9am PT). Please feel free to join that meeting if interested.