HistogramExemplarReservoir and performance impact #3953

cijothomas · 2024-03-21T16:54:30Z

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#alignedhistogrambucketexemplarreservoir has the wording "This implementation MUST keep the last seen measurement that falls within a histogram bucket". Consider the example where Delta temporarily is used.
What is the reasoning behind preferring the "last-seen" measurement within a histogram bucket? In other words, would "first-seen" be as good? (I am only talking about Delta temporality).

From a performance standpoint, "last-seen" causes additional work, without any additional value. Specifically, "last-seen" means the Reservoir must create a new Exemplar and overwrite it in the existing Exemplar slot. Overwriting itself might be okay, but creating a new Exemplar requires memory allocation for value,traceid,spanid,timestamp/dropped-attributes. It is entirely possible that value,traceid,spanid already exists in memory, so less of a concern, but timestamp has to be generated (system call + memory allocation most likely), and possibly dropped-attributes.
Also, "first-seen", allows the possibility of further perf optimizations in sdk implementations. For example, if an Exemplar is already recorded in an interval, the SDK can short-circuit a lot, knowing the slot is already filled in, and no longer have to worry about Exemplar until next collection.

Proposal: Could we change the algorithm to use "first-seen" for Delta Aggregation, and keep current (last-seen) for Cumulative.

Alternate: If timestamp is the only perf concern, could we relax the wording on the time. It currently says "The time the API call was made to record a Measurement", but could we relax it, so that implementations can pick the endtime instead, and avoid the cost of obtaining time in hot path?

Related discussion (Cumulative focused): #3891

Because of #3943, I don't think this should block #3870, but I'd love to see if we can change defaults before declaring stable.!

The text was updated successfully, but these errors were encountered:

jsuereth · 2024-03-27T18:38:25Z

Regarding "last-seen" this was for compatibility with existing prometheus histogram behavior (as is the reservoir-matches-bucketing). We did a lot of experimentation around exemplars and I actually think we MAY be able to alter the default to use the FixedSizeReservoir, as we get decent performance/coverage out of that. We expect to spend some time on new algorithms.

In this investigation, though, we saw pretty good performance from FixedSizedReservoir. It has less guarantees about capturing "edges" but given its nature, may get better coverage of split-means, e.g.

Regarding performance - in Java we pre-allocate cells for the reservoir, so no allocations occur in the hot path. Is this ugly in a GC language, yes. Does it matter for performance? yes. See: https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/exemplar/ReservoirCell.java.

Alternate: If timestamp is the only perf concern, could we relax the wording on the time. It currently says "The time the API call was made to record a Measurement", but could we relax it, so that implementations can pick the endtime instead, and avoid the cost of obtaining time in hot path?

FYI - in Java we're using a special clock designed for trace that only hits one of the system clocks to diff duration from start-of-span. It helps dramatically here.

Unfortunately, I don't think we should relax this requirement, it's one of the key values of an exemplar vs. a metric in that it can help you time-align other signals with an incident.

jsuereth · 2024-03-27T18:42:56Z

Regarding the suggestions -

I think a "First seen" exemplar reservoir would be a welcome addition as a built-in thing. Whether or not it's the default for DELTA histograms, I'll have to think about, and would welcome user feedback. This would NOT violate the "look like prometheus" behavior, since they do not have delta histograms.
I do think at this point the FixedSizeReservoir provides a good experience and could be used as the default for otel.

cijothomas · 2024-03-28T01:29:35Z

I think a "First seen" exemplar reservoir would be a welcome addition as a built-in thing. Whether or not it's the default for DELTA histograms, I'll have to think about, and would welcome user feedback. This would NOT violate the "look like prometheus" behavior, since they do not have delta histograms.

got it! Are you looking for feedback from end-users or from the people who implement Exemplars in OTel sdks? Most of end-users would not know (or care) if the SDK defaults pick first-seen vs last-seen in delta right? If that is the case, the only feedback we need is from java/go/dotnet/other exemplar implementors?

jack-berg · 2024-04-10T16:17:14Z

@cijothomas This issue seems to be asking for a new exemplar reservoir specific to histograms, and to consider using that new reservoir as the default for histograms. This shouldn't be a blocker to exemplar stabilization because:

It can be added as a new spec'd reservoir later.
We can change the default reservoir as a non-breaking change.

Can you either change the title of this request / framing of this issue, or open a new issue? Thanks.

svrnm · 2024-08-05T09:39:10Z

@cijothomas please take a look at @jack-berg's comment, can you provide the required details? Thank you!

cijothomas added the spec:metrics Related to the specification/metrics directory label Mar 21, 2024

github-actions bot assigned jsuereth Mar 21, 2024

cijothomas mentioned this issue Mar 21, 2024

Mark exemplars as stable. #3870

Merged

5 tasks

trask added triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted triage:deciding:tc-inbox Needs attention from the TC in order to move forward labels Mar 26, 2024

trask unassigned jsuereth Mar 26, 2024

jack-berg removed the triage:deciding:tc-inbox Needs attention from the TC in order to move forward label Apr 24, 2024

github-actions bot added the triage:followup Needs follow up during triage label Sep 19, 2024

trask removed the triage:followup Needs follow up during triage label Sep 24, 2024

github-actions bot added the triage:followup Needs follow up during triage label Sep 25, 2024

trask removed the triage:followup Needs follow up during triage label Sep 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HistogramExemplarReservoir and performance impact #3953

HistogramExemplarReservoir and performance impact #3953

cijothomas commented Mar 21, 2024

jsuereth commented Mar 27, 2024

jsuereth commented Mar 27, 2024

cijothomas commented Mar 28, 2024

jack-berg commented Apr 10, 2024

svrnm commented Aug 5, 2024

HistogramExemplarReservoir and performance impact #3953

HistogramExemplarReservoir and performance impact #3953

Comments

cijothomas commented Mar 21, 2024

jsuereth commented Mar 27, 2024

jsuereth commented Mar 27, 2024

cijothomas commented Mar 28, 2024

jack-berg commented Apr 10, 2024

svrnm commented Aug 5, 2024