Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HistogramExemplarReservoir and performance impact #3953

Open
cijothomas opened this issue Mar 21, 2024 · 5 comments
Open

HistogramExemplarReservoir and performance impact #3953

cijothomas opened this issue Mar 21, 2024 · 5 comments
Labels
spec:metrics Related to the specification/metrics directory triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted triage:deciding:needs-info Not enough information. Left open to provide the author with time to add more details

Comments

@cijothomas
Copy link
Member

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#alignedhistogrambucketexemplarreservoir has the wording "This implementation MUST keep the last seen measurement that falls within a histogram bucket". Consider the example where Delta temporarily is used.
What is the reasoning behind preferring the "last-seen" measurement within a histogram bucket? In other words, would "first-seen" be as good? (I am only talking about Delta temporality).

From a performance standpoint, "last-seen" causes additional work, without any additional value. Specifically, "last-seen" means the Reservoir must create a new Exemplar and overwrite it in the existing Exemplar slot. Overwriting itself might be okay, but creating a new Exemplar requires memory allocation for value,traceid,spanid,timestamp/dropped-attributes. It is entirely possible that value,traceid,spanid already exists in memory, so less of a concern, but timestamp has to be generated (system call + memory allocation most likely), and possibly dropped-attributes.
Also, "first-seen", allows the possibility of further perf optimizations in sdk implementations. For example, if an Exemplar is already recorded in an interval, the SDK can short-circuit a lot, knowing the slot is already filled in, and no longer have to worry about Exemplar until next collection.

Proposal: Could we change the algorithm to use "first-seen" for Delta Aggregation, and keep current (last-seen) for Cumulative.

Alternate: If timestamp is the only perf concern, could we relax the wording on the time. It currently says "The time the API call was made to record a Measurement", but could we relax it, so that implementations can pick the endtime instead, and avoid the cost of obtaining time in hot path?

Related discussion (Cumulative focused): #3891

Because of #3943, I don't think this should block #3870, but I'd love to see if we can change defaults before declaring stable.!

@cijothomas cijothomas added the spec:metrics Related to the specification/metrics directory label Mar 21, 2024
@trask trask added triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted triage:deciding:tc-inbox Needs attention from the TC in order to move forward labels Mar 26, 2024
@jsuereth
Copy link
Contributor

Regarding "last-seen" this was for compatibility with existing prometheus histogram behavior (as is the reservoir-matches-bucketing). We did a lot of experimentation around exemplars and I actually think we MAY be able to alter the default to use the FixedSizeReservoir, as we get decent performance/coverage out of that. We expect to spend some time on new algorithms.

In this investigation, though, we saw pretty good performance from FixedSizedReservoir. It has less guarantees about capturing "edges" but given its nature, may get better coverage of split-means, e.g.

Regarding performance - in Java we pre-allocate cells for the reservoir, so no allocations occur in the hot path. Is this ugly in a GC language, yes. Does it matter for performance? yes. See: https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/exemplar/ReservoirCell.java.

Alternate: If timestamp is the only perf concern, could we relax the wording on the time. It currently says "The time the API call was made to record a Measurement", but could we relax it, so that implementations can pick the endtime instead, and avoid the cost of obtaining time in hot path?

FYI - in Java we're using a special clock designed for trace that only hits one of the system clocks to diff duration from start-of-span. It helps dramatically here.

Unfortunately, I don't think we should relax this requirement, it's one of the key values of an exemplar vs. a metric in that it can help you time-align other signals with an incident.

@jsuereth
Copy link
Contributor

Regarding the suggestions -

  • I think a "First seen" exemplar reservoir would be a welcome addition as a built-in thing. Whether or not it's the default for DELTA histograms, I'll have to think about, and would welcome user feedback. This would NOT violate the "look like prometheus" behavior, since they do not have delta histograms.
  • I do think at this point the FixedSizeReservoir provides a good experience and could be used as the default for otel.

@cijothomas
Copy link
Member Author

I think a "First seen" exemplar reservoir would be a welcome addition as a built-in thing. Whether or not it's the default for DELTA histograms, I'll have to think about, and would welcome user feedback. This would NOT violate the "look like prometheus" behavior, since they do not have delta histograms.

got it! Are you looking for feedback from end-users or from the people who implement Exemplars in OTel sdks? Most of end-users would not know (or care) if the SDK defaults pick first-seen vs last-seen in delta right? If that is the case, the only feedback we need is from java/go/dotnet/other exemplar implementors?

@jack-berg
Copy link
Member

@cijothomas This issue seems to be asking for a new exemplar reservoir specific to histograms, and to consider using that new reservoir as the default for histograms. This shouldn't be a blocker to exemplar stabilization because:

  1. It can be added as a new spec'd reservoir later.
  2. We can change the default reservoir as a non-breaking change.

Can you either change the title of this request / framing of this issue, or open a new issue? Thanks.

@tigrannajaryan tigrannajaryan added [label deprecated] triaged-needmoreinfo [label deprecated] The issue is triaged - the OTel community needs more information to decide triage:deciding:needs-info Not enough information. Left open to provide the author with time to add more details and removed [label deprecated] triaged-needmoreinfo [label deprecated] The issue is triaged - the OTel community needs more information to decide labels Apr 19, 2024
@jack-berg jack-berg removed the triage:deciding:tc-inbox Needs attention from the TC in order to move forward label Apr 24, 2024
@svrnm
Copy link
Member

svrnm commented Aug 5, 2024

@cijothomas please take a look at @jack-berg's comment, can you provide the required details? Thank you!

@github-actions github-actions bot added the triage:followup Needs follow up during triage label Sep 19, 2024
@trask trask removed the triage:followup Needs follow up during triage label Sep 24, 2024
@github-actions github-actions bot added the triage:followup Needs follow up during triage label Sep 25, 2024
@trask trask removed the triage:followup Needs follow up during triage label Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:metrics Related to the specification/metrics directory triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted triage:deciding:needs-info Not enough information. Left open to provide the author with time to add more details
Projects
None yet
Development

No branches or pull requests

6 participants