Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bound memory use of the cumulative sum aggregator #3006

Open
2 tasks
MrAlias opened this issue Jul 11, 2022 · 4 comments
Open
2 tasks

Bound memory use of the cumulative sum aggregator #3006

MrAlias opened this issue Jul 11, 2022 · 4 comments
Labels
area:metrics Part of OpenTelemetry Metrics blocked:specification Waiting on clarification of the OpenTelemetry specification before progress can be made pkg:SDK Related to an SDK package

Comments

@MrAlias
Copy link
Contributor

MrAlias commented Jul 11, 2022

The cumulative sum Aggregator will never forget a sum for an attribute set, even if that set if never updated again.

This means that an unbounded amount of memory will be used by the SDK if there are unbounded number of attribute sets being used.

Ultimately this is a user problem, they need to not allow unbounded many attribute sets to be recorded. However, the SDK should attempt to optimize its performance and remove "state" sums it has.

TODO

  • Determine what "stale" means. This will require looking at other implementation and might need a specification change/addition to ensure uniformity across OTel.
  • Implement an algorithm for the cumulative sum Aggregator that forgets stale sums.
@MrAlias MrAlias added pkg:SDK Related to an SDK package area:metrics Part of OpenTelemetry Metrics labels Jul 11, 2022
@MrAlias MrAlias added this to the Metric SDK: Beta milestone Jul 11, 2022
@jmacd
Copy link
Contributor

jmacd commented Jul 18, 2022

I think that OpenTelemetry may want to specify a behavior for SDKs to agree on and implement. The data model features "no data present" flags for each data point, which can be used to signal the intentional end of a series--it would be appropriate to emit one of these no-data-present points for "a while" before forgetting the point entirely. The problem or difficulty is in determining what the new start time should be when the series is re-started. Not sure there's a "best" answer to this question.

@MrAlias
Copy link
Contributor Author

MrAlias commented Jul 19, 2022

I think that OpenTelemetry may want to specify a behavior for SDKs to agree on and implement.

Agreed. I also think in that agreement it need to specify what stale means. If one implementation defines stale as 5 cycles and another 2000 user behavior will be different.

Is this something you can work on in the specification @jmacd?

@jmacd
Copy link
Contributor

jmacd commented Jul 25, 2022

I am interested in helping with this specification, but I would like to look to the Prometheus developers for guidance.
Let's move to open-telemetry/opentelemetry-specification#1891.

@jmacd
Copy link
Contributor

jmacd commented Jul 25, 2022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:metrics Part of OpenTelemetry Metrics blocked:specification Waiting on clarification of the OpenTelemetry specification before progress can be made pkg:SDK Related to an SDK package
Projects
Development

No branches or pull requests

2 participants