discuss: stateful series tracking staleness #31016
Comments
Thanks for opening this @sh0rez. Would you be willing to propose a corresponding config as well? I assume the user should have control over the staleness timeout. Is there anything else? A related concern I have is how the user can manage cardinality. Should we also have the ability to set a max number of streams, and flush the oldest when we would exceed the max? I'm asking here because both are directly related to managing the amount of data retained, so we might want to unify these concerns in a single package. I'm curious about your thoughts on this.
I think we can lean on Prometheus experience for this, like this talk: https://promcon.io/2017-munich/slides/staleness-in-prometheus-2-0.pdf tldr:
This of course heavily builds on Prometheus data model assumptions, which are different from OTel.
most importantly, do we even want that? e.g. a sporadic delta producer might be stale all the time. what are the use-cases we need to enable? prom-like monitoring + alerting? low-connectivity iot?
Thanks for the detailed thoughts on this @sh0rez. At a high level, I like the idea of not reinventing the wheel but I don't have clear answers to your questions so would want to hear other people's thoughts as well. Perhaps some folks with more OTel && Prometheus experience can chime in.
Apologies for the delayed response; I had some family health issues last week. IMO, I think a fixed interval gets us 98% of the benefits and is very simple to implement / understand. While it does have the "overlap" issue, IMO, this isn't really a big issue. The "old" counter will no longer be modified, so while there is "overlap", any useful operations, like |
@sh0rez @djaglowski I created a WIP PR implementing the above behaviour: ce07908
… staleness (#31089) **Description:** It's a glorified wrapper over a Map type, which allows values to be expired based on a pre-supplied interval. **Link to tracking Issue:** #31016 **Testing:** I added some basic tests of the PriorityQueue implementation as well as the expiry behaviour of Staleness **Documentation:** All the new structs are documented
) **Description:** Removes stale series from tracking (and thus frees their memory) using staleness logic from open-telemetry#31089 **Link to tracking Issue:** open-telemetry#30705, open-telemetry#31016 **Testing:** `TestExpiry` **Documentation:** README updated
implementation and re-usable components are merged, closing
Component(s)
`deltatocumulative` (wip), `interval` (wip), others?
Describe the issue you're reporting
Stateful components keep state about telemetry signals (like metric streams) in memory.
WIP processors like `deltatocumulative` and `interval` need to maintain a (variable-size) set of samples per tracked series.
As series may come and go, tracking them indefinitely causes unbounded memory growth.
Systems like Prometheus solve this using "staleness", meaning that series not receiving fresh samples for a given time interval are considered "stale" and subsequently removed from tracking, thus freeing the memory occupied.
Given the functional overlap of several stateful metrics processors that need to track streams and expire that tracking, I think there is an opportunity to generalize this behavior, e.g. using a stream-map interface like the following:
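A minimal sketch of what such a stream-map interface could look like. Everything here is an assumption for illustration: the `StreamID` string type stands in for a real series identity, and the names `Map`, `Load`, `Store`, `Delete`, `Len` and `plainMap` are hypothetical rather than taken from an existing collector package.

```go
package main

// StreamID identifies one tracked series. Hypothetical: a real identity
// would likely be derived from resource, scope, metric and attributes.
type StreamID string

// Map is a generic store of per-stream state of type T.
// Processors depend on this interface, so wrappers (e.g. one that
// expires stale streams) can be swapped in without changing them.
type Map[T any] interface {
	Load(id StreamID) (T, bool)
	Store(id StreamID, v T)
	Delete(id StreamID)
	Len() int
}

// plainMap is the simplest Map: a Go map with no expiry at all.
type plainMap[T any] map[StreamID]T

func (m plainMap[T]) Load(id StreamID) (T, bool) { v, ok := m[id]; return v, ok }
func (m plainMap[T]) Store(id StreamID, v T)     { m[id] = v }
func (m plainMap[T]) Delete(id StreamID)         { delete(m, id) }
func (m plainMap[T]) Len() int                   { return len(m) }
```

A processor would hold a `Map[T]` for whatever per-series state it needs (for example a running sum), and never touch the underlying storage directly.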
A staleness implementation may look as follows:
/cc @RichieSams @djaglowski