Delta to Cumulative Processor (metrics) #29300
Comments
I'm sponsoring this component. A few initial thoughts/questions on the design:
I think we could also support nonmonotonic delta to nonmonotonic cumulative without any substantial changes. Or am I missing something?
This seems likely to be problematic, since every data point in a time series would be inaccurate in between the two processors. E.g. a delta timeseries with values 3, 2, 1 becomes cumulative time series 3, 2, 1 temporarily, and then properly 3, 5, 6 only after the second processor. One note on the broader use case for this - the metrics data model specifically describes this functionality and rationale:
Since our data model is explicitly designed for this use case, I think it is important that the Collector provides a solution for it, with sensible limitations.
The intention would be that the output of the first processor would be correct, but there would be a one-to-one correspondence of datapoints. So if we had 1000 deltas/sec, we would transform to 1000 cumulatives/sec. The second processor would kind of just emit the latest value every interval, so we would get 1 datapoint per interval (e.g. every 30 seconds). So, a delta timeseries with values 3, 2, 1 becomes the cumulative time series 3, 5, 6, but the second processor would just output 6 (and then output 6 again at the next publish interval, and keep doing that forever, or until a new datapoint is received, or the timeseries becomes "stale" and stops being tracked). We can already achieve something like the second processor by publishing to the Prometheus exporter and scraping ourselves with a Prometheus receiver. But it's a bit ugly, no?
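A minimal sketch of the accumulation logic described above (an illustration only, not the component's actual code; `streamID` is a placeholder for whatever identifying properties end up being used):

```go
package main

import "fmt"

// streamID stands in for whatever uniquely identifies a metric stream
// (metric name plus attribute set); see the identity sketch further below.
type streamID string

// accumulator keeps a running cumulative total per stream so that each
// incoming delta datapoint can be re-emitted as a cumulative datapoint.
type accumulator struct {
	totals map[streamID]float64
}

func newAccumulator() *accumulator {
	return &accumulator{totals: make(map[streamID]float64)}
}

// consumeDelta adds a monotonic delta to the stream's running total and
// returns the cumulative value to emit: one output point per input point.
func (a *accumulator) consumeDelta(id streamID, delta float64) float64 {
	a.totals[id] += delta
	return a.totals[id]
}

func main() {
	acc := newAccumulator()
	for _, d := range []float64{3, 2, 1} {
		fmt.Println(acc.consumeDelta("http.requests|method=GET", d)) // prints 3, 5, 6
	}
}
```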
Thanks for clarifying, I get the idea now. I actually find this composable design quite attractive. The implementation of each would be simpler and users may find a need for only one or the other (e.g. only use the first in combination with the prometheus exporter, or only use the second for reducing the frequency of counter updates).
Yes, me too :D
This will ensure we design a reusable state-tracking mechanism, which I think is needed for many similar use cases. For example, metric aggregation over dimensions.
@0x006EA1E5, given that we agree on splitting this into two components, do you mind opening two new issues in favor of this one? I will sponsor both.
I'm going to remove the |
Do we really need two new issues, rather than just scoping the current one to delta to cumulative, and one new one which would be whatever we call the other component (the one to periodically output updates)?
I created #29461
@0x006EA1E5, I apologize for the slow/missed responses on this. I still think we need these components and with the new year can be much more responsive, if you wish to renew your efforts on them.
Yes, no worries, hopefully I will get some more time to look at this too 😁
I have some questions regarding the spec for these processors, sorry if this is documented already somewhere.
We have the concept of identifying properties, which as I understand it includes:
The Metrics data model docs mention:
What should we do here?
Anyone have any ideas regarding other things that need to be defined?
Great questions. Focusing on the timestamp edge cases first.
I think the general principle is that we have two behaviors:
Does this make sense or am I oversimplifying it?
It does make sense, but I'm a bit concerned with how the complexity can explode considering how many things can vary 🤔 And how do we make these behaviours configurable without it becoming a mess? One important constraint will be how the typical downstream consumers will behave, for example if we expand the time window's start time, especially the
This is something that is on our radar, and we would like to support it as much as possible. Our use case is to enable remote writing of metrics from the count connector to Thanos.
This certainly could prove to be tricky, but I think it may be worth trying in a simple form and then iterating based on feedback.
Hi @djaglowski and @0x006EA1E5, I'd like to help contribute my time to the implementation of this issue and/or #29461. I can be available to work semi-full time on it. @0x006EA1E5, I see above you're working through many of the edge cases. Do you have a fork already? Would you be willing to work together?
Thanks very much @RichieSams. I think we'll gladly take the help unless @0x006EA1E5 is already working on it or just about to start. I haven't been working on any code for it, just trying to help design it ahead of time. I think it's fine to start development and we'll work through it as necessary. We have a contributing guide which articulates a strategy for splitting a new component into multiple PRs. This helps keep the complexity in each PR at a reasonable level so we can review them.
A duplicate issue was opened for this component and appears to have started development. I think we should consolidate to one issue but we need to reestablish whether we are splitting the component. Currently waiting for feedback from those involved with the other proposal.
It seems #30479 is very close to an exact duplicate of this issue and specifically does not include the functionality proposed in #29461. It has a detailed design doc and a reference implementation already too. Therefore, I will close this issue and suggest that anyone interested take a closer look at #30479 and the associated PRs. @RichieSams, @0x006EA1E5, or anyone else interested in #29461, I think we can parallelize efforts by moving focus to that processor.
The purpose and use-cases of the new component
Convert metric data from monotonic delta to monotonic cumulative.
We can currently convert from cumulative to delta, but not delta to cumulative.
One concrete use case is metrics produced by the count connector as deltas (in fact, a simple stream of monotonic delta 1s, with no start_time_unix_nano to mark the period start; this design decision appears to have been made so that the count connector can be stateless).
Metrics produced by the count connector cannot be exported correctly via the Prometheus exporter due to the missing start_time_unix_nano; the Prometheus exporter appears to consider the data points to represent breaks in the sequence.
Metrics produced by the count connector are also not suitable for export with the Prometheus remote write exporter. Instead, we should send the Prometheus remote write exporter periodic aggregates, as we would if the metrics had been scraped, for example every 30 seconds.
Ideally, we should be able to receive the stream of delta data points from the count connector (with missing start_time_unix_nano), and periodically emit a cumulative datapoint, suitable for reception by the Prometheus remote write exporter.
Note: due to the monotonic nature of the metrics, users should be aware that in a load-balanced configuration different instances will maintain different cumulative totals, and will therefore likely send incorrect, non-monotonic data downstream, unless care is taken to ensure that only one instance is ever responsible for a given metric (for example, by adding an identifying attribute such as the collector's unique instance id).
This processor will need to be stateful to maintain the cumulative total by metric.
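To illustrate the statefulness and the load-balancing caveat, one hypothetical way to derive a stream identity from the metric name and attributes is sketched below; a real implementation would need to follow the identifying properties defined by the metrics data model (resource, scope, unit, and so on). Because the resulting totals live inside a single collector instance, load-balanced instances that each see part of a stream would accumulate separate, diverging totals.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// streamIdentity derives a stable key for a metric stream from its name and
// attributes. Attribute order must not matter, so keys are sorted first.
// A real implementation would also need to include resource, scope, unit,
// etc. per the metrics data model; this sketch keeps it deliberately small.
func streamIdentity(name string, attrs map[string]string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString(name)
	for _, k := range keys {
		fmt.Fprintf(&b, "|%s=%s", k, attrs[k])
	}
	return b.String()
}

func main() {
	id := streamIdentity("http.requests", map[string]string{"method": "GET", "code": "200"})
	fmt.Println(id) // http.requests|code=200|method=GET
}
```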
The use case outlined above could be addressed in a single processor, which maintains the cumulative sum and periodically emits this value. Alternatively, this could be implemented in two processors: a simple delta to cumulative processor which emits a cumulative sum for each delta datapoint, and a periodic aggregator which could also be used for deltas. This issue proposes the creation of a simple delta to cumulative processor which emits a cumulative sum for each delta datapoint.
Another issue has been created to perform periodic aggregation: #29461
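For reference, a rough sketch of the periodic-aggregation idea behind #29461 (again an illustration under assumed names, not a proposed implementation): remember the latest cumulative value per stream and re-emit it on a fixed interval, regardless of how often new datapoints arrive.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// periodicEmitter remembers the latest cumulative value per stream and
// re-emits every tracked value on a fixed interval, independent of how often
// new datapoints arrive upstream.
type periodicEmitter struct {
	mu     sync.Mutex
	latest map[string]float64
}

func newPeriodicEmitter(interval time.Duration, emit func(id string, v float64)) *periodicEmitter {
	p := &periodicEmitter{latest: make(map[string]float64)}
	go func() {
		// The ticker runs for the lifetime of the process in this sketch;
		// a real component would also handle shutdown and staleness.
		for range time.Tick(interval) {
			p.mu.Lock()
			for id, v := range p.latest {
				emit(id, v)
			}
			p.mu.Unlock()
		}
	}()
	return p
}

// update records the most recent cumulative value for a stream.
func (p *periodicEmitter) update(id string, v float64) {
	p.mu.Lock()
	p.latest[id] = v
	p.mu.Unlock()
}

func main() {
	e := newPeriodicEmitter(time.Second, func(id string, v float64) {
		fmt.Printf("emit %s=%v\n", id, v)
	})
	e.update("http.requests", 6)
	time.Sleep(2500 * time.Millisecond) // re-emits 6 once per second until a new value arrives
}
```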
Example configuration for the component
Telemetry data types supported
Metric
Is this a vendor-specific component?
Code Owner(s)
No response
Sponsor (optional)
@djaglowski
Additional context
No response