Add Extended range selectors proposal #52
Conversation
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
> compromise aggregability. Instead, by duplicating the first data point at the beginning of the range, we maintain accuracy without introducing artificially calculated data.
>
> For queries at the current time ("now"), we also deliberately avoid extrapolation at the end of the range. While this **will always** show a slight drop for metrics with constant rates when using "smoothed", this accurately reflects that the data is incomplete at the range boundary. In future iterations, we could introduce a special annotation to indicate this boundary condition if users find it problematic in practice.
`smoothed` works fine in the middle of a time series (i.e. for historical queries). But the fact that it will produce different results in a recording rule vs. later on (beyond the usual race conditions that may lead to a couple of missing samples every now and then at the time of rule evaluation) means that it is really only useful for charts and such.

I suppose that's fine, since it's an optional modifier, but I would consider strongly discouraging its use in rules. E.g. if you scrape `instance="A"` with an offset of 1 second and `instance="B"` with an offset of 14 seconds (on a scrape interval of 15 seconds), then `increase(foo{instance="A"})` will be consistently ~30% higher than `increase(foo{instance="B"})` for a `foo` that increases at the same rate on both instances.

Also, a `smoothed` metric won't even produce an output in real time, since there's never a next sample at the time of evaluation.
Yes, you are correct. In fact, rules would need an `offset $scrape_interval` to work as correctly as possible, which is something we already see in some places.

But interpolating would also mean that if we have no new samples and the target is down, we would again extrapolate "wrongly" and invent data that will never happen.

In the same way, I did not want to add heuristics around staleness markers because: 1. They do not mean that the counter has stopped, just that we can't get data. 2. Push use cases (like OTel or other monitoring systems) will not produce staleness markers.
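For illustration, the rule-side work-around referred to here might look like the following (the 15s value is an assumption, and the exact ordering of the modifier and `offset` is not pinned down in this thread):

```promql
# Shift the evaluation back by one scrape interval (assumed 15s) so that a
# sample at or after the shifted range end usually exists for interpolation.
increase(http_requests_total[5m] smoothed offset 15s)
```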
Leaving aside the fact that it shifts the resulting timestamps, AFAICT the `offset` approach is only a best-effort hack. In order for it to work reliably it would have to be something like `offset <lookback_window>`. An `offset <scrape_interval>` (or a multiple thereof; also a bit of a hack, BTW, because you'd be hardcoding the scrape interval in your query) breaks if you miss a scrape (or more). Whereas, even with a missing sample (or several), the metric did actually experience an increase. So not producing an output under those conditions is very much incorrect behavior.

As for interpolation / extrapolation, I agree with you that they only make up misleading data. E.g. why should a query like `increase(foo[1ms])` produce a non-zero output when `foo` has orders of magnitude lower resolution? How can anyone tell that the underlying metric actually increased smoothly within that 1 ms? (Vs. it staying flat or experiencing the full increase all at once.)

IMHO, one should embrace the fact that the data has the resolution that it does, not whitewash it for the sake of smooth lines on a chart (while at the same time not generating weird artifacts at the edges or around step increases, as the current implementation does).
> - At both the start and end of the range (or evaluation time, for instant selectors), Prometheus will search **backward** using a lookback window to find the sample just before the boundary.
> - If there is no **previous sample**, we duplicate the first sample in the range at the start of the range.
Actually duplicating the previous sample at the range boundary has the side-effect of losing timestamp information. This makes no difference for `increase()` (since it's simply the difference between the samples' values, regardless of timestamp).

But `rate()` will be negatively affected by essentially changing the timestamps of the samples at the two ends. Given a few missing samples (or a range that is not a multiple of the scrape interval), the `rate()` of a time series that increases at a constant rate will jump around (because one or both ends get "stuck", while `rate()` continues to divide the jittery increase by a fixed time interval). This would not be the case if you retained the two samples' original timestamps.

IOW, `increase()` without interpolation will inevitably jitter whenever you have one or more missing samples at your range end. `rate()` doesn't have to.

Side note (and personal point of view): For me, the clearest way to see counters is (as in your data sets above) as a sequence of timestamped value deltas (because that's precisely what the code producing them does: "I've written 5 more bytes"). Unfortunately, due to the limitations of scraping, counters are represented as sequences of monotonically increasing values. But if you ignore the representation and focus strictly on the meaning, then it's a lot easier to reason about what `increase(foo[1m])` or `rate(foo[1m])` should be doing: sum all the value deltas in the given range (and divide them by the sum of their time deltas, in the case of `rate()`).
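A small worked illustration of the effect described above (values and timestamps are invented):

```promql
# Range [00:00, 01:00]; no sample before the range; samples inside:
#   00:10 -> 100      00:55 -> 190
# Duplicating these samples at the range boundaries leaves the increase at
# 190 - 100 = 90 either way, but changes the rate:
#   with the original timestamps:          90 / 45s = 2.0 per second
#   with the duplicated boundary samples:  90 / 60s = 1.5 per second
rate(foo[1m] anchored)
```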
What you describe about `rate` is correct, but for now `rate()` is the increase divided by the time. I believe that keeping the original timestamps would mean "lying" about what the user asks: the user asks about the rate over X minutes, so preserving the original timestamps would give them "X+" minutes.

Overall, for rate and getting a better result, "smoothed" should definitely be used.
As per the above, data has whatever resolution it has. So if in the last 5 minutes I have 3 increases covering a time span of 6 minutes (whether equally spaced or with gaps), what is the possible benefit of adding them up and dividing by 5 minutes?

IIRC the current implementation already does this: it tries to extrapolate at the ends, but if the samples are too far from the ends to extrapolate, then it will adjust the duration that it divides by accordingly, while nominally claiming to be increase divided by time. Because it's true; it's just that said time is not always the 5 minutes specified by the user, because you don't always have exactly 5 minutes' worth of data. I don't see anything contentious about that.

Dividing the increase by the actual covered duration will give you just as smooth (and definitely less made-up) a rate as `smoothed` would. I.e. a "better result".
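A quick numeric illustration of dividing by the actual covered duration versus the nominal range (all numbers invented):

```promql
# Three increases totalling 30, delimited by samples spanning 6 minutes
# (the first of those samples sits just before the 5m window and is found via
# lookback):
#   divide by the actual span:    30 / 360s ≈ 0.083 per second
#   divide by the nominal range:  30 / 300s = 0.100 per second (overstated)
rate(foo[5m])
```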
Anchored is made for `increase` and providing integer results.

Smoothed should work very well with rates, on the condition that there is a point at or after the boundary. This can be adjusted by using `offset $scrape_interval` in recording rules (or better, with prometheus/prometheus#10529).

We could "move" the 5m selector to the last datapoint provided by the user, but the issue is that when summing all this, we would sum data moved by 5s, data moved by 15s, data moved by 1m... which would give an inconsistent result.
[I am easily triggered and I don't like where this discussion is headed. So I'll leave this here and try my best to move along.]
How exactly is summing made up data (which is what interpolation gives you) better than summing data that is shifted by some time delta lower than the scrape interval? What if my counter is an actual step function?
And how is summing "smoothed" data shifted by a full scrape interval (when that happens to work) better than data shifted by less than a full scrape interval? (Because that's the data you have.) And if that is indeed a problem, what's stopping you from giving the resulting value the timestamp of the last sample, so you get zero time shift and zero hallucinated data?
All values output by Prometheus are shifted by some time delta less than a full scrape interval (if everything works perfectly). Why is this suddenly different? Did anyone ever consider smoothing gauges, as an improvement over the status quo? If not, why not?
Smoothing at either end only gives you absurd results, such as "2.375 process restarts over the past 5 minutes".

And desperately hanging on to a fixed time range to divide by will lead to arbitrarily stretching or compressing the (`anchored`) rate whenever samples are missing or not perfectly equally spaced (consider the example of a counter with an actual rate of increase of 1, but with somewhat randomly spaced samples).

All you can do at that point is to religiously stick with `increase(... anchored)` and `rate(... smoothed)` to try and salvage something half-way sane.
If we used the original sample timestamps as the range to divide by, what should happen if there aren't any values within the lookback window? Would it be better to use the original timestamps of the first and last samples in the range, or the duration of the range vector selector?

Additionally, I think having `rate()` always be equal to `increase()` divided by the duration of the range vector selector, no matter what modifiers are used, is a better user experience, as it's consistent and so less surprising.
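For illustration, the consistency property suggested here, written out (metric name and range are only examples):

```promql
# With that behavior, these two expressions always agree, regardless of the
# modifier used:
rate(http_requests_total[5m] anchored)
increase(http_requests_total[5m] anchored) / 300  # 5m = 300 seconds
```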
> Additionally, I think having `rate()` always be equal to `increase()` divided by the duration of the range vector selector, no matter what modifiers are used, is a better user experience, as it's consistent and so less surprising.

I also think it is very important to keep this behavior consistent.
About the more general question: Different users are asking different questions, obviously, but I think that the most "reasonable" use case for `anchored` is better served with a behavior where we return no result if there is no previous sample. (That's what users get from the `[1m:1m]` subquery work-around we have seen so often.)

My biggest worry is that users asking for an increase over a long time (e.g. one month) might unknowingly get an increase over a much shorter time (e.g. an hour) if the metric they are querying has only started a short time ago.

On the other hand, I can see how it would make sense if we assume "absence of samples" means "no change", and in scenarios where many time series that are coming and going are aggregated, it might make sense… but maybe that's then a case where `smoothed` is better anyway…
I believe we should return results even when there are no samples before the range.

Consider this example: `sum by (instance, job) (increase(http_requests_total[10m] anchored))`

If we require a sample before the start, this would exclude all new time series, such as those with a dynamic label like `path` or `handler`. That would lead to silently discarding valid data, which I find very problematic.

This is exactly where created timestamps (CT) would be valuable: we would place the 0 value at the beginning of the range.

If we don't return anything in these cases, we're effectively cutting out a significant portion of practical use cases, particularly those involving new or dynamic labels.
> ### How This Proposal Works
>
> This proposal introduces two new **modifiers for range and instant selectors**
Could you elaborate on why the new modifiers are necessary for instant vector selectors? It seems from the motivation above and the discussion below that the new modifiers are only necessary for range vector selectors.
Hello,

They are not necessary and we can skip them.

Smoothing could be useful to get a better estimate of gauges between two scrape intervals, but it is not necessary.
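A hypothetical example of what that could look like for an instant selector (the metric name and the exact syntax are assumptions, not something settled in this thread):

```promql
# Estimate the gauge's value at the evaluation timestamp by interpolating
# between the samples just before and just after it.
node_memory_Active_bytes smoothed
```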
Makes sense. Perhaps we can start with adding these keywords just for range vector selectors for now?
Generally, I would like to generalize the keywords for instant vectors, too. I believe that the fundamental difference in how samples are selected for instant and range vectors is one source of confusion. Arguably, instant vectors were already `anchored`. Introducing `smoothed` for both might create more clarity.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
Hi @roidelapluie, I fail to see how this handles the following use case (see prometheus/prometheus#3806): the trivy operator provides a metric for vulnerabilities of a given Kubernetes pod in a given namespace by severity. For a new pod with vulnerabilities, the metric appears with the namespace, pod, and severity labels and a non-zero value. How would one have an alert for new vulnerabilities (probably per namespace) with the new semantics?
Thank you very much for doing this.
I generally like it. I'm not so sure about some edge cases, but we can go forward with an implementation and see how it works in practice. (It will be an experimental feature for a while, I assume, so we can iterate on it.)
I see in your Q&A that you decided to ignore staleness markers. Maybe you are right. I'm not so sure, but see above. I would still recommend mentioning more prominently that we plan to ignore staleness markers (where I added comments asking what to do with staleness markers).

The big remaining question is whether `anchored` should work if there is no sample before the range. I was originally thinking it should only return anything if there is actually a sample before the range (or precisely at the beginning of the range). After reading everything, I'm not so sure anymore. I can imagine that the use case I'm often mentioning ("Give me my disk usage growth over the last month" – and then I get the growth over the last hour without noticing) will mostly affect gauges, so `foo - foo offset 30d` will work just fine.
Essentially, I'm willing to give it a go pretty much precisely as you propose it here.
> - **Define a one-size-fits-all solution**: The goal is not to find a universally "better" rate function that replaces all others. Different use cases have inherently different trade-offs, and the proposal embraces offering multiple well-defined modes instead of forcing consensus around a single behavior.
On the other hand, providing a whole zoo of different modifiers for incrementally different use cases will be very confusing.

How about aiming for a good trade-off?

(Personally, I'm hopeful that the two modifiers introduced here might actually do the trick, with `smoothed` being the "actually better rate calculation" and `anchored` providing the "direct difference between two sample values without adjustment for the range length", where I see these two cases as rather different and not just incremental changes of user expectations.)
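Spelled out as queries, that division of labour might look like this (metric name and range are only examples):

```promql
# Direct difference between two real sample values, without adjustment for the
# range length:
increase(http_requests_total[5m] anchored)

# Boundary-interpolated ("actually better") per-second rate:
rate(http_requests_total[5m] smoothed)
```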
> *Anchored vectors ensure boundary completeness by including real samples at range boundaries. The diagram shows how a sample just before the range start (outside the original range) is included, and the last sample at range end is duplicated at the end of the range, making sure there are points to cover the complete interval [start, end].*
>
> **Behavior**:
> - This mode treats the selector range as **closed on both ends**: `[start, end]`.
I think this statement is confusing, as we are even considering samples "at or before". A "closed" interval at the left end would imply "at or after".
> - At both the start and end of the range (or evaluation time, for instant selectors), Prometheus will search **backward** using a lookback window to find the sample just before the boundary.
Will staleness markers be taken into account here?
No, as my research has shown that staleness markers do not indicate anything specific. It could be a restart, a temporary network issue... If there is a staleness marker, we hope to find the value before it to determine whether we are seeing a continuation or a reset.
> - This mode estimates the values at the range's start and end (or instant evaluation time) by **interpolating** between the nearest known samples **before and after** each boundary.
Are staleness markers taken into account?
> and after** each boundary.
> - If there is no datapoint before the range, the first datapoint inside of the range is duplicated at the beginning of the range.
> - If there is no datapoint after the range, the last datapoint inside of the range is duplicated at the end of the range.
> - The interpolation window is controlled by the **lookback delta**,
Maybe mention that this could even work with no data points inside the range.
Do you think we want that?
Co-authored-by: Björn Rabenstein <github@rabenste.in> Signed-off-by: Julien <291750+roidelapluie@users.noreply.github.com>
If you want to know how many HTTP requests you have over 1h: ignoring metrics without samples before the range would drop a lot of results (all newly created labels, which could be e.g. paths), so it would completely underestimate the result. The ideal thing is CT: we insert a 0, and so the 0 for newly created labels is reported at the beginning of the range, so we do not miss anything.
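A rough sketch of the intended effect (label values and timestamps are invented for illustration):

```promql
# A series http_requests_total{path="/new", ...} first appears at 10:40 inside
# a 1h window ending at 11:00. With created timestamps, a 0 sample is taken to
# exist at 10:40, so the series' full increase is still counted by:
sum by (instance, job) (increase(http_requests_total[1h] anchored))
# Requiring a real sample before 10:00 instead would silently drop the series.
```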
Co-authored-by: Björn Rabenstein <github@rabenste.in> Signed-off-by: Julien <291750+roidelapluie@users.noreply.github.com>
Thanks for addressing my comments. I'm still caught between several events at work and in my usual life, but I will finally be able to tackle my review backlog next week, which will allow me to address the open questions. In the meantime, it would be good to get some feedback from other interested Prometheus developers. I would much prefer more approvals on this than just mine.