Add Extended range selectors proposal #52


Open · wants to merge 11 commits into base: main

Conversation

roidelapluie (Member)

No description provided.

compromise aggregability. Instead, by duplicating the first data point at the
beginning of the range, we maintain accuracy without introducing artificially calculated data.

For queries at the current time ("now"), we also deliberately avoid extrapolation at the end of the range. While this **will always** show a slight drop for "smoothed" on metrics with constant rates, it accurately reflects that the data is incomplete at the range boundary. In future iterations, we could introduce a special annotation to indicate this boundary condition if users find it problematic in practice.

smoothed works fine in the middle of a time series (i.e. for historical queries). But the fact that it will produce different results in a recording rule vs later on (beyond the usual race conditions that may lead to a couple of missing samples every now and then at the time of rule evaluation) means that it is really only useful for charts and such.

I suppose that's fine, since it's an optional modifier, but I would consider strongly discouraging its use in rules. E.g. if you scrape instance="A" with an offset of 1 second and instance="B" with an offset of 14 seconds (on a scrape interval of 15 seconds), then increase(foo{instance="A"}) will be consistently ~30% higher than increase(foo{instance="B"}) for a foo that increases at the same rate on both instances.

Also, `metric smoothed` won't even produce an output in real time, since there's never a next sample at the time of evaluation.
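
A rough worked sketch of that skew (illustrative numbers only, assuming a counter whose value equals the elapsed seconds, a 15s scrape interval, and the boundary handling proposed for smoothed: interpolation at the range start, and duplication of the last in-range sample when the range ends at "now"):

```
# Rule evaluates increase(foo[1m] smoothed) at t=60s; counter value == t.
#
# instance="A", scraped 1s before each evaluation tick: samples at t = -1, 14, 29, 44, 59
#   start (t=0):  interpolated between (-1, -1) and (14, 14)             -> 0
#   end   (t=60): no sample after 60 yet, duplicate last sample (59, 59) -> 59
#   increase ≈ 59
#
# instance="B", scraped 14s before each evaluation tick: samples at t = -14, 1, 16, 31, 46
#   start (t=0):  interpolated between (-14, -14) and (1, 1)             -> 0
#   end   (t=60): duplicate last in-range sample (46, 46)                -> 46
#   increase ≈ 46
#
# 59 vs 46: a consistent gap of roughly 30%, although both counters grow at the same rate.
```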

roidelapluie (Member, Author)

Yes, you are correct. In fact, rules would need an offset $scrape_interval to work as correctly as possible, which is something we already see in some places.

But interpolating would also mean that if we have no new samples and the target is down, we would again extrapolate "wrongly" and invent data that will never materialize.

In the same way, I did not want to add heuristics around staleness markers because: 1. They do not mean that the counter has stopped, just that we can't get data. 2. Push use cases (like OTel or other monitoring systems) will not produce staleness markers.

@free (May 5, 2025)

Leaving aside the fact that it shifts the resulting timestamps, AFAICT the offset approach is only a best-effort hack. In order for it to work reliably it would have to be something like offset <lookback_window>. An offset <scrape_interval> (or a multiple thereof; also a bit of a hack, BTW, because you'd be hardcoding the scrape interval in your query) breaks if you miss a scrape (or more), whereas, even with a missing sample or two, the metric did actually experience an increase. So not producing an output under those conditions is very much incorrect behavior.

As for interpolation / extrapolation, I agree with you that they only make up misleading data. E.g. why should it be that a query like increase(foo[1ms]) would produce a non-zero output when foo has orders of magnitude lower resolution? How can anyone tell that the underlying metric actually increased smoothly within that 1 ms? (Vs it staying flat or experiencing the full increase all at once.)

IMHO, one should embrace the fact that the data has the resolution that it does, not whitewash it for the sake of smooth lines on a chart (while at the same time, not generating weird artifacts at the edges or around step increases, as the current implementation does).
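
For reference, a sketch of the two work-around flavors being compared, using a hypothetical metric (15s stands in for the scrape interval, 5m for the default lookback window):

```
# Best-effort: hard-codes the scrape interval and still breaks on a missed scrape.
increase(http_requests_total[10m] offset 15s)

# More reliable, but shifts all resulting timestamps by the full lookback window.
increase(http_requests_total[10m] offset 5m)
```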

Comment on lines +211 to +214
- At both the start and end of the range (or evaluation time, for instant
selectors), Prometheus will search **backward** using a lookback window to find
the sample just before the boundary.
- If there is no **previous sample**, we duplicate the first sample in the range at the start of the range.

Actually duplicating the previous sample at range boundary has the side-effect of losing timestamp information. This makes no difference for increase() (since it's simply the difference between the samples' values, regardless of timestamp).

But rate() will be negatively affected by essentially changing the timestamps of the samples at the two ends. Given a few missing samples (or a range that is not a multiple of the scrape interval), the rate() of a time series that increases at a constant rate will jump around (because one or both ends get "stuck", while rate() continues to divide the jittery increase by a fixed time interval). This would not be the case if you retained the two samples' original timestamps.

IOW, increase() without interpolation will inevitably jitter whenever you have one or more missing samples at your range end. rate() doesn't have to.

Side note (and personal point of view): For me, the clearest way to see counters is (as in your data sets above) as a sequence of timestamped value deltas (because that's precisely what the code producing them does: "I've written 5 more bytes"). Unfortunately, due to the limitations of scraping, counters are represented as sequences of monotonically increasing values. But if you ignore the representation and focus strictly on the meaning, then it's a lot easier to reason about what increase(foo[1m]) or rate(foo[1m]) should be doing: sum all the value deltas in the given range (and divide them by the sum of their time deltas, in the case of rate()).
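
A compact restatement of that "sequence of deltas" view (a paraphrase of the semantics sketched above, not a description of the current implementation):

```
# Treating the counter as timestamped value deltas within the range R:
#   increase(foo[R]) = sum of the value deltas whose timestamps fall inside R
#   rate(foo[R])     = (sum of those value deltas) / (sum of the corresponding time deltas)
```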

roidelapluie (Member, Author)

What you describe about rate is correct, but for now rate() is defined as the increase divided by the range duration. I believe that keeping the original timestamps would mean "lying" about what the user asks for: the user asks about X minutes, so preserving the original timestamps would effectively give them "X-plus" minutes.

Overall, to get a better result for rate, "smoothed" should definitely be used.

@free (May 5, 2025)

As per the above, data has whatever resolution it has. So if in the last 5 minutes I have 3 increases covering a time span of 6 minutes (whether equally spaced or with gaps) what is the possible benefit of adding them up and dividing by 5 minutes?

IIRC the current implementation already does this: it tries to extrapolate at the ends, but if the samples are too far from the ends to extrapolate, then it will adjust the duration that it divides by accordingly. While nominally claiming to be increase divided by time. Because it's true. It's just that said time is not always the 5 minutes specified by the user, because you don't always have exactly 5 minutes worth of data. I don't see anything contentious about that.

Dividing the increase by the actual covered duration will give you just as smooth (and definitely less made up) a rate as smoothed would. I.e. a "better result".
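
A small numeric illustration of that point, with made-up samples and anchored-style selection (the sample just before the range is included):

```
# Counter samples (time, value): (-1m, 100) (1m, 103) (3m, 106) (5m = now, 109)
# increase(foo[5m] anchored) evaluated at "now":
#   start value = 100 (sample just before the range), end value = 109 -> increase = 9
# Divided by the selector range (5m):     9 / 300s = 1.8 per minute
# Divided by the covered duration (6m):   9 / 360s = 1.5 per minute (the actual rate)
```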

roidelapluie (Member, Author)

Anchored is made for increase and for providing integer results.

Smoothed should work very well with rates, on the condition that there is a point at or after the boundary. This can be adjusted by using offset $scrape_interval in recording rules (or better, with prometheus/prometheus#10529 ).

We could "move" the 5m selector to the last datapoint provided by the user, but the issue is that when summing all this, we would sum data moved by 5s, data moved by 15s, data moved by 1m... which would give an inconsistent result.
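
A sketch of that intended division of labor, with a hypothetical metric (see the note on offsets above for the recording-rule caveat):

```
# Exact, integer-friendly increase:
increase(http_requests_total[10m] anchored)

# Smoother per-second rate; in a recording rule it works best when a sample
# exists at or after the range end (e.g. by offsetting by the scrape interval).
rate(http_requests_total[10m] smoothed)
```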


[I am easily triggered and I don't like where this discussion is headed. So I'll leave this here and try my best to move along.]

How exactly is summing made up data (which is what interpolation gives you) better than summing data that is shifted by some time delta lower than the scrape interval? What if my counter is an actual step function?

And how is summing "smoothed" data shifted by a full scrape interval (when that happens to work) better than data shifted by less than a full scrape interval? (Because that's the data you have.) And if that is indeed a problem, what's stopping you from giving the resulting value the timestamp of the last sample, so you get zero time shift and zero hallucinated data?

All values output by Prometheus are shifted by some time delta less than a full scrape interval (if everything works perfectly). Why is this suddenly different? Did anyone ever consider smoothing gauges, as an improvement over the status quo? If not, why not?


Smoothing at either end only gives you absurd results, such as "2.375 process restarts over the past 5 minutes".

And desperately hanging on to a fixed time range to divide by will lead to arbitrarily stretching or compressing the (anchored) rate whenever samples are missing or not perfectly equally spaced (consider the example of a counter with an actual rate of increase of 1, but with somewhat randomly spaced samples).

All you can do at that point is to religiously stick with increase(... anchored) and rate(... smoothed) to try and salvage something half-way sane.
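
A made-up sequence showing that stretching and compressing, assuming a counter that grows by exactly 1/s, 15s scrapes with one missed sample, and anchored endpoints divided by the fixed 60s range:

```
# Window [0s, 60s]:    start sample (-3s, 97),  last in-range sample (57s, 157)
#   anchored increase = 60, rate = 60 / 60s = 1.00   (true rate 1.00)
# Window [60s, 120s]:  scrape at 117s missed; start (57s, 157), last (102s, 202)
#   anchored increase = 45, rate = 45 / 60s = 0.75   (covered span 45s -> 45/45 = 1.00)
# Window [120s, 180s]: start (102s, 202), last (177s, 277)
#   anchored increase = 75, rate = 75 / 60s = 1.25   (covered span 75s -> 75/75 = 1.00)
```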

Contributor

If we used the original sample timestamps as the range to divide by, what should happen if there aren't any values within the lookback window? Would it be better to use the original timestamps of the first and last samples in the range, or the duration of the range vector selector?

Additionally, I think having rate() always be equal to increase() divided by the duration of the range vector selector, no matter what modifiers are used, is a better user experience, as it's consistent and therefore less surprising.
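
Stated as a formula, the consistency property being argued for (hypothetical metric, any modifier):

```
# For any range R and any modifier M:
#   rate(foo[R] M) == increase(foo[R] M) / (R in seconds)
# e.g. rate(foo[5m] anchored) == increase(foo[5m] anchored) / 300
```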

Member

Additionally, I think having rate() always be equal to increase() divided by the duration of the range vector selector, no matter what modifiers are used, is a better user experience, as it's consistent and therefore less surprising.

I also think it is very important to keep this behavior consistent.

Member

About the more general question: Different users are asking different questions, obviously, but I think that the most "reasonable" use case for anchored is better served with a behavior where we return no result if there is no previous sample. (That's what users get from the [1m:1m] subquery work-around we have seen so often.)
My biggest worry is that users asking for an increase over a long time (e.g. one month) might unknowingly get an increase over a much shorter time (e.g. an hour) if the metric they are querying has only started a short time ago.
On the other hand, I can see how it would make sense if we assume "absence of samples" means "no change", and in scenarios where many time series that come and go are aggregated, it might be useful… but maybe that's then a case where smoothed is better anyway…
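
An illustration of that concern, assuming the lenient anchored semantics and a counter whose series only appeared about an hour ago:

```
# The user asks for a month of growth:
increase(http_requests_total[30d] anchored)
# but silently gets the increase over the series' ~1h of existence, not over 30d.
```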

@roidelapluie (Member, Author, May 21, 2025)

I believe we should return results even when there are no samples before the range.

Consider this example:

sum by (instance, job) (increase(http_requests_total[10m] anchored))

If we require a sample before the start, this would exclude all new time series—such as those with a dynamic label like path or handler. That would lead to silently discarding valid data, which I find very problematic.

This is exactly where created timestamps (CT) would be valuable: we would move the 0 value to the beginning of the range.

If we don’t return anything in these cases, we’re effectively cutting out a significant portion of practical use cases—particularly those involving new or dynamic labels.
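
A sketch of how created timestamps could feed into this, with made-up sample data:

```
# A series first appears mid-range, with created timestamp CT inside the 10m range:
#   (CT, 0)  (t1, 3)  (t2, 7)
# Reporting that 0 at the beginning of the range gives an anchored increase of 7 - 0 = 7,
# instead of the series being dropped for lack of a sample before the range.
```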


### How This Proposal Works

This proposal introduces two new **modifiers for range and instant selectors**


Could you elaborate on why the new modifiers are necessary for instant vector selectors? It seems from the motivation above and the discussion below that the new modifiers are only necessary for range vector selectors.

roidelapluie (Member, Author)

Hello,

They are not necessary and we can skip them.

Smoothing could be useful to get a better estimate of gauges between two scrapes, but it is not necessary.
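
For the gauge case, a minimal sketch of what smoothing an instant selector could mean (plain linear interpolation between the surrounding samples; the exact definition is an assumption, not taken from the proposal text):

```
# Samples (t1, v1) and (t2, v2) around the evaluation time t, with t1 <= t <= t2:
#   v(t) = v1 + (v2 - v1) * (t - t1) / (t2 - t1)
# e.g. v1=10 at t1=0s, v2=20 at t2=15s, t=6s  ->  v(6s) = 10 + 10 * 6/15 = 14
```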


Makes sense. Perhaps we can start with adding these keywords just for range vector selectors for now?

Member

Generally, I would like to generalize the keywords to instant vectors, too. I believe that the fundamental difference in how samples are selected for instant and range vectors is one source of confusion. Arguably, instant vectors were already anchored. Introducing smoothed for both might create more clarity.

@sathieu commented May 6, 2025

Hi @roidelapluie, I fail to see how this handles the following use case (see prometheus/prometheus#3806): trivy operator provides a metric for vulnerabilities of a given Kubernetes pod in a given namespace by severity. For a new pod with vulnerabilities, the metric appears with the namespace,pod,severity labels and a non-zero value. How would one alert on new vulnerabilities (probably per-namespace) with the new semantics?

@beorn7 (Member) left a comment

Thank you very much for doing this.

I generally like it. I'm not so sure about some edge cases, but we can go forward with an implementation and see how it works in practice. (It will be an experimental feature for a while, I assume, so we can iterate on it.)

I see in your Q&A that you decided to ignore staleness markers. Maybe you are right. I'm not so sure, but see above. I would still recommend mentioning more prominently that we plan to ignore staleness markers (where I added comments asking what to do with staleness markers).

The big remaining question is if anchored should work if there is no sample before the range. I was originally thinking it should only return anything if there is actually a sample before the range (or precisely at the beginning of the range). After reading everything, I'm not so sure anymore. I can imagine that the use case I'm often mentioning ("Give my disk usage growth over the last month" – and then I get the growth over the last hour without noticing) will mostly affect gauges, so foo - foo offset 30d will work just fine.

Essentially, I'm willing to give it a go pretty much precisely as you propose it here.

Comment on lines +165 to +168
- **Define a one-size-fits-all solution**: The goal is not to find a universally
"better" rate function that replaces all others. Different use cases have
inherently different trade-offs, and the proposal embraces offering multiple
well-defined modes instead of forcing consensus around a single behavior.
Member

On the other hand, providing a whole zoo of different modifiers for incrementally different use cases will be very confusing.
How about aiming for a good trade-off?

(Personally, I'm hopeful that the two modifiers introduced here might actually do the trick, with smoothed being the "actually better rate calculation" and anchored providing the "direct difference between two sample values without adjustment for the range length", where I see these two cases as rather different and not just incremental changes of user expectations.)


### How This Proposal Works

This proposal introduces two new **modifiers for range and instant selectors**
Member

Generally, I would like to generalize the keywords to instant vectors, too. I believe that the fundamental difference in how samples are selected for instant and range vectors is one source of confusion. Arguably, instant vectors were already anchored. Introducing smoothed for both might create more clarity.

*Anchored vectors ensure boundary completeness by including real samples at range boundaries. The diagram shows how a sample just before the range start (outside the original range) is included, and the last sample at range end is duplicated at the end of the range, making sure there are points to cover the complete interval [start, end].*

**Behavior**:
- This mode treats the selector range as **closed on both ends**: `[start, end]`.
Member

I think this statement is confusing, as we are even considering samples "at or before". A "closed" interval at the left end would imply "at or after".

Comment on lines +210 to +212
- At both the start and end of the range (or evaluation time, for instant
selectors), Prometheus will search **backward** using a lookback window to find
the sample just before the boundary.
Member

Will staleness markers be taken into account here?

roidelapluie (Member, Author)

No, as my research has shown that staleness markers do not indicate anything by themselves. It could be a restart, a temporary network issue... If there is a staleness marker, we hope to find the value before it to determine whether we are looking at a continuation or a reset.

Comment on lines +258 to +260
- This mode estimates the values at the range’s start and end (or instant
evaluation time) by **interpolating** between the nearest known samples **before
and after** each boundary.
Member

Are staleness markers taken into account?

and after** each boundary.
- If there is no datapoint before the range, the first datapoint inside of the range is duplicated at the beginning of the range.
- If there is no datapoint after the range, the last datapoint inside of the range is duplicated at the end of the range.
- The interpolation window is controlled by the **lookback delta**,
Member

Maybe mention that this could even work with no data points inside the range.

roidelapluie (Member, Author)

Do you think we want that?

roidelapluie and others added 2 commits May 12, 2025 11:04
@roidelapluie (Member, Author)

The big remaining question is if anchored should work if there is no sample before the range. I was originally thinking it should only return anything if there is actually a sample before the range (or precisely at the beginning of the range). After reading everything, I'm not so sure anymore. I can imagine that the use case I'm often mentioning ("Give my disk usage growth over the last month" – and then I get the growth over the last hour without noticing) will mostly affect gauges, so foo - foo offset 30d will work just fine.

If you want to know how many HTTP requests you have over 1h:

sum(increase(http_requests_total[1h] anchored))

Ignoring metrics without samples before the range would drop a lot of results (all newly created label combinations, which could be e.g. paths), so it would completely underestimate the result.

The ideal solution is CT: we insert a 0, so the 0 for newly created labels is reported at the beginning of the range and we do not miss anything.

roidelapluie and others added 2 commits May 13, 2025 15:52
@beorn7 (Member) commented May 29, 2025

Thanks for addressing my comments. I'm still caught between several events at work and in my usual life, but I will finally be able to tackle my review backlog next week, which will allow me to address the open questions.

In the meantime, it would be good to get some feedback from other interested Prometheus developers. I would much prefer more approvals on this than just mine.
