exemplar querying #4181
Conversation
Signed-off-by: Callum Styan <callumstyan@gmail.com>
result = append(result, b[j])
j++
} else {
result = append(result, a[i])
I'm not sure, but should this compare Value as well? The Prometheus dedupe logic does, so I think different exemplars with the same Ts but different Values are possible. https://github.com/prometheus/prometheus/blob/main/pkg/exemplar/exemplar.go#L49
My understanding is that any data that comes into Cortex is replicated to the right ingester replica set, and the write is successful once the data has been replicated to a quorum (2 out of 3 in our case).
So my assumption is that while not every ingester in the replica set would have every exemplar, there shouldn't be any ingester that has an exemplar at a certain timestamp that doesn't exist in another ingester. That is, because we're not scraping and (potentially) setting the timestamps ourselves within Cortex, merging by timestamp should be enough.
Anything I'm missing here?
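For illustration only, here is a minimal self-contained sketch of the timestamp-only merge being discussed; the types and names below are mine, not the actual Cortex code:

```go
package main

import "fmt"

// exemplar is a stand-in for the real exemplar type; only the fields
// relevant to the merge are included.
type exemplar struct {
	TimestampMs int64
	Value       float64
}

// mergeByTimestamp merges two slices sorted by timestamp, keeping a single
// exemplar per timestamp and ignoring Value, as discussed above.
func mergeByTimestamp(a, b []exemplar) []exemplar {
	result := make([]exemplar, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].TimestampMs < b[j].TimestampMs:
			result = append(result, a[i])
			i++
		case a[i].TimestampMs > b[j].TimestampMs:
			result = append(result, b[j])
			j++
		default: // same timestamp: keep one arbitrarily, drop the other
			result = append(result, a[i])
			i++
			j++
		}
	}
	result = append(result, a[i:]...)
	result = append(result, b[j:]...)
	return result
}

func main() {
	a := []exemplar{{TimestampMs: 1, Value: 1}, {TimestampMs: 3, Value: 3}}
	b := []exemplar{{TimestampMs: 1, Value: 2}, {TimestampMs: 2, Value: 2}}
	fmt.Println(mergeByTimestamp(a, b)) // [{1 1} {2 2} {3 3}]
}
```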
So my assumption is that while not every ingester in the replica set would have every exemplar, there shouldn't be any ingester that has an exemplar at a certain timestamp that doesn't exist in another ingester.
Think about this edge case:
- Remote write exemplar for series X with TS=1 and V=1. The replication set is ingesters 1, 2, 3. This write succeeds on ingesters 1 and 2, but not on 3 (not ingested there).
- Remote write exemplar for series X with TS=1 and V=2. The replication set is ingesters 1, 2, 3. This write succeeds on ingester 3 (where nothing was ingested before) but fails on 1 and 2 (because of the same timestamp but a different value). Even though quorum has not been reached, the exemplar with V=2 is written anyway to ingester 3 (it will not be rolled back from ingester 3 just because writing to 1 and 2 failed).

When you read back, you will have two exemplars with TS=1 and values 1 and 2.
The current implementation picks an arbitrary exemplar (between the two) if the timestamp is the same, without checking whether the value is equal as well. I personally think it's fine, but I wanted to outline the edge case above so you can make an informed decision.
How does Cortex handle this situation for samples? I think exemplars are handled the same way here, since this is essentially a copy of util.MergeSampleSets.
The only way this edge case could really happen, that I know of, is bad relabel configs client side, causing series from different Prometheus instances to collapse into the same series within Cortex. In Prometheus' exemplar storage we reject duplicate exemplars (description of what is considered a duplicate), and we do the same in Cortex since we're just using Prometheus' exemplar storage currently.
Does this make sense, or am I missing something? cc @mdisibio
I think exemplars are handled the same way here since this is essentially a copy of util.MergeSampleSets

Thinking loudly:

- util.MergeSampleSets is only used when gRPC streaming between queriers and ingesters is disabled (it's enabled by default).
- When gRPC streaming is enabled and chunks transferring is disabled, we use mergeSamples(), which doesn't compare sample values.
- When both gRPC streaming and chunks transferring from the ingester are enabled, distributorQuerier.streamingSelect() returns series.NewConcreteSeriesSet(). NewConcreteSeriesSet() iterates over series stored as chunkSeries, so chunks are iterated by chunkSeries. chunkSeries.Iterator() uses the configured chunkIteratorFunc, so the behaviour depends on the configured chunk iteration function. The iteration function is returned by getChunksIteratorFunction(). Let's assume batch iteration is used (we do), then batch.NewChunkMergeIterator() is used. The deduplication is done by mergeStreams() which, in case of the same timestamp, just picks one of the two values without checking the actual value.

So in all cases, I believe samples deduplication behaves the same as your mergeExemplarSets() function 👍

Given the time this analysis took, what do you think about adding a test case to mergeExemplarSets() asserting on the case "same timestamp but different value", so we write it in stone via tests? :)
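A hedged sketch of what such a test case could look like; it assumes the cortexpb.Exemplar type (with TimestampMs and Value fields) and the mergeExemplarSets helper from this PR, so the exact names may differ in the final code:

```go
func TestMergeExemplarSetsSameTimestampDifferentValue(t *testing.T) {
	a := []cortexpb.Exemplar{
		{TimestampMs: 1, Value: 1},
		{TimestampMs: 2, Value: 2},
	}
	b := []cortexpb.Exemplar{
		{TimestampMs: 1, Value: 10}, // same timestamp as a[0], different value
		{TimestampMs: 3, Value: 3},
	}

	result := mergeExemplarSets(a, b)

	// Only one exemplar should survive for TimestampMs=1; which value wins is
	// unspecified, so assert on the timestamps only.
	require.Len(t, result, 3)
	timestamps := []int64{result[0].TimestampMs, result[1].TimestampMs, result[2].TimestampMs}
	require.Equal(t, []int64{1, 2, 3}, timestamps)
}
```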
Thanks again for the detailed explanation.
pkg/distributor/query.go
Outdated
return err
}

replicationSet, err := d.GetIngestersForQuery(ctx, false, nil)
I think it would be good to clarify that all matchers are being sent to all ingesters for the query, even though on ingest they are sharded by series (i.e. it is expected that an ingester will satisfy only a subset of the matchers and that the results will be merged). Maybe some comments, or a different name for the exemplarQuery parameter, would be enough.
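For example, a comment along these lines could work (a sketch; only the GetIngestersForQuery call is taken from the diff above, the wording is illustrative):

```go
// Exemplar queries are not restricted to the ingesters owning the matched
// series: every set of matchers is sent to all ingesters in the replication
// set, each ingester returns only the exemplars it holds, and the caller
// merges the per-ingester results.
replicationSet, err := d.GetIngestersForQuery(ctx, false, nil)
```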
I agree a comment would be great.
Good point, I will add a comment.
For my own understanding, is this ("even though on ingest they are sharded by series") only true for shuffle sharding? I haven't followed shuffle sharding much. My assumption has always been that any data is replicated to all the ingesters for that tenant.
I'm also now thinking of whether we should just split this up more, getting the replica set for each set of matchers and running the exemplar queries with those, and then merging the results.
I'm also now thinking of whether we should just split this up more, getting the replica set for each set of matchers and running the exemplar queries with those, and then merging the results.
I don't think it's worth the effort. Let me explain why.
Regardless of shuffle-sharding (which is about sharding tenants, not series), series can be sharded across ingesters in two ways:
- By metric name only
- By all series
When sharding by all series you have to query all ingesters, while when sharding by metric name only, all series for a given metric name are always sharded to the same ingester (+ replicas). That's why we can restrict the ingesters to query when "shard by metric name only" is used.
However, the current implementation of "query only a restricted set of ingesters when shard by metric name only is enabled" is also buggy and doesn't take into account reshardings (e.g. scale up), so it's not even safe to use out of the box with the blocks storage.
Long story short, I think it's fine (and desired) to always query all ingesters for the exemplars. When shuffle-sharding is enabled, it will only query ingesters belonging to the tenant's shard (and that's OK).
@pracucci Thanks for the explanation! So in either scenario there are never 3 ingesters that have all of a given tenant's data? A set of three would only have a subset of that tenant's data, depending on the sharding type?
Not sure I understand your question. Could you elaborate on it, please?
Solid job @cstyan! I didn't find any issues. I left a few minor comments; I would be glad if you could take a look. Waiting for tests!
Could you also document the new API endpoints in docs/api/_index.md, please?
pkg/api/handlers.go
Outdated
@@ -254,6 +256,8 @@ func NewQuerierHandler(
router.Path(path.Join(legacyPrefix, "/api/v1/read")).Methods("POST").Handler(legacyPromRouter)
router.Path(path.Join(legacyPrefix, "/api/v1/query")).Methods("GET", "POST").Handler(legacyPromRouter)
router.Path(path.Join(legacyPrefix, "/api/v1/query_range")).Methods("GET", "POST").Handler(legacyPromRouter)
// unclear to me whether we need to register here
Let's keep it, and then we will address this whole router.Path() group in a separate PR, considering prometheus/prometheus#7125 has been merged.

we will address this whole router.Path() group in a separate PR

Could you open an issue about it, please?
created: #4213
@@ -53,6 +54,32 @@ func (d *Distributor) Query(ctx context.Context, from, to model.Time, matchers .
return matrix, err
}

func (d *Distributor) QueryExemplars(ctx context.Context, from, to model.Time, matchers ...[]*labels.Matcher) (*ingester_client.QueryResponse, error) {
var result *ingester_client.QueryResponse
err := instrument.CollectedRequest(ctx, "Distributor.QueryExemplars", d.queryDuration, instrument.ErrorCode, func(ctx context.Context) error {
We're reusing the d.queryDuration duration here. I think it's fine, but let's see how the other discussion about metrics evolves.
Any thoughts here? We haven't discussed the rest of the metrics recently.
I personally think it's fine to start tracking exemplar query metrics along with the "samples query" metrics (we're doing the same in the ingester at query time). We may reconsider it as a future improvement, but I wouldn't block on this (I'm pretty sure we'll refine the exemplars implementation as we learn more from running it in production, as happens with every feature we build).
As a follow-up to this, it may be worth updating the help text for queryDuration. Currently it's:
Time spent executing expression queries.
It may be worth changing it to:
Time spent executing expression and exemplar queries.
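For instance (a hedged sketch; the metric name, labels, and constructor below are illustrative, not copied from the Cortex source):

```go
// import "github.com/prometheus/client_golang/prometheus"
queryDuration := prometheus.NewHistogramVec(prometheus.HistogramOpts{
	Namespace: "cortex",
	Name:      "distributor_query_duration_seconds", // hypothetical name
	Help:      "Time spent executing expression and exemplar queries.",
	Buckets:   prometheus.DefBuckets,
}, []string{"method", "status_code"})
```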
Signed-off-by: Callum Styan <callumstyan@gmail.com>
docs/api/_index.md
Outdated
Prometheus-compatible exemplar query endpoint. When the request is sent through the query-frontend, the query will be accelerated by query-frontend (results caching and execution parallelisation).

_For more information, please check out the Prometheus [range query](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries) documentation._
Let's link to the right one.
- _For more information, please check out the Prometheus [range query](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries) documentation._
+ _For more information, please check out the Prometheus [querying exemplars](https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars) documentation._
🤦 fixed
docs/api/_index.md
Outdated
GET,POST <legacy-http-prefix>/api/v1/query_exemplars
```

Prometheus-compatible exemplar query endpoint. When the request is sent through the query-frontend, the query will be accelerated by query-frontend (results caching and execution parallelisation).
The query-frontend doesn't accelerate the exemplars endpoint.
- Prometheus-compatible exemplar query endpoint. When the request is sent through the query-frontend, the query will be accelerated by query-frontend (results caching and execution parallelisation).
+ Prometheus-compatible exemplar query endpoint.
shouldn't we cache results?
The /api/v1/query_exemplars endpoint is not supported by the query-frontend (at least not yet), so it's not getting accelerated. It's definitely something we can do, but it hasn't been done yet. Am I missing anything?
return nil, err
}

i.metrics.queries.Inc()
I don't know if Cortex has a 'metrics label changes are breaking changes' policy.
The breaking changes policy doesn't cover metrics, so no policy blocker here. However, I would keep it simple in this PR and not add an extra label. It's something we can reconsider in the future IMO.
The reason I would keep it simple is that if we add a label here, we should probably do the same in other places (e.g. the query duration tracked by Distributor.QueryExemplars()).
Thanks @cstyan! The PR LGTM! I left a few last minor comments. Could you also add at least a unit test to cover the querying part?
pkg/ingester/ingester_v2.go
Outdated
return &client.ExemplarQueryResponse{}, nil
}

// Note that currently Prometheus' exemplar querier does nothing with a context that you pass it.
[nit] I would remove this comment, just because these kinds of comments age very badly :) If ctx is used at some point by Prometheus, we'll very likely forget to remove this comment, and it could then mislead the reader/contributor.
pkg/ingester/ingester_v2.go
Outdated
// TODO should we update this series metric again?
// i.metrics.queriedSeries.Observe(float64(len(result.Timeseries)))
Thinking more about this, I believe we shouldn't. Because of that, I would remove this commented-out code.
pkg/ingester/metrics.go
Outdated
queriedExemplars: promauto.With(r).NewHistogram(prometheus.HistogramOpts{
Name: "cortex_ingester_queried_exemplars",
Help: "The total number of exemplars returned from queries.",
// TODO: think about buckets, guessing here.
To my understanding, max bucket is 1*(5^(5-1)) = 625. Maybe a bit low?
I've bumped the start to 10; this would give us buckets of 10, 50, 250, 1250, 6250. Let me know if you think this is reasonable. The likelihood of the in-memory exemplar storage having very few exemplars per series is high. We could bump the number of buckets to 8 like the samples buckets, which would give us 10, 50, 250, 1250, 6250, 31250, 156250, 781250.
FWIW, out of the box, when you enable Prometheus exemplar storage, it has a max circular buffer size of 100k exemplars.
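For reference, a small sketch of how those bucket series come out of prometheus.ExponentialBuckets (the exact call site in this PR is an assumption; the arithmetic is just standard client_golang behaviour):

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// Original guess: start=1, factor=5, count=5 -> 1, 5, 25, 125, 625 (max bucket 625).
	fmt.Println(prometheus.ExponentialBuckets(1, 5, 5))
	// Start bumped to 10: 10, 50, 250, 1250, 6250.
	fmt.Println(prometheus.ExponentialBuckets(10, 5, 5))
	// With 8 buckets like the samples histogram: 10, 50, 250, 1250, 6250, 31250, 156250, 781250.
	fmt.Println(prometheus.ExponentialBuckets(10, 5, 8))
}
```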
👍 I would remove the TODO.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Sorry for the delay, on-call has been noisy this week.
Updated the ingester push test (which also calls the query function for samples) to call the query exemplars function and confirm that exemplars are stored properly (we only had one test case for storing exemplars in that test); I hadn't pushed that yet, so here it is.
Addressed a few other small comments + replied to some, mostly metrics related.
return nil, err
}

i.metrics.queries.Inc()
would you prefer separate metrics altogether here?
LGTM, thanks! I left a few final nits, all very small things. Please also take a look at the failed lint and unit tests.
- [ENHANCEMENT] Blocks storage: support ingesting exemplars. Enabled by setting the new CLI flag -blocks-storage.tsdb.max-exemplars=<n> or the config option blocks_storage.tsdb.max_exemplars to a positive value. Support ingesting exemplars into TSDB when blocks storage is enabled #4124

Could you update this CHANGELOG entry to mention that we support querying too, and add this PR's number alongside "#4124", please?
return nil, err
}

i.metrics.queries.Inc()
I would work on it in a follow-up PR. I generally suggest working iteratively. It's fine to merge this PR as is and discuss the metrics separation in a follow-up PR.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
LGTM! Thanks!
I found some issues while deploying/testing this that I will fix later this week.
Dismissing because Callum mentioned they found some issues while testing. Waiting to learn more about it.
LGTM
}
// Merge in any missing values from another ingesters exemplars for this series.
e.Exemplars = mergeExemplarSets(e.Exemplars, ts.Exemplars)
suggestion: I may be missing something, but if I'm not mistaken, there is no need to call this function if there isn't an existing result, correct?
- }
- // Merge in any missing values from another ingesters exemplars for this series.
- e.Exemplars = mergeExemplarSets(e.Exemplars, ts.Exemplars)
+ } else {
+ 	// Merge in any missing values from another ingesters exemplars for this series.
+ 	e.Exemplars = mergeExemplarSets(e.Exemplars, ts.Exemplars)
+ }
Yeah probably not, I don't think it's functionally any different. I guess there's the potential for slightly more memory being allocated than is really necessary without the else. Is this what you were thinking?
@@ -106,3 +109,52 @@ func TestMergeSamplesIntoFirstNilB(t *testing.T) {

require.Equal(t, b, a)
}

func TestMergeExemplarSets(t *testing.T) { |
Clean test! 👍
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
pkg/querier/distributor_queryable.go
Outdated
// TODO: (callum) track down why we see empty exemplars slices here
// but not in the distributor/ingester code.
Can you elaborate on this? This looks like something worth investigating before merging this PR.
When I first deployed this branch to one of our Cortex clusters I noticed intermittent query failures in Grafana itself, and looking at the query response I saw a lot of results (they always seemed to be at the beginning of the overall []exemplar.QueryResult slice) that had empty series labels and zero-length exemplar slices. The if condition right below was my quick fix.
Yesterday I deployed these changes with additional debug logging, and this block was the only place I saw empty results, which doesn't really make sense to me. This TODO is a note to myself to look into this more. If there's a way to mock a query result from an ingester replica set then I could try to reproduce it in a test; I didn't see anything obvious.
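For context, a hedged guess at what that quick-fix guard could look like (the exemplar.QueryResult field names are assumed from the Prometheus pkg/exemplar package, and the surrounding variable names are illustrative):

```go
// Drop results that came back with no exemplars (and typically empty series
// labels) so they don't end up in the API response.
filtered := make([]exemplar.QueryResult, 0, len(results))
for _, r := range results {
	if len(r.Exemplars) == 0 {
		continue
	}
	filtered = append(filtered, r)
}
```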
Fixed the issue in the latest commit(s).
Signed-off-by: Callum Styan <callumstyan@gmail.com>
to cortex. Signed-off-by: Callum Styan <callumstyan@gmail.com>
Let me know if I should update all the example docker compose setups the same as the one for local s3 as I've done in the latest commit.
LGTM! Thanks for addressing all feedback 🙏
Let me know if I should update all the example docker compose setups the same as the one for local s3 as I've done in the latest commit.
Not strictly required (still a nice to have). Definitely not a blocker for this PR. We can add it anytime, whenever required.
Still needs a few more tests, but the implementation should be good for a first review.
Signed-off-by: Callum Styan callumstyan@gmail.com