
exemplar querying #4181


Merged

merged 8 commits into cortexproject:master on Jun 15, 2021

Conversation

@cstyan (Contributor) commented May 13, 2021:

Still needs a few more tests, but the implementation should be good for a first review.


Signed-off-by: Callum Styan <callumstyan@gmail.com>
```
		result = append(result, b[j])
		j++
	} else {
		result = append(result, a[i])
```
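For orientation, here is a hedged, self-contained sketch of the two-pointer merge this excerpt belongs to, assuming timestamp-sorted inputs and a simplified Exemplar type (the PR's real code operates on cortexpb exemplars; names and types here are illustrative):

```go
package util

// Exemplar is a simplified stand-in for the PR's exemplar type.
type Exemplar struct {
	TimestampMs int64
	Value       float64
}

// mergeExemplarSets merges two timestamp-sorted exemplar slices. On a
// timestamp tie it keeps a single exemplar without comparing values,
// which is the behaviour debated in this thread.
func mergeExemplarSets(a, b []Exemplar) []Exemplar {
	result := make([]Exemplar, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].TimestampMs < b[j].TimestampMs:
			result = append(result, a[i])
			i++
		case a[i].TimestampMs > b[j].TimestampMs:
			result = append(result, b[j])
			j++
		default: // same timestamp: keep one copy, value not compared
			result = append(result, a[i])
			i++
			j++
		}
	}
	// Append whichever side still has remaining exemplars.
	result = append(result, a[i:]...)
	result = append(result, b[j:]...)
	return result
}
```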
Contributor:

I'm not sure, but should this compare Value as well? The Prometheus dedupe logic does, so I think different exemplars with the same Ts but different Values are possible. https://github.com/prometheus/prometheus/blob/main/pkg/exemplar/exemplar.go#L49

@cstyan (Contributor Author):

My understanding is that any data that comes into Cortex is replicated to the right ingester replica set, and the write is successful once we've replicated to a quorum (2 out of 3 in our case).

So my assumption is that while not every ingester in the replica set would have every exemplar, there shouldn't be any ingester that has an exemplar at a certain timestamp that doesn't exist in another ingester. That is, because we're not scraping and (potentially) setting the timestamps ourselves within Cortex, merging by timestamp should be enough.

Anything I'm missing here?

Contributor:

So my assumption is that while not every ingester in the replica set would have every exemplar, there shouldn't be any ingester that has an exemplar at a certain timestamp that doesn't exist in another ingester.

Think about this edge case:

  1. Remote write an exemplar for series X with TS=1 and V=1. The replication set is ingesters 1, 2, 3. This write succeeds on ingesters 1 and 2, but not on 3 (not ingested there).
  2. Remote write an exemplar for series X with TS=1 and V=2. The replication set is ingesters 1, 2, 3. This write succeeds on ingester 3 (which had not ingested it before) but fails on 1 and 2 (same timestamp but different value). Even though quorum has not been reached, the exemplar with V=2 is written to ingester 3 anyway (it will not be rolled back from ingester 3 just because writing to 1 and 2 failed).

When you read back, you will have two exemplars with TS=1 and values 1 and 2.

The current implementation picks a random exemplar (between the two) if the timestamp is the same, without checking whether the values are equal as well. I personally think it's fine, but I wanted to outline the edge case above so you can make an informed decision.

@cstyan (Contributor Author):

How does Cortex handle this situation for samples? I think exemplars are handled the same way here since this is essentially a copy of util.MergeSampleSets.

The only way this edge case could really happen, that I know of, is bad relabel configs client side, causing series from different Prometheus instances to collide into the same series within Cortex. Prometheus' exemplar storage rejects duplicate exemplars (description of what is considered a duplicate), and we do the same in Cortex since we're just using Prometheus' exemplar storage currently.

Does this make sense, or am I missing something? cc @mdisibio

Contributor:

I think exemplars are handled the same way here since this is essentially a copy of util.MergeSampleSets

Thinking out loud:

  • util.MergeSampleSets is only used when gRPC streaming between queriers and ingesters is disabled (it's enabled by default).
  • When gRPC streaming is enabled and chunks transferring is disabled, we use mergeSamples(), which doesn't compare sample values.
  • When both gRPC streaming and chunks transferring are enabled in the ingester, then distributorQuerier.streamingSelect() returns series.NewConcreteSeriesSet(). NewConcreteSeriesSet() iterates over series stored as chunkSeries, and chunkSeries.Iterator() uses the configured chunkIteratorFunc, so the behaviour depends on the configured chunk iteration function, which is returned by getChunksIteratorFunction(). Assuming batch iteration is used (we do use it), batch.NewChunkMergeIterator() is used, and the deduplication is done by mergeStreams(), which, in the case of the same timestamp, just picks one of the two values without checking the actual value.

So in all cases, I believe sample deduplication is equivalent to your mergeExemplarSets() function 👍

Given the time this analysis took, what do you think about adding a test case to mergeExemplarSets() asserting on the case "same timestamp but different value", so we write it in stone via tests? :)
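A minimal sketch of such a test case, reusing the simplified Exemplar type and mergeExemplarSets sketch above (hedged; the PR's actual test is written against cortexpb exemplars):

```go
package util

import "testing"

// Pins down the behaviour discussed above: on a timestamp tie, exactly one
// exemplar survives and values are not compared.
func TestMergeExemplarSetsSameTimestampDifferentValue(t *testing.T) {
	a := []Exemplar{{TimestampMs: 1, Value: 1}}
	b := []Exemplar{{TimestampMs: 1, Value: 2}}

	got := mergeExemplarSets(a, b)
	if len(got) != 1 {
		t.Fatalf("expected 1 exemplar after merge, got %d", len(got))
	}
	if got[0].TimestampMs != 1 {
		t.Fatalf("expected timestamp 1, got %d", got[0].TimestampMs)
	}
}
```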

@cstyan (Contributor Author):

Thanks again for the detailed explanation.

```
return err
}

replicationSet, err := d.GetIngestersForQuery(ctx, false, nil)
```
Contributor:

I think it would be good to clarify how all matchers are being sent to all ingesters for the query, even though on ingest they are sharded by series (i.e. it is expected that an ingester will satisfy only a subset of the matchers and the results will be merged). Maybe some comments, or a different name for the exemplarQuery parameter would be enough.

Contributor:

I agree a comment would be great.

@cstyan (Contributor Author):

Good point, I will add a comment.

For my own understanding, is "even though on ingest they are sharded by series" only true for shuffle sharding? I haven't followed shuffle sharding much; my assumption has always been that any data is replicated to all the ingesters for that tenant.

@cstyan (Contributor Author):

I'm also now thinking of whether we should just split this up more, getting the replica set for each set of matchers and running the exemplar queries with those, and then merging the results.

Contributor:

I'm also now thinking of whether we should just split this up more, getting the replica set for each set of matchers and running the exemplar queries with those, and then merging the results.

I don't think it's worth the effort. Let me explain why.

Regardless of shuffle-sharding (which is about tenant sharding, not series sharding), series can be sharded across ingesters in two ways:

  1. By metric name only
  2. By all series

When sharding by all series you have to query all ingesters, while when sharding by metric name all series for a given metric name are always sharded to the same ingester (+ replicas). That's why we can restrict the set of ingesters to query when "shard by metric name only" is used.

However, the current implementation of "query only a restricted set of ingesters when sharding by metric name only is enabled" is also buggy and doesn't take into account resharding (e.g. scale up), so it's not even safe to use out of the box with the blocks storage.

Long story short, I think it's fine (and desired) to always query all ingesters for the exemplars. When shuffle-sharding is enabled, it will only query ingesters belonging to the tenant's shard (and that's OK).
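As a hedged sketch, the clarifying comment requested earlier in this thread could look something like this at the call site (placement and wording assumed, not taken from the final PR):

```go
// Exemplar queries are sent, with all the matchers, to every ingester in the
// tenant's shard: series are sharded across ingesters on ingest, so each
// ingester holds only a subset of the matching exemplars and the per-ingester
// results are merged afterwards.
replicationSet, err := d.GetIngestersForQuery(ctx, false, nil)
```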

@cstyan (Contributor Author):

@pracucci Thanks for the explanation! So in either scenario there are never 3 ingesters that have all of a given tenant's data? A set of three would have a subset of their data based on the sharding type?

Contributor:

Not sure I understand your question. Could you elaborate, please?

@pracucci (Contributor) left a comment:

Solid job @cstyan! Didn't find any issue. I left a few minor comments; I'd be glad if you could take a look. Waiting for tests!

Could you also document the new API endpoints in docs/api/_index.md, please?

```
@@ -254,6 +256,8 @@ func NewQuerierHandler(
	router.Path(path.Join(legacyPrefix, "/api/v1/read")).Methods("POST").Handler(legacyPromRouter)
	router.Path(path.Join(legacyPrefix, "/api/v1/query")).Methods("GET", "POST").Handler(legacyPromRouter)
	router.Path(path.Join(legacyPrefix, "/api/v1/query_range")).Methods("GET", "POST").Handler(legacyPromRouter)
	// unclear to me whether we need to register here
```
Contributor:

Let's keep it and then we will address this whole router.Path() group in a separate PR, considering prometheus/prometheus#7125 has been merged.

we will address this whole router.Path() group in a separate PR

Could you open an issue about it, please?

@cstyan (Contributor Author):

created: #4213
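For reference, a hedged sketch of what registering the legacy-prefixed exemplar route would look like, following the existing router.Path() pattern from the hunk above (where exactly the final PR registers it is not shown here):

```go
// Assumed registration, mirroring the adjacent routes; the docs in this PR
// describe the endpoint as GET,POST <legacy-http-prefix>/api/v1/query_exemplars.
router.Path(path.Join(legacyPrefix, "/api/v1/query_exemplars")).Methods("GET", "POST").Handler(legacyPromRouter)
```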


```
@@ -53,6 +54,32 @@ func (d *Distributor) Query(ctx context.Context, from, to model.Time, matchers .
	return matrix, err
}

func (d *Distributor) QueryExemplars(ctx context.Context, from, to model.Time, matchers ...[]*labels.Matcher) (*ingester_client.QueryResponse, error) {
	var result *ingester_client.QueryResponse
	err := instrument.CollectedRequest(ctx, "Distributor.QueryExemplars", d.queryDuration, instrument.ErrorCode, func(ctx context.Context) error {
```
Contributor:

We're reusing the d.queryDuration metric here. I think it's fine, but let's see how the other discussion about metrics evolves.

@cstyan (Contributor Author):

Any thoughts here? We haven't discussed the rest of the metrics recently.

Contributor:

I personally think it's fine to start tracking exemplar query metrics along with the "samples query" metrics to begin with (we're doing the same in the ingester at query time). We may reconsider it as a future improvement, but I wouldn't block on this (I'm pretty sure we'll refine the exemplars implementation while learning more from running it in production, as happens with every feature we build).

Contributor:

As a followup to this it may be worth updating the help text for queryDuration. Currently it's:

Time spent executing expression queries.

It may be worth changing it to

Time spent executing expression and exemplar queries.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Prometheus-compatible exemplar query endpoint. When the request is sent through the query-frontend, the query will be accelerated by query-frontend (results caching and execution parallelisation).

_For more information, please check out the Prometheus [range query](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries) documentation._
Contributor:

Let's link to the right one.

Suggested change:

```
- _For more information, please check out the Prometheus [range query](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries) documentation._
+ _For more information, please check out the Prometheus [querying exemplars](https://prometheus.io/docs/prometheus/latest/querying/api/#querying-exemplars) documentation._
```

@cstyan (Contributor Author):

🤦 fixed

```
GET,POST <legacy-http-prefix>/api/v1/query_exemplars
```

Prometheus-compatible exemplar query endpoint. When the request is sent through the query-frontend, the query will be accelerated by query-frontend (results caching and execution parallelisation).
Contributor:

The query-frontend doesn't accelerate the exemplars endpoint.

Suggested change:

```
- Prometheus-compatible exemplar query endpoint. When the request is sent through the query-frontend, the query will be accelerated by query-frontend (results caching and execution parallelisation).
+ Prometheus-compatible exemplar query endpoint.
```

@cstyan (Contributor Author):

shouldn't we cache results?

Contributor:

The /api/v1/query_exemplars is not supported by query-frontend (at least not yet) so it's not getting accelerated. It's definitely something we can do, but it hasn't been done yet. Am I missing anything?

```
return nil, err
}

i.metrics.queries.Inc()
```
@pracucci (Contributor) commented May 18, 2021:

I don't know if Cortex has a 'metrics label changes are breaking changes' policy.

The breaking changes policy doesn't cover metrics, so there's no policy blocker here. However, I would keep it simple in this PR and not add an extra label. It's something we can reconsider in the future IMO.

The reason I would keep it simple is that if we add a label here we should probably do the same in other places (e.g. the query duration tracked by Distributor.QueryExemplars()).

@pracucci (Contributor) left a comment:

Thanks @cstyan! The PR LGTM! I left a few last minor comments. Could you also add at least a unit test to cover the querying part?


```
return &client.ExemplarQueryResponse{}, nil
}

// Note that currently Prometheus' exemplar querier does nothing with a context that you pass it.
```
Contributor:

[nit] I would remove this comment just because these kinds of comments age very badly :) If ctx is used at some point by Prometheus, we'll very likely forget to remove this comment, and it can then deceive the reader/contributor.

Comment on lines 1081 to 1082

```
// TODO should we update this series metric again?
// i.metrics.queriedSeries.Observe(float64(len(result.Timeseries)))
```
Contributor:

Thinking more about this, I believe we shouldn't. Because of this, I would remove this commented code.

```
queriedExemplars: promauto.With(r).NewHistogram(prometheus.HistogramOpts{
	Name: "cortex_ingester_queried_exemplars",
	Help: "The total number of exemplars returned from queries.",
	// TODO: think about buckets, guessing here.
```
Contributor:

To my understanding, max bucket is 1*(5^(5-1)) = 625. Maybe a bit low?

@cstyan (Contributor Author):

I've bumped the start to 10, which gives us buckets of 10, 50, 250, 1250, 6250. Let me know if you think this is reasonable. The likelihood of the in-memory exemplar storage having very few exemplars per series is high. We could bump the number of buckets to 8 like the samples buckets, which would give us 10, 50, 250, 1250, 6250, 31250, 156250, 781250.

FWIW, out of the box, when you enable Prometheus exemplar storage, it has a max circular buffer size of 100k exemplars.
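A hedged sketch of the bucket layout under discussion, assuming the standard client_golang helpers: prometheus.ExponentialBuckets(10, 5, 5) yields exactly the 10, 50, 250, 1250, 6250 series mentioned above.

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// newQueriedExemplarsHistogram is illustrative only; the PR defines this
// metric inline in the ingester metrics struct.
func newQueriedExemplarsHistogram(r prometheus.Registerer) prometheus.Histogram {
	return promauto.With(r).NewHistogram(prometheus.HistogramOpts{
		Name: "cortex_ingester_queried_exemplars",
		Help: "The total number of exemplars returned from queries.",
		// start=10, factor=5, count=5 -> 10, 50, 250, 1250, 6250
		Buckets: prometheus.ExponentialBuckets(10, 5, 5),
	})
}
```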

Contributor:

👍 I would remove the TODO.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan (Contributor Author) left a comment:

Sorry for the delay, on-call has been noisy this week.

Updated the ingester push test (which also calls the query function for samples) to call the query exemplars function and confirm exemplars are stored properly (we only had one test case for storing exemplars in that test); I hadn't pushed that yet, so here it is.

Addressed a few other small comments and replied to some, mostly metrics related.






```
return nil, err
}

i.metrics.queries.Inc()
```
@cstyan (Contributor Author):

Would you prefer separate metrics altogether here?


@pracucci (Contributor) left a comment:

LGTM, thanks! I left a few final nits, all very small things. Please also take a look at the failed lint and unit tests.

Could you update this CHANGELOG entry mentioning that we support querying too, and add this PR's number alongside "#4124", please?




Contributor:

I would work on it in a follow-up PR. I generally suggest working iteratively: it's fine merging this PR as is and discussing the metrics separation in a follow-up PR.


Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan changed the title from "WIP: exemplar querying" to "exemplar querying" on May 24, 2021
@pracucci previously approved these changes May 25, 2021
@pracucci (Contributor) left a comment:

LGTM! Thanks!

@cstyan (Contributor Author) commented May 27, 2021:

I found some issues while deploying/testing this that I will fix later this week.

@pracucci pracucci dismissed their stale review May 27, 2021 09:00

Dismissing because Callum mentioned they found some issues while testing. Waiting to learn more about it.

@jtlisi (Contributor) left a comment:

LGTM


Comment on lines +274 to +276

```
}
// Merge in any missing values from another ingesters exemplars for this series.
e.Exemplars = mergeExemplarSets(e.Exemplars, ts.Exemplars)
```
Contributor:

suggestion: I may be missing something, but if I'm not mistaken, there is no need to call this function if there isn't an existing result, correct?

Suggested change:

```
- }
- // Merge in any missing values from another ingesters exemplars for this series.
- e.Exemplars = mergeExemplarSets(e.Exemplars, ts.Exemplars)
+ } else {
+ 	// Merge in any missing values from another ingesters exemplars for this series.
+ 	e.Exemplars = mergeExemplarSets(e.Exemplars, ts.Exemplars)
+ }
```

@cstyan (Contributor Author):

Yeah, probably not; I don't think it's functionally any different. I guess there's the potential for slightly more memory being allocated than is really necessary without the else. Is this what you were thinking?

```
@@ -106,3 +109,52 @@ func TestMergeSamplesIntoFirstNilB(t *testing.T) {

	require.Equal(t, b, a)
}

func TestMergeExemplarSets(t *testing.T) {
```
Contributor:

Clean test! 👍

cstyan added 2 commits May 31, 2021 20:41
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Comment on lines 237 to 238

```
// TODO: (callum) track down why we see empty exemplars slices here
// but not in the distributor/ingester code.
```
Contributor:

Can you elaborate on this? This looks like something worth investigating before merging this PR.

@cstyan (Contributor Author):

When I first deployed this branch to one of our Cortex clusters I noticed intermittent query failures in Grafana itself, and looking at the query response I saw a lot of results (they always seemed to be at the beginning of the overall []exemplar.QueryResult slice) that had empty series labels and zero-length exemplar slices. The if condition right below was my quick fix.

Yesterday I deployed these changes with additional debug logging, and this block was the only place I saw empty results, which doesn't really make sense to me. This TODO is a note to myself to look into this more. If there were a way to mock a query result from an ingester replica set I could try to reproduce it in a test, but I didn't see anything obvious.
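A hedged sketch of the kind of quick-fix filter described here, assuming Prometheus' exemplar.QueryResult shape from pkg/exemplar (illustrative; the actual condition in the PR may differ):

```go
package querier

import "github.com/prometheus/prometheus/pkg/exemplar"

// filterEmptyResults drops results with no series labels and no exemplars,
// matching the symptom described above (empty entries at the beginning of the
// overall []exemplar.QueryResult slice).
func filterEmptyResults(in []exemplar.QueryResult) []exemplar.QueryResult {
	out := in[:0]
	for _, res := range in {
		if len(res.SeriesLabels) == 0 && len(res.Exemplars) == 0 {
			continue
		}
		out = append(out, res)
	}
	return out
}
```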

@cstyan (Contributor Author):

Fixed the issue in the latest commit(s).

cstyan added 2 commits June 8, 2021 17:39
Signed-off-by: Callum Styan <callumstyan@gmail.com>
to cortex.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan (Contributor Author) commented Jun 9, 2021:

Let me know if I should update all the example docker-compose setups the same way as the one for local S3, as I've done in the latest commit.

@cstyan cstyan requested a review from pracucci June 15, 2021 15:28
@pracucci (Contributor) left a comment:

LGTM! Thanks for addressing all feedback 🙏

Let me know if I should update all the example docker compose setups the same as the one for local s3 as I've done in the latest commit.

Not strictly required (still a nice to have). Definitely not a blocker for this PR. We can add it anytime, whenever required.

@pracucci pracucci merged commit 82b32ec into cortexproject:master Jun 15, 2021