-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query: Allow passing a storeMatcher[]
to select matching stores when debugging the querier
#2931
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Yes something like this would work! ❤️
I have some ideas, I think we can make it more flexible out of the box! Let me know if that makes sense, should be quick for you to adapt this new idea with matchers 💪
pkg/query/api/v1.go
Outdated
@@ -269,6 +270,21 @@ func (api *API) parseReplicaLabelsParam(r *http.Request) (replicaLabels []string | |||
return replicaLabels, nil | |||
} | |||
|
|||
func (api *API) parseStoreAllowListParam(r *http.Request) (storeAllowList []string, _ *ApiError) { | |||
const storeAllowListParam = "storeAllowList[]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now when I think of it I wonder if something like matchers would be quite neat to have.
Because when we add allowlist we will soon have request "can I have denylist"? ;p
I mean something like this API have: https://prometheus.io/docs/prometheus/latest/querying/api/#finding-series-by-label-matchers
For example:
&storeMatch[]={__address__=~"1\.2\.3\.\d*"}
This will give us quite amazing flexibility as in future we can add more "labels" to maybe filter by source, health, or any other metadata ... what do you think? (:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it could rely on the same system as the external labels match:
- When a store is registered we add a label for
__thanos_store_address__
containing the address (and maybe a__thanos_store_health__
as well) - Then the user add the label in its regular query and we match it using the regular system
- Before we dispatch the query to the other store, we drop all the
__thanos_store_.*
labels
Pro:
- easy to implement
- no need to use contexts
- easy to use
Con:
- magic is done is the background
- we reserve some labels
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I was thinking about this as well, but this sounds scary. I think this one will need more thoughts. Let me propagate this question among maintainers and devs. I think the matcher idea is less intrusive.
We could in theory add those labels only in some --debug
mode 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely cannot inject them as normal flow as it will surprise Querier clients. It's important to remember that users already build alerts etc so data consistency is the key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wonder what e.g @brian-brazil would say about our ideas =D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no problems here with storeMatch - though I wonder if you really need regexes if this is just for ad-hoc debugging.
Tying this to general label semantics sounds like it's over-complicating things, and would likely cause problems down time line (plus is technically breaking as __thanos_store_address__
is a valid label value that could come from Prometheus).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, let's go for storeMatch
then! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Let's limit the scope of this change. Although I really like the fact that you could aggregate, join etc all metadata. ;p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the PR. Much cleaner to use matcher instead of a regular allow list!
pkg/store/proxy.go
Outdated
@@ -255,8 +260,14 @@ func (s *ProxyStore) Series(r *storepb.SeriesRequest, srv storepb.Store_SeriesSe | |||
// NOTE: all matchers are validated in matchesExternalLabels method so we explicitly ignore error. | |||
var ok bool | |||
tracing.DoInSpan(gctx, "store_matches", func(ctx context.Context) { | |||
storeAllowList := make(map[string]struct{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add todo to consider this in proto
d100144
to
4eb9993
Compare
storeMatcher[]
to select matching stores when debugging the querier
@bwplotka could you review it again? :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I think this will do the work. I would be happy to merge this. There are bunch of things we can improve in next PRs, but nothing critical. Thanks!
pkg/query/querier.go
Outdated
@@ -250,9 +256,20 @@ func (q *querier) selectFn(ctx context.Context, hints *storage.SelectHints, ms . | |||
if err != nil { | |||
return nil, errors.Wrap(err, "convert matchers") | |||
} | |||
var storeMatchers [][]storepb.LabelMatcher | |||
for _, stms := range q.storeMatchers { | |||
stm, err := storepb.TranslatePromMatchers(stms...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm what about translating them in build time (start time of query) not query time? (:
pkg/store/proxy.go
Outdated
if len(storeMatcher) != 0 { | ||
res := false | ||
for _, stm := range storeMatcher { | ||
stmMatch, err := labelSetMatches(clientLabels, stm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can simplify this error handling, not sure why simple matching can produce error tbh ;p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better? The function is very similar to labelSetsMatch()
} | ||
|
||
func generateMetadataClientLabels(s Client) storepb.LabelSet { | ||
l := storepb.Label{Name: "__address__", Value: s.Addr()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like shallow function to me ):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in a separate function because I wanted to separate the matching from the generation of labels. It also make the function easier to extend with new labels or be used elsewhere.
Rebase needed though ): |
docs/components/query.md
Outdated
@@ -222,6 +222,24 @@ Thanos Querier has the ability to perform concurrent select request per query. I | |||
The maximum number of concurrent requests are being made per query is controller by `query.max-concurrent-select` flag. | |||
Keep in mind that the maximum number of concurrent queries that are handled by querier is controlled by `query.max-concurrent`. Please consider implications of combined value while tuning the querier. | |||
|
|||
### Store filtering | |||
|
|||
It's possible to provide a set of matchers to the Querier api to select specific stores to be used during the query using the `storeMatch[]` parameter. It is useful when debugging a slow/broken store. It uses the same format as the matcher of [Prometheus' federate api](https://prometheus.io/docs/prometheus/latest/querying/api/#finding-series-by-label-matchers). Note that at the moment the querier only support the `__address__` which contains the address of the store. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible to provide a set of matchers to the Querier api to select specific stores to be used during the query using the `storeMatch[]` parameter. It is useful when debugging a slow/broken store. It uses the same format as the matcher of [Prometheus' federate api](https://prometheus.io/docs/prometheus/latest/querying/api/#finding-series-by-label-matchers). Note that at the moment the querier only support the `__address__` which contains the address of the store. | |
It's possible to provide a set of matchers to the Querier api to select specific stores to be used during the query using the `storeMatch[]` parameter. It is useful when debugging a slow/broken store. It uses the same format as the matcher of [Prometheus' federate api](https://prometheus.io/docs/prometheus/latest/querying/api/#finding-series-by-label-matchers). Note that at the moment the querier only support the `__address__` which contains the address of the store that were ... |
I think it would be actually IP address that is visible as Endpoint on Store API. Not necessarily domain as user can configure dns://prometheus-foo.thanos-sidecar:10901
(:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! thx!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we link the docs here in changelog & rebase? Then it's more than enough for merge 💪 Thanks!
aaa03d4
to
a651d8b
Compare
Rebased! |
Add a new parameter on the API to pass an allow list of stores. Only stores matching this list will be able to be used for the query. Signed-off-by: Geoffrey Beausire <g.beausire@criteo.com>
Thanks! |
Nice, thanks for the reviews! |
Changes
Solves the api part of #2919
Add a new parameter on the API to pass an allow list of stores.
Only stores matching this list will be able to be used for the query.
Verification
WIP: manual testing and partial unit tests