-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternative implementation: Push replica labels at the end after Recv #5742
Conversation
50536a0
to
716e4b9
Compare
} | ||
|
||
func (m *dedupReadyResponse) LessWithoutReplicaLabels(other *dedupReadyResponse) bool { | ||
if m.GetSeries() != nil && other.GetSeries() != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It crashes for me here. Generated code looks like this:
if x, ok := m.GetResult().(*SeriesResponse_Series); ok {
return x.Series
}
It doesn't check whether x
is not nil
:'/ Maybe we can add a small helper:
func (m *SeriesResponse) GetSeriesNotNil() *Series {
if x, ok := m.GetResult().(*SeriesResponse_Series); ok {
if x == nil {
return nil
}
return x.Series
}
return nil
}
The problem might be somewhere else, though, that we are comparing nil
with not nil
. Maybe related to #5717?
See my comment there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for testing this. I assume you are using the lazy strategy when it crashes?
4f12269
to
83647d8
Compare
if labels.Compare(labelpb.ZLabelsToPromLabels(lbls), labelpb.ZLabelsToPromLabels(lastLbls)) == 0 { | ||
// Unless a response comes from query pushdown, it is safe to merge | ||
// chunks from replica series together into one slice. | ||
labelsEqual := labels.Compare(labelpb.ZLabelsToPromLabels(lbls), labelpb.ZLabelsToPromLabels(lastLbls)) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love to use labelsEqual = resp.signature == lastResponse.signature
here, but unfortunately it breaks tests with overlapping and interleaved chunks from prometheus replicas. This was introduced and solved in:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested that change in a canary environment, and together with the new engine saw a decrease in a query from 75s to 8s, which is just insane. We have a layered querier setup, so deduplicating as soon as possible makes a massive difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -342,6 +351,12 @@ func (s *ProxyStore) Series(originalRequest *storepb.SeriesRequest, srv storepb. | |||
} | |||
} | |||
|
|||
if respHeap.UnsortedSetDetected() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a bit like implementing InfoServer through the Series service. What do you think about checking st.SendsSortedSeries()
instead and if all are true then we could also advertise that SendsSortedSeries() == true
in Thanos Query. And if there is at least one where SendsSortedSeries() == false
then let's also advertise false
in Thanos Query.
I have been playing around with your PR a little bit and ☝️ seems like a much cleaner solution. I can try opening up a PR to it tomorrow or on Wednesday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would make sense, I'll rework the PR to change the detection mechanism.
However, should we also change stores to have a sorted detection without replica labels? This should be possible now since replica labels are part of the Series request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I read that I again, with the current implementation of stores, I am not sure if it is possible to make a detection statically without inspecting the data. A user can specify any label as the replica label in a Query component. So stores need to resort data arbitrarily in the Series call if we want to have checks through info calls.
If we want to remove the sort in the querier, data needs to be sorted without replica labels for k-way merge to work. That means data needs to be resorted in every store that can return multiple replicas of the same series. Since users can specify any label as the replica label, this in practice means each store should be able to resort data.
Taking the following example
1. a=1, r=1, z=1
2. a=1, r=1, z=2
3. a=1, r=2, z=1
if the replica label is r
, data needs to be sent in the following order for deduplication to work
1. a=1, z=1, r=1
3. a=1, z=2, r=1
2. a=1, z=1, r=2
So rows 2. and 3. need to be switched.
let's see tomorrow (: |
Might also be fixed by thanos-io/promql-engine#55 |
83647d8
to
b804880
Compare
This commit removes the case for a global sort on all series in the querier, and redistributes in places where series are received from stores. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
b804880
to
4b8d644
Compare
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Hello 👋 Looks like there was no activity on this amazing PR for the last 30 days. |
I think we can close this since an alternative implementation was merged for solving this problem? 😄 |
This commit removes the case for a global sort on all series in the querier, and redistributes in places where series are received from stores.
This is an alternative implementation to #5692.
I ran a query with this implementation against 100K series and the execution time dropped from 26s to 19s compared to main. With the new engine, the query took 9s.
Signed-off-by: Filip Petkovski filip.petkovsky@gmail.com
Changes
Verification