Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative implementation: Push replica labels at the end after Recv #5742

Closed
wants to merge 3 commits into from

Conversation

fpetkovski
Copy link
Contributor

@fpetkovski fpetkovski commented Sep 29, 2022

This commit removes the case for a global sort on all series in the querier, and redistributes in places where series are received from stores.

This is an alternative implementation to #5692.

I ran a query with this implementation against 100K series and the execution time dropped from 26s to 19s compared to main. With the new engine, the query took 9s.

Signed-off-by: Filip Petkovski filip.petkovsky@gmail.com

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

@fpetkovski fpetkovski changed the title Push replica labels at the end after Recv Alternative implementation: Push replica labels at the end after Recv Sep 29, 2022
}

func (m *dedupReadyResponse) LessWithoutReplicaLabels(other *dedupReadyResponse) bool {
if m.GetSeries() != nil && other.GetSeries() != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It crashes for me here. Generated code looks like this:

	if x, ok := m.GetResult().(*SeriesResponse_Series); ok {
		return x.Series
	}

It doesn't check whether x is not nil :'/ Maybe we can add a small helper:

func (m *SeriesResponse) GetSeriesNotNil() *Series {
	if x, ok := m.GetResult().(*SeriesResponse_Series); ok {
		if x == nil {
			return nil
		}
		return x.Series
	}
	return nil
}

The problem might be somewhere else, though, that we are comparing nil with not nil. Maybe related to #5717?

See my comment there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for testing this. I assume you are using the lazy strategy when it crashes?

if labels.Compare(labelpb.ZLabelsToPromLabels(lbls), labelpb.ZLabelsToPromLabels(lastLbls)) == 0 {
// Unless a response comes from query pushdown, it is safe to merge
// chunks from replica series together into one slice.
labelsEqual := labels.Compare(labelpb.ZLabelsToPromLabels(lbls), labelpb.ZLabelsToPromLabels(lastLbls)) == 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to use labelsEqual = resp.signature == lastResponse.signature here, but unfortunately it breaks tests with overlapping and interleaved chunks from prometheus replicas. This was introduced and solved in:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested that change in a canary environment, and together with the new engine saw a decrease in a query from 75s to 8s, which is just insane. We have a layered querier setup, so deduplicating as soon as possible makes a massive difference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -342,6 +351,12 @@ func (s *ProxyStore) Series(originalRequest *storepb.SeriesRequest, srv storepb.
}
}

if respHeap.UnsortedSetDetected() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit like implementing InfoServer through the Series service. What do you think about checking st.SendsSortedSeries() instead and if all are true then we could also advertise that SendsSortedSeries() == true in Thanos Query. And if there is at least one where SendsSortedSeries() == false then let's also advertise false in Thanos Query.

I have been playing around with your PR a little bit and ☝️ seems like a much cleaner solution. I can try opening up a PR to it tomorrow or on Wednesday.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would make sense, I'll rework the PR to change the detection mechanism.
However, should we also change stores to have a sorted detection without replica labels? This should be possible now since replica labels are part of the Series request.

Copy link
Contributor Author

@fpetkovski fpetkovski Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I read that I again, with the current implementation of stores, I am not sure if it is possible to make a detection statically without inspecting the data. A user can specify any label as the replica label in a Query component. So stores need to resort data arbitrarily in the Series call if we want to have checks through info calls.

If we want to remove the sort in the querier, data needs to be sorted without replica labels for k-way merge to work. That means data needs to be resorted in every store that can return multiple replicas of the same series. Since users can specify any label as the replica label, this in practice means each store should be able to resort data.

Taking the following example

1. a=1, r=1, z=1
2. a=1, r=1, z=2
3. a=1, r=2, z=1

if the replica label is r, data needs to be sent in the following order for deduplication to work

1. a=1, z=1, r=1
3. a=1, z=2, r=1
2. a=1, z=1, r=2

So rows 2. and 3. need to be switched.

@bwplotka
Copy link
Member

bwplotka commented Oct 3, 2022

Panics on my side:
image

@bwplotka
Copy link
Member

bwplotka commented Oct 3, 2022

let's see tomorrow (:

@GiedriusS
Copy link
Member

Panics on my side: image

You need to merge the latest main into this branch to fix this 😄

@fpetkovski
Copy link
Contributor Author

Might also be fixed by thanos-io/promql-engine#55

This commit removes the case for a global sort on all series in the
querier, and redistributes in places where series are received from
stores.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
@stale
Copy link

stale bot commented Nov 12, 2022

Hello 👋 Looks like there was no activity on this amazing PR for the last 30 days.
Do you mind updating us on the status? Is there anything we can help with? If you plan to still work on it, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next week, this issue will be closed (we can always reopen a PR if you get back to this!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Nov 12, 2022
@GiedriusS
Copy link
Member

I think we can close this since an alternative implementation was merged for solving this problem? 😄

@GiedriusS GiedriusS closed this Feb 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants