Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bucket tool retention command #4406

Merged
merged 3 commits into from
Jul 6, 2021
Merged

Conversation

yeya24
Copy link
Contributor

@yeya24 yeya24 commented Jul 3, 2021

Signed-off-by: ben.ye ben.ye@bytedance.com

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

This pr adds a new bucket tool retention. It simply applies the given retention policy on the bucket, same as compactor.

Why this command?

Because of the misconfiguration of our compactor, we have over 60k blocks left in the bucket and I am trying to scale the compactor to make blocks compactions faster.

The workflow of compactor is:

  1. Grouper groups blocks into several groups
  2. Apply compaction
  3. Perform downsampling if enabled
  4. Apply retention policy

For us, because of the large amount of blocks left in the bucket, step 2 is super slow. Also it is trying to compact blocks that are outside our retention period. For example, it is compacting blocks generated in March, however, our longest retention is only 2 months. So even the compaction succeeds, the new block will be deleted by the retention policy later, which makes the compaction useless.

This command gives us the chance to apply the retention policy first and reduce the unnecessary work. I think this kind of case is rare, but it is good to have it added to bucket tools and users can use it if needed.

Verification

ben.ye added 3 commits July 5, 2021 19:33
Signed-off-by: ben.ye <ben.ye@bytedance.com>
Signed-off-by: ben.ye <ben.ye@bytedance.com>
Signed-off-by: ben.ye <ben.ye@bytedance.com>
@bwplotka
Copy link
Member

bwplotka commented Jul 6, 2021

Can you double-check @yeya24? I think we apply retention asynchronously now. Are you sure it's only at the end?

It would amazing if you compactor algorithm actually do it properly, so not extra tool is needed WDYT? (:

Also do you know about the work we do with Andre? We are working on pub-sub compactor split, worth to sync on that!

@yeya24
Copy link
Contributor Author

yeya24 commented Jul 6, 2021

@bwplotka I think only the cleanup part is asynchronous and it only deletes blocks with deletion markers.
The apply retention step checks blocks and adds deletion markers for them so they are not asynchronous. Please see https://github.com/thanos-io/thanos/blob/main/cmd/thanos/compact.go#L451. I think that's the only place to apply retention right now.

Also do you know about the work we do with Andre? We are working on pub-sub compactor split, worth to sync on that!

I think it is mentioned in the community OH doc, but what's the conclusion and action items for that?

I understand that the usecase for this tool might be rare. Anyway, I used this tool to clean up 50k blocks for us already. If we are going to improve more on the compactor, I am happy to put the tool to https://github.com/yeya24/thanos-tools instead.

Copy link
Contributor

@bill3tt bill3tt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any real context in compactor, but this PR looks good 👍

It adds a tool that clearly solves a real-world problem you are having, and I can see this being helpful for others too.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go for it then, no harm - logic looks correct. Thanks @yeya24 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants