-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bucket tool retention command #4406
Conversation
Signed-off-by: ben.ye <ben.ye@bytedance.com>
Signed-off-by: ben.ye <ben.ye@bytedance.com>
Signed-off-by: ben.ye <ben.ye@bytedance.com>
Can you double-check @yeya24? I think we apply retention asynchronously now. Are you sure it's only at the end? It would amazing if you compactor algorithm actually do it properly, so not extra tool is needed WDYT? (: Also do you know about the work we do with Andre? We are working on pub-sub compactor split, worth to sync on that! |
@bwplotka I think only the cleanup part is asynchronous and it only deletes blocks with deletion markers.
I think it is mentioned in the community OH doc, but what's the conclusion and action items for that? I understand that the usecase for this tool might be rare. Anyway, I used this tool to clean up 50k blocks for us already. If we are going to improve more on the compactor, I am happy to put the tool to https://github.com/yeya24/thanos-tools instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any real context in compactor, but this PR looks good 👍
It adds a tool that clearly solves a real-world problem you are having, and I can see this being helpful for others too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go for it then, no harm - logic looks correct. Thanks @yeya24 !
Signed-off-by: ben.ye ben.ye@bytedance.com
Changes
This pr adds a new bucket tool
retention
. It simply applies the given retention policy on the bucket, same as compactor.Why this command?
Because of the misconfiguration of our compactor, we have over 60k blocks left in the bucket and I am trying to scale the compactor to make blocks compactions faster.
The workflow of compactor is:
For us, because of the large amount of blocks left in the bucket, step 2 is super slow. Also it is trying to compact blocks that are outside our retention period. For example, it is compacting blocks generated in March, however, our longest retention is only 2 months. So even the compaction succeeds, the new block will be deleted by the retention policy later, which makes the compaction useless.
This command gives us the chance to apply the retention policy first and reduce the unnecessary work. I think this kind of case is rare, but it is good to have it added to
bucket tools
and users can use it if needed.Verification