Tips for slow compact on a large bucket with large blocks #4310
Comments
The correct link is now: https://thanos.io/tip/thanos/sharding.md/#compactor

There are various features in flight for improving compactor performance; this is the umbrella issue that tracks them: #4233. Your first issue link is/should be resolved with #3115.

That said, I'm not sure you are really hitting limits that could not already be resolved by tweaking your setup, so I'm curious which Thanos version you are using and whether you could share some stats for it (i.e. CPU & memory usage). Did you also set limits on those resources?

If you want to run multiple compactors, you could look into the labels. As per the docs: "This allows to assign multiple streams to each instance of compactor." For example, for the store component one could use a relabel config like this (don't use this for the compactor!):
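(The original comment's example block was not captured above; the following is only an illustrative sketch of a hashmod-style selector relabel config for sharding a store gateway. The `__block_id` source label, the modulus of 2, and keeping shard 0 are assumptions for the example; verify that your Thanos version supports them.)

```yaml
# Hypothetical store-gateway sharding sketch (not for the compactor):
# hash every block into one of two shards and keep only shard 0.
- action: hashmod
  source_labels: ["__block_id"]   # assumed meta label; check availability in your version
  target_label: shard
  modulus: 2
- action: keep
  source_labels: ["shard"]
  regex: "0"
```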
Yet this should not be used for the compactor, as it is not 'pinned' to a specific stream; it merely splits all data over multiple shards. For the compactor you want some form of relabel config that matches specific streams by regex, i.e.:
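(Again only a sketch, with an assumed external label: if each compaction stream is identified by, say, a `cluster` external label, one compactor instance could be pinned to a single stream like this.)

```yaml
# Hypothetical compactor pinning sketch: this instance only handles blocks whose
# `cluster` external label matches "cluster-a" (label name and value are assumptions).
- action: keep
  source_labels: ["cluster"]
  regex: "cluster-a"
```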
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
I am considering the same thing and I think you already gave the answer @sevagh 😄.
Thanks for the replies. @yeya24 I'm reading this PR you recently got merged, and I think it might help me: https://github.com/thanos-io/thanos/pull/4239/files/15acd8c8683c8ecc785ec71e4c16f89738e839b6#diff-59764a4da653d4464eac20465390033ab8abbd8b54688979727065cb389e848d

One of my issues with Ceph + Thanos is that I have 2x Prometheus pollers in a typical HA setup, so I store 2x copies of each TSDB block (slightly different due to natural differences between the two pollers). It looks like the offline deduplication you added with "penalty" mode, intended for HA Prometheus, would shrink my Ceph bucket by roughly 50% by combining these 2x HA blocks?
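(A hedged sketch of what enabling that offline deduplication might look like on the compactor, assuming the HA pollers are distinguished by a `replica` external label and a Thanos release that includes PR #4239; the label name and file path are placeholders, so verify the exact flag names against your version.)

```sh
# Hypothetical example: offline "penalty" deduplication of HA Prometheus blocks.
thanos compact \
  --objstore.config-file=/etc/thanos/ceph-bucket.yml \
  --deduplication.replica-label=replica \
  --deduplication.func=penalty \
  --wait
```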
@sevagh Let me move this to discussion as it is generally a question, not an issue.
This issue was moved to a discussion.
Hello,
I use Thanos with a rather large bucket (Ceph object store) - 10TB total. I store metrics at the raw resolution with 30 days retention, downsampling disabled.
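(For context, and as an assumption not stated in the issue: Thanos usually reaches a Ceph bucket through the S3-compatible RADOS Gateway, so the objstore configuration might look roughly like the sketch below. Bucket name, endpoint, and credentials are placeholders.)

```yaml
# Hypothetical objstore config for a Ceph RGW (S3-compatible) bucket.
type: S3
config:
  bucket: thanos-metrics            # placeholder bucket name
  endpoint: ceph-rgw.example.com:7480
  access_key: <ACCESS_KEY>
  secret_key: <SECRET_KEY>
  insecure: true                    # assumes plain HTTP to the gateway
```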
Here's my systemd daemon for it:
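(The actual unit file was not captured above; the following is only a hypothetical reconstruction matching the setup described — raw-only retention of 30 days, downsampling disabled, continual `--wait` mode — with assumed paths and values, including the objstore config file sketched earlier.)

```ini
# Hypothetical sketch of a compactor unit; paths and flag values are assumptions.
[Unit]
Description=Thanos compactor
After=network-online.target

[Service]
ExecStart=/usr/local/bin/thanos compact \
    --data-dir=/var/thanos/compact \
    --objstore.config-file=/etc/thanos/ceph-bucket.yml \
    --retention.resolution-raw=30d \
    --retention.resolution-5m=0d \
    --retention.resolution-1h=0d \
    --downsampling.disable \
    --wait
Restart=on-failure

[Install]
WantedBy=multi-user.target
```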
The daemon runs in continual mode, but recently it has been slow to complete its compaction runs, and it doesn't reach the "delete blocks" part of the run until it's too late and the storage bucket is overflowing (related issue: #2605).
I recently ran a compaction without the `--wait` mode, just to see how long a single run takes, and it has been 9 days so far without any deletions. I have a locally compiled binary of Thanos which only runs block deletions on blocks already marked for deletion, as described here: #2605 (comment)
One thing I can do is:
What I'm looking for are tips or solutions on how I can make this better.
Here are the metrics from the compact instance:
Any tips would be appreciated, thanks!