Compactor: retry compaction of a single tenant on failure instead of re-running compaction for all tenants #3627
Thanks for this, it should reduce the number of LIST calls on S3 or similar object stores.
LGTM
pkg/compactor/compactor.go (outdated)
		MaxRetries: c.compactorCfg.CompactionRetries,
	})

func (c *Compactor) compactUsers(ctx context.Context) (returnErr error) {
The returned error is not used anywhere. We should remove it (we already log all errors anyway), and perhaps also simplify returnErr to a boolean flag for updating metrics.
Does it look better now?
Looks better. As a final change, I'd suggest removing the unused return value completely and converting succeeded into a local variable. (non-blocking nit)
Oh yeah, sure, done.
Force-pushed from d80e8db to c0a4a4b.
…re-running compaction for all tenants
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Force-pushed from 44f59fe to 6b3a5b9.
What this PR does:
Currently, the compactor re-runs compaction for all tenants if at least one tenant fails. Re-running compaction for a tenant that previously succeeded doesn't redo the work, but each compaction run has to re-scan the bucket, so this ends up being quite inefficient for Cortex clusters with a large number of tenants.
In this PR I'm proposing to retry compaction on failure only for the failed tenants.
Which issue(s) this PR fixes:
N/A
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]