Compactor: retry compaction of a single tenant on failure instead of re-running compaction for all tenants #3627

pracucci
Contributor

What this PR does:
Currently, the compactor re-runs compaction for all tenants if at least one tenant fails. Re-running compaction for a tenant that previously succeeded doesn't redo the work, but each compaction run needs to re-scan the bucket, so this ends up being pretty inefficient for Cortex clusters running with a large number of tenants.

In this PR I'm proposing to retry compaction on failure only for the failed tenants.

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@ranton256
Contributor

Thanks for this, it should reduce the number of list calls on S3 or similar.

Contributor

@ranton256 ranton256 left a comment

LGTM

MaxRetries: c.compactorCfg.CompactionRetries,
})

func (c *Compactor) compactUsers(ctx context.Context) (returnErr error) {
Contributor

The returned error is not used anywhere. We should remove it (we already log all errors anyway), and perhaps also simplify returnErr to a boolean flag for updating metrics.

Contributor Author

Does it look better now?

Contributor

@pstibrany pstibrany Jan 4, 2021

Looks better. As a final change, I'd suggest removing the unused return value completely and converting succeeded into a local variable. (non-blocking nit)

Contributor Author

Oh yeah, sure, done.

@pracucci pracucci force-pushed the compactor-retry-single-tenant-on-failure branch from d80e8db to c0a4a4b Compare January 4, 2021 13:22
@pull-request-size pull-request-size bot added size/L and removed size/M labels Jan 4, 2021
…re-running compaction for all tenants

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci force-pushed the compactor-retry-single-tenant-on-failure branch from 44f59fe to 6b3a5b9 Compare January 4, 2021 15:36
@pracucci pracucci merged commit 739d3f0 into cortexproject:master Jan 5, 2021
@pracucci pracucci deleted the compactor-retry-single-tenant-on-failure branch January 5, 2021 07:48
3 participants