-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compact: skip compaction for blocks with no samples #904
Compact: skip compaction for blocks with no samples #904
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, the case we were observing was essentially a corrupted block, where we expect data based on the block but see none in the chunks. As we were discussing, this probably shouldn't be silenced entirely, rather the error reporting should not affect the processing of independent blocks. @bwplotka ?
Wow, ok I think we have 2-3 separate cases here.
|
…ndle from bucket compactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally OK, just some suggestions.
pkg/compact/compact.go
Outdated
@@ -34,6 +34,7 @@ const ( | |||
) | |||
|
|||
var blockTooFreshSentinelError = errors.New("Block too fresh") | |||
var emptyBlockSentinelULID = ulid.MustNew(123, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentinel is for noMoreCompactionForGroup
I guess? can we name it like this or something similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK so I think there is misformation here , I would prefer something opposite like noMoreCompactionForGroup
instead of this? What do you think? OR even bool flag return argument to make it more explicit (:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used a separate bool flag as I think that's the clearest what's happening. The naming was a little awkward but hopefully it makes sense, let me know what you think :)
@@ -513,23 +513,23 @@ func (cg *Group) Resolution() int64 { | |||
|
|||
// Compact plans and runs a single compaction against the group. The compacted result | |||
// is uploaded into the bucket the blocks were retrieved from. | |||
func (cg *Group) Compact(ctx context.Context, dir string, comp tsdb.Compactor) (ulid.ULID, error) { | |||
func (cg *Group) Compact(ctx context.Context, dir string, comp tsdb.Compactor) (bool, ulid.ULID, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need ulid.ULID
now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally tried to remove the ulid.ULID
but had to add it back in as we need it for compact_e2e_test.go
(as written right now). In the case when the compactor actually does something, the test asserts that the new block was actually uploaded, and it needs the id for that.
I thought about changing the test, but it seemed worthwhile keeping the coverage for the upload stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, let's not end in rabbit whole here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can stick to (bool, error) return args now, right @mjd95 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks.
One small nit, but we can fix it later on, not a blocker
@@ -513,23 +513,23 @@ func (cg *Group) Resolution() int64 { | |||
|
|||
// Compact plans and runs a single compaction against the group. The compacted result | |||
// is uploaded into the bucket the blocks were retrieved from. | |||
func (cg *Group) Compact(ctx context.Context, dir string, comp tsdb.Compactor) (ulid.ULID, error) { | |||
func (cg *Group) Compact(ctx context.Context, dir string, comp tsdb.Compactor) (bool, ulid.ULID, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, let's not end in rabbit whole here.
} | ||
|
||
compID, err := cg.compact(ctx, subDir, comp) | ||
shouldRerun, compID, err := cg.compact(ctx, subDir, comp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this as this indicate that something wrong happen and we need to retry.. non empty ulid was actually more sensible maybe? Or at least we should change var name... Not a blocker.
* skip compaction for blocks with no samples * update to actually delete the empty input blocks, and to correctly handle from bucket compactor * warn on error deleting empty block * use ULID instead of error for emptyBlockSentinel * don't use a global variable * full stop at end of comment * use boolean to indicate whether there is more compaction work * rename variables * fix test
Changes
During the Thanos group compactor, in the case when the Prometheus compactor has found that the resulting block would have no data, delete the source blocks (after checking they are indeed empty) and exit early. Introduce a sentinel error to mark this case, to force the caller (bucket compactor) to continue compacting the rest of that group.
It is necessary to handle the empty compaction case separately because the later parts of the group compactor result in an error if we attempt to continue.
Note that the Prometheus compactor uses the empty ULID for the "block with no samples" case:
Verification
Ran the compactor against a bucket where some of the compacted blocks would end up empty.