Skip to content

Write block deletion marks in the global location too #3561

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Dec 9, 2020

Conversation

pracucci
Copy link
Contributor

@pracucci pracucci commented Dec 2, 2020

What this PR does:
This is another preliminary step required to implement the bucket index, as designed in #3554. In this PR:

  • Introduce bucket index reader and writer (but not plugged in yet)
  • Introduce a bucket client wrapper used to store block deletion marks in the per-tenant global location too
  • Plug the bucket client wrapper in the compactor (which is the component deleting blocks)

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! I have mostly minor nits and comments, nothing blocking.

Unrelated to this PR: have you considered naming bucketindex.Block to BlockMeta.

return ulid.ULID{}, false
}

return block.IsBlockDir(parts[len(parts)-2])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return block.IsBlockDir(parts[len(parts)-2])
return block.IsBlockDir(path.Dir(name))

and avoid split.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not the same thing. The object prefix could be /user-1/<block-id>/deletion-mark.json. We need to pick the 2nd last segment, not the whole path.Dir().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to pick the 2nd last segment, not the whole path.Dir().

block.IsBlockDir already covers that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that, sorry. Fixed!

func (w *Writer) generateBlockDeletionMarkIndexEntry(ctx context.Context, id ulid.ULID) (*BlockDeletionMark, error) {
markFile := path.Join(id.String(), metadata.DeletionMarkFilename)

// Get the block's deletion mark file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can replace marker reading code in this method with:

	m := metadata.DeletionMark{}
	err := metadata.ReadMarker(ctx, w.logger, w.bkt, id.String(), &m)

(UserBucketClient already implements required InstrumentedBucket)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point. Done, thanks!

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci
Copy link
Contributor Author

pracucci commented Dec 9, 2020

have you considered naming bucketindex.Block to BlockMeta

Yes. I would prefer to not call it "BlockMeta" to avoid misunderstanding. We don't index the BlockMeta, but some information about the Block and I would prefer not couple it with the meta.json (eg. we may also index block information which is not in the meta.json).

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci
Copy link
Contributor Author

pracucci commented Dec 9, 2020

Thanks a lot @pstibrany for your valuable feedback! I addressed most comments and replied to few other ones. Could you take another look, please?

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work.

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci merged commit e98dfb6 into cortexproject:master Dec 9, 2020
@pracucci pracucci deleted the write-bucket-index branch December 9, 2020 10:01
@lzh-lab
Copy link
Contributor

lzh-lab commented Dec 11, 2020

In bucket_index.json aleady have block ids which is marked delete,
Why save a delete file for every block separately?

@pracucci
Copy link
Contributor Author

In bucket_index.json aleady have block ids which is marked delete,
Why save a delete file for every block separately?

@AllenzhLi Without a per-tenant global markers/ location we wouldn't have an easy way to find out which blocks have been marked for deletion. Without that, we would have to check the deletion mark for every single block with 1 GET object operation per block (which is what we do right now). With the global location, we can just issue 1 LIST objects API call to find all blocks marked for deletion.

@lzh-lab
Copy link
Contributor

lzh-lab commented Dec 11, 2020

Thank you for your reply.

I have read the design doc, in the doc
image
In the bucket_index.json have the list of block deletion marks, why not use it?

@pstibrany
Copy link
Contributor

In the bucket_index.json have the list of block deletion marks, why not use it?

Bucket index JSON is created based on found blocks and deletion marks. Querier and store-gateway only use index (and ignore blocks marked for deletion). During compaction, compactor will only create deletion marks, but not update the index. Index update is a separate loop, that looks for deletion marks.

@lzh-lab
Copy link
Contributor

lzh-lab commented Dec 11, 2020

Understood, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants