Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use user key on sst file for blob verification for Get and MultiGet #11105

Closed
wants to merge 6 commits into from

Conversation

jowlyzhang
Copy link
Contributor

@jowlyzhang jowlyzhang commented Jan 18, 2023

Use the user key on sst file for blob verification for Get and MultiGet instead of the user key passed from caller.

Add tests for Get and MultiGet operations when user defined timestamp feature is enabled in a BlobDB.

Test:
make V=1 db_blob_basic_test
./db_blob_basic_test --gtest_filter="DBBlobTestWithTimestamp.*"

@jowlyzhang jowlyzhang force-pushed the blob_with_ts branch 3 times, most recently from 9567a06 to 4ff5884 Compare January 24, 2023 06:36
@jowlyzhang jowlyzhang changed the title Add basic unit tests for blob db with user defined timestamp Use user key on sst file for blob verification for Get and MultiGet Jan 24, 2023
@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jowlyzhang jowlyzhang linked an issue Jan 25, 2023 that may be closed by this pull request
@jowlyzhang jowlyzhang marked this pull request as draft January 27, 2023 19:01
@facebook-github-bot
Copy link
Contributor

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@ltamasi ltamasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the fix @jowlyzhang ! It looks great 👍 👍 I have just one comment

@@ -306,6 +306,9 @@ bool GetContext::SaveValue(const ParsedInternalKey& parsed_key,
if (kNotFound == state_) {
state_ = kFound;
if (do_merge_) {
if (is_blob_index_) {
Copy link
Contributor

@ltamasi ltamasi Jan 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_blob_index_ here is a pointer to bool and with the current codebase, is always valid (non-null). I think what we would want here is to check that the value type is kTypeBlobIndex and that timestamps are enabled (the latter can be established by checking ucmp_->timestamp_size()). Note that this also involves a change to ukey_to_get_blob_value(): if it's not populated (because timestamps are not enabled), we can user the original user key

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion. I misinterpreted is_blob_index_ previously. It makes sense to only populate this field when timestamp is a factor to consider.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

@jowlyzhang jowlyzhang marked this pull request as ready for review January 27, 2023 21:57
@facebook-github-bot
Copy link
Contributor

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

@@ -194,6 +203,10 @@ class GetContext {

GetState state_;
Slice user_key_;
// When a blob index is found with the user key containing timestamp,
// this copies the corresponding user key on record in the sst file
// and is later used for blob verification.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why a review comment here is not shown. So l just copied the comment below:


Also, we can probably ditch the Slice member since std::string can be implicitly converted to a Slice.

Re this comment:
Thank you for the naming suggestion, it's a better name to use.

About the Slice member, since in the MultiGet's case, the temporary Slice object will go out of scope when the blob_reqs_in_file are later used, so I kept a Slice member in the GetContext. Now I kind of merged these two variables into one PinnableSlice variable with self storage, would this help make the code cleaner?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I deleted that comment when I saw you had switched to PinnableSlice. Yes, I think this is a cleaner design 👍

Copy link
Contributor

@ltamasi ltamasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks again for the fix!

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

@ltamasi
Copy link
Contributor

ltamasi commented Jan 27, 2023

P.S. Once this lands, we can try to enable this combination of features in the stress/crash tests

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ltamasi
Copy link
Contributor

ltamasi commented Jan 27, 2023

Oh, and please add an entry to HISTORY.md (can be a separate PR). Thanks!

@jowlyzhang
Copy link
Contributor Author

Oh, and please add an entry to HISTORY.md (can be a separate PR). Thanks!

I haven't figured out and fixed this compatibility issue for the iterator APIs, I'm thinking of adding an entry in HISTORY.md after that fix, what do you think?

@ltamasi
Copy link
Contributor

ltamasi commented Jan 27, 2023

Oh, and please add an entry to HISTORY.md (can be a separate PR). Thanks!

I haven't figured out and fixed this compatibility issue for the iterator APIs, I'm thinking of adding an entry in HISTORY.md after that fix, what do you think?

Are iterators also affected? Wasn't aware of that

@jowlyzhang
Copy link
Contributor Author

Oh, and please add an entry to HISTORY.md (can be a separate PR). Thanks!

I haven't figured out and fixed this compatibility issue for the iterator APIs, I'm thinking of adding an entry in HISTORY.md after that fix, what do you think?

Are iterators also affected? Wasn't aware of that

I have no idea how iterators are implemented and thought it could be worth double checking. Now that you mentioned it, I think you are right, there is no user provided key vs key in the database issue in the iterator case.

@jowlyzhang jowlyzhang closed this Jan 28, 2023
@jowlyzhang jowlyzhang reopened this Jan 28, 2023
@ltamasi
Copy link
Contributor

ltamasi commented Jan 28, 2023

I have no idea how iterators are implemented and thought it could be worth double checking. Now that you mentioned it, I think you are right, there is no user provided key vs key in the database issue in the iterator case.

Based on my cursory reading, most of the iterator code seems safe; we might need some small improvements here and there though (e.g. some logic related to the iter_start_ts read option seems to be missing for blobs). This shouldn't affect this PR though :)

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang merged this pull request in 24ac53d.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support using user-defined timestamps in combination with BlobDB
3 participants