-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use user key on sst file for blob verification for Get and MultiGet #11105
Conversation
9567a06
to
4ff5884
Compare
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jowlyzhang has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for the fix @jowlyzhang ! It looks great 👍 👍 I have just one comment
table/get_context.cc
Outdated
@@ -306,6 +306,9 @@ bool GetContext::SaveValue(const ParsedInternalKey& parsed_key, | |||
if (kNotFound == state_) { | |||
state_ = kFound; | |||
if (do_merge_) { | |||
if (is_blob_index_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_blob_index_
here is a pointer to bool and with the current codebase, is always valid (non-null). I think what we would want here is to check that the value type is kTypeBlobIndex
and that timestamps are enabled (the latter can be established by checking ucmp_->timestamp_size()
). Note that this also involves a change to ukey_to_get_blob_value()
: if it's not populated (because timestamps are not enabled), we can user the original user key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion. I misinterpreted is_blob_index_
previously. It makes sense to only populate this field when timestamp is a factor to consider.
@jowlyzhang has updated the pull request. You must reimport the pull request before landing. |
@jowlyzhang has updated the pull request. You must reimport the pull request before landing. |
@@ -194,6 +203,10 @@ class GetContext { | |||
|
|||
GetState state_; | |||
Slice user_key_; | |||
// When a blob index is found with the user key containing timestamp, | |||
// this copies the corresponding user key on record in the sst file | |||
// and is later used for blob verification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why a review comment here is not shown. So l just copied the comment below:
Also, we can probably ditch the Slice member since std::string can be implicitly converted to a Slice.
Re this comment:
Thank you for the naming suggestion, it's a better name to use.
About the Slice member, since in the MultiGet's case, the temporary Slice object will go out of scope when the blob_reqs_in_file
are later used, so I kept a Slice member in the GetContext. Now I kind of merged these two variables into one PinnableSlice variable with self storage, would this help make the code cleaner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I deleted that comment when I saw you had switched to PinnableSlice
. Yes, I think this is a cleaner design 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks again for the fix!
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
cdea1c4
to
e53b7c6
Compare
@jowlyzhang has updated the pull request. You must reimport the pull request before landing. |
P.S. Once this lands, we can try to enable this combination of features in the stress/crash tests |
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Oh, and please add an entry to HISTORY.md (can be a separate PR). Thanks! |
I haven't figured out and fixed this compatibility issue for the iterator APIs, I'm thinking of adding an entry in HISTORY.md after that fix, what do you think? |
Are iterators also affected? Wasn't aware of that |
I have no idea how iterators are implemented and thought it could be worth double checking. Now that you mentioned it, I think you are right, there is no user provided key vs key in the database issue in the iterator case. |
Based on my cursory reading, most of the iterator code seems safe; we might need some small improvements here and there though (e.g. some logic related to the |
@jowlyzhang has updated the pull request. You must reimport the pull request before landing. |
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jowlyzhang merged this pull request in 24ac53d. |
Use the user key on sst file for blob verification for
Get
andMultiGet
instead of the user key passed from caller.Add tests for
Get
andMultiGet
operations when user defined timestamp feature is enabled in a BlobDB.Test:
make V=1 db_blob_basic_test
./db_blob_basic_test --gtest_filter="DBBlobTestWithTimestamp.*"