-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Logically strip timestamp during flush #11557
Conversation
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it confusing that some places talk about stripping or removing the timestamps while some talk about replacing with minimum timestamp. I might expect that if I don't persist timestamps, I could re-open the DB without timestamps configured at all, but I don't think that is true. I think we need to be more clear and careful about describing what the persist_user_defined_timestamps option does.
@jowlyzhang has updated the pull request. You must reimport the pull request before landing. |
Thank you for this suggestion! I took a look at the existing comment for Specifically on this flush path, what not Because these SST files are created without any user-defined timestamps, users indeed can re-open the DB without timestamps configured at all. At that time, they provide a user comparator that has |
@jowlyzhang has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@jowlyzhang has updated the pull request. You must reimport the pull request before landing. |
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jowlyzhang has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for the described hole in testing. Could be fixed in this PR or another.
Don't forget to re-import :) |
Thank you for your review! I will add that test coverage in a follow up. |
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jowlyzhang merged this pull request in 15053f3. |
Summary: Thanks pdillinger for pointing out this test hole. The test `DBWALTestWithTimestamp.Recover` that is intended to test recovery from WAL including user-defined timestamps doesn't achieve its promised coverage. Specifically, after #11557, timestamps will be removed during flush, and RocksDB by default flush memtables during recovery with `avoid_flush_during_recovery` defaults to false. This test didn't fail even if all the timestamps are quickly lost due to the default flush behavior. This PR renamed test `Recover` to `RecoverAndNoFlush`, and updated it to verify timestamps are successfully recovered from WAL with some time-travel reads. `avoid_flush_during_recovery` is set to true to help do this verification. On the other hand, for test `DBWALTestWithTimestamp.RecoverAndFlush`, since flush on reopen is DB's default behavior. Setting the flags `max_write_buffer` and `arena_block_size` are not really the factors that enforces the flush, so these flags are removed. Pull Request resolved: #11577 Test Plan: ./db_wal_test Reviewed By: pdillinger Differential Revision: D47142892 Pulled By: jowlyzhang fbshipit-source-id: 9465e278806faa5885b541b4e32d99e698edef7d
Logically strip the user-defined timestamp when L0 files are created during flush when
AdvancedColumnFamilyOptions.persist_user_defined_timestamps
is false. Logically stripping timestamp here means replacing the original user-defined timestamp with a mininum timestamp, which for now is hard coded to be all zeros bytes.While working on this, I caught a missing piece on the
BlockBuilder
level for this feature. The current quick pathstd::min(buffer_size, last_key_size)
needs a bit tweaking to work for this feature. When user-defined timestamp is stripped during block building, on writing first entry or right after resetting,buffer
is empty andbuffer_size
is zero as usual. However, in follow-up writes, depending on the size of the stripped user-defined timestamp, and the size of the value, what's inbuffer
can sometimes be smaller thanlast_key_size
, leadingstd::min(buffer_size, last_key_size)
to truncate thelast_key
. Previous test doesn't caught the bug because in those tests, the size of the stripped user-defined timestamps bytes is smaller than the length of the value. In order to avoid the conditional operation, this PR changed the original trivialstd::min
operation into an arithmetic operation. Since this is a change in a hot and performance critical path, I did the following benchmark to check no observable regression is introduced.TEST_TMPDIR=/dev/shm/rocksdb1 ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=50000000
Compiled with DEBUG_LEVEL=0
Test vs. control runs simulaneous for better accuracy, units = ops/sec
PR vs base:
Round 1: 350652 vs 349055
Round 2: 365733 vs 364308
Round 3: 355681 vs 354475
Test Plan:
New timestamp specific test added or existing tests augmented, both are parameterized with
UserDefinedTimestampTestMode
:UserDefinedTimestampTestMode::kNormal
-> UDT feature enabled, write / read with min timestampUserDefinedTimestampTestMode::kStripUserDefinedTimestamps
-> UDT feature enabled, write / read with min timestamp, set Options.persist_user_defined_timestamps to false.