Logically strip timestamp during flush #11557

jowlyzhang · 2023-06-23T23:59:19Z

Logically strip the user-defined timestamp when L0 files are created during flush and AdvancedColumnFamilyOptions.persist_user_defined_timestamps is false. Logically stripping the timestamp here means replacing the original user-defined timestamp with a minimum timestamp, which for now is hard-coded to be all zero bytes.
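As a rough illustration of what "logically stripping" means, here is a minimal sketch. It assumes a user key laid out as the key bytes followed by a fixed-size timestamp; `LogicallyStripTimestamp` is a hypothetical helper written for this example, not a RocksDB function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not a RocksDB API. Assumed layout of the input:
// <key bytes><ts_sz timestamp bytes>. The trailing timestamp bytes are
// overwritten with zero bytes (the minimum timestamp), so the key keeps
// its original length and stays comparable by a timestamp-aware comparator.
std::string LogicallyStripTimestamp(const std::string& key_with_ts,
                                    std::size_t ts_sz) {
  assert(key_with_ts.size() >= ts_sz);
  std::string out = key_with_ts;
  // Replace the last ts_sz bytes with ts_sz copies of '\0'.
  out.replace(out.size() - ts_sz, ts_sz, ts_sz, '\0');
  return out;
}
```

For a 3-byte key followed by an 8-byte timestamp, the result is still 11 bytes long, with the last 8 bytes zeroed.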

While working on this, I caught a missing piece at the BlockBuilder level for this feature. The current quick path std::min(buffer_size, last_key_size) needs a small tweak to work with it. When the user-defined timestamp is stripped during block building, the buffer is empty and buffer_size is zero, as usual, when writing the first entry or right after a reset. In follow-up writes, however, depending on the size of the stripped user-defined timestamp and the size of the value, the contents of the buffer can be smaller than last_key_size, causing std::min(buffer_size, last_key_size) to truncate last_key. Previous tests didn't catch the bug because in those tests the stripped user-defined timestamp was smaller than the value.

To avoid a conditional operation, this PR changes the original trivial std::min operation into an arithmetic operation. Since this is a change in a hot, performance-critical path, I ran the following benchmark to check that no observable regression is introduced.
TEST_TMPDIR=/dev/shm/rocksdb1 ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=50000000
Compiled with DEBUG_LEVEL=0
Test vs. control runs simultaneously for better accuracy, units = ops/sec
PR vs base:
Round 1: 350652 vs 349055
Round 2: 365733 vs 364308
Round 3: 355681 vs 354475
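
The truncation described above can be reproduced with plain arithmetic. The sketch below is only an illustration: both function names are hypothetical, the sizes are made up, and the multiply-by-bool form is one possible arithmetic replacement for std::min, not necessarily the exact expression used in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Old quick path as described in the PR: yields 0 for an empty buffer and
// relies on the buffer being at least as large as the last key otherwise.
std::size_t EffectiveLastKeySizeMin(std::size_t buffer_size,
                                    std::size_t last_key_size) {
  return std::min(buffer_size, last_key_size);
}

// One possible arithmetic form (an assumption for illustration): the
// comparison converts to 0 or 1, so the result is either 0 or the full
// last_key_size, with no explicit conditional in the source.
std::size_t EffectiveLastKeySizeArith(std::size_t buffer_size,
                                      std::size_t last_key_size) {
  return last_key_size * static_cast<std::size_t>(buffer_size > 0);
}

// Hypothetical sizes for the second entry added to a block.
void CheckExample() {
  // Last key as passed in: 4 key bytes + 8 timestamp bytes + 8 footer bytes.
  const std::size_t last_key_size = 20;
  // Buffer after the first entry, assuming 3 one-byte length fields,
  // 12 key bytes (timestamp stripped) and a 1-byte value.
  const std::size_t buffer_size = 3 + 12 + 1;
  assert(EffectiveLastKeySizeMin(buffer_size, last_key_size) == 16);    // truncated
  assert(EffectiveLastKeySizeArith(buffer_size, last_key_size) == 20);  // intact
  assert(EffectiveLastKeySizeMin(0, last_key_size) == 0);    // empty buffer
  assert(EffectiveLastKeySizeArith(0, last_key_size) == 0);  // empty buffer
}
```

With an 8-byte timestamp and a 1-byte value, the buffer holds fewer bytes than the unstripped last key, which is the case the earlier tests did not cover.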

Test Plan:
New timestamp-specific tests were added and existing tests were augmented; both are parameterized with UserDefinedTimestampTestMode:
UserDefinedTimestampTestMode::kNormal -> UDT feature enabled, write / read with min timestamp
UserDefinedTimestampTestMode::kStripUserDefinedTimestamps -> UDT feature enabled, write / read with min timestamp, set Options.persist_user_defined_timestamps to false.

make all check
./db_wal_test --gtest_filter="*WithTimestamp*"
./flush_job_test --gtest_filter="*WithTimestamp*"
./repair_test --gtest_filter="*WithTimestamp*"
./block_based_table_reader_test

facebook-github-bot · 2023-06-26T18:12:38Z

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

pdillinger

I find it confusing that some places talk about stripping or removing the timestamps while some talk about replacing with minimum timestamp. I might expect that if I don't persist timestamps, I could re-open the DB without timestamps configured at all, but I don't think that is true. I think we need to be more clear and careful about describing what the persist_user_defined_timestamps option does.

db/builder.cc

db/dbformat.h

facebook-github-bot · 2023-06-28T04:26:13Z

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

jowlyzhang · 2023-06-28T04:48:05Z

> I find it confusing that some places talk about stripping or removing the timestamps while some talk about replacing with minimum timestamp. I might expect that if I don't persist timestamps, I could re-open the DB without timestamps configured at all, but I don't think that is true. I think we need to be more clear and careful about describing what the persist_user_defined_timestamps option does.

Thank you for this suggestion! I took a look at the existing comment for the persist_user_defined_timestamps flag and realized it did need more clarification. I added some comments there.

Specifically on this flush path, setting persist_user_defined_timestamps to false does indeed remove the timestamps rather than persist them. Some places logically strip the timestamp (replace it with the min timestamp) while others actually strip (remove) it because key comparisons still happen at different levels during flush, for example in BlockBasedTableBuilder, IndexBuilder, etc. To keep the user keys in a format compatible with the timestamp-aware user comparator, the user-defined timestamp part of the user key is not physically removed until the very last moment, just before the key is built into a block in BlockBuilder; before that, it is only logically removed.
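
To contrast with the logical strip, here is a minimal sketch of the physical strip done at the last step. It assumes an internal key laid out as the user key bytes, then the timestamp, then an 8-byte footer holding the packed sequence number and value type; `PhysicallyStripTimestamp` and `kFooterSize` are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed footer size: packed sequence number + value type.
constexpr std::size_t kFooterSize = 8;

// Hypothetical helper, not a RocksDB API. Assumed layout of the input:
// <user key bytes><ts_sz timestamp bytes><kFooterSize footer bytes>.
// The timestamp bytes are erased, so the result is ts_sz bytes shorter
// and matches the format of a key written with timestamp size 0.
std::string PhysicallyStripTimestamp(const std::string& internal_key,
                                     std::size_t ts_sz) {
  assert(internal_key.size() >= ts_sz + kFooterSize);
  std::string out = internal_key;
  // Erase the ts_sz bytes that sit immediately before the footer.
  out.erase(out.size() - kFooterSize - ts_sz, ts_sz);
  return out;
}
```

Unlike the logical strip, the key changes length here, which is why it can only happen after the last comparison that expects the timestamp-aware format.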

Because these SST files are created without any user-defined timestamps, users indeed can re-open the DB without timestamps configured at all. At that time, they provide a user comparator that has timestamp_size set to 0, and the user keys in these SST files are instantly compatible with that format.

facebook-github-bot · 2023-06-28T04:56:29Z

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2023-06-28T05:02:03Z

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2023-06-29T16:50:40Z

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

include/rocksdb/advanced_options.h

facebook-github-bot · 2023-06-29T17:23:12Z

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

pdillinger

LGTM except for the described hole in testing. Could be fixed in this PR or another.

pdillinger · 2023-06-29T21:05:06Z

Don't forget to re-import :)

jowlyzhang · 2023-06-29T21:16:35Z

> LGTM except for the described hole in testing. Could be fixed in this PR or another.

Thank you for your review! I will add that test coverage in a follow up.

facebook-github-bot · 2023-06-29T21:16:56Z

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2023-06-29T22:55:29Z

@jowlyzhang merged this pull request in 15053f3.

Summary: Thanks pdillinger for pointing out this test hole. The test `DBWALTestWithTimestamp.Recover`, which is intended to test recovery from WAL including user-defined timestamps, doesn't achieve its promised coverage. Specifically, after #11557, timestamps are removed during flush, and RocksDB by default flushes memtables during recovery because `avoid_flush_during_recovery` defaults to false. The test would not fail even if all the timestamps were quickly lost due to the default flush behavior. This PR renames the test `Recover` to `RecoverAndNoFlush` and updates it to verify, with some time-travel reads, that timestamps are successfully recovered from WAL. `avoid_flush_during_recovery` is set to true to enable this verification. For the test `DBWALTestWithTimestamp.RecoverAndFlush`, on the other hand, flush on reopen is the DB's default behavior, so the flags `max_write_buffer` and `arena_block_size` are not really what enforces the flush; these flags are removed.

Pull Request resolved: #11577
Test Plan: ./db_wal_test
Reviewed By: pdillinger
Differential Revision: D47142892
Pulled By: jowlyzhang
fbshipit-source-id: 9465e278806faa5885b541b4e32d99e698edef7d

facebook-github-bot added the CLA Signed label Jun 23, 2023

jowlyzhang force-pushed the flush_strip branch from b4a8398 to 15e2c02 Compare June 24, 2023 00:00

jowlyzhang marked this pull request as draft June 24, 2023 00:01

jowlyzhang force-pushed the flush_strip branch from 15e2c02 to 7f6b0eb Compare June 26, 2023 16:48

jowlyzhang marked this pull request as ready for review June 26, 2023 18:12

jowlyzhang requested a review from pdillinger June 26, 2023 18:12

Logically strip timestamp during flush

34d3840

pdillinger reviewed Jun 27, 2023

jowlyzhang force-pushed the flush_strip branch from 7f6b0eb to 21e4e61 Compare June 28, 2023 04:26

Address review comments

14b5a61

jowlyzhang force-pushed the flush_strip branch from 21e4e61 to 14b5a61 Compare June 28, 2023 04:56

jowlyzhang requested a review from pdillinger June 29, 2023 17:07

pdillinger reviewed Jun 29, 2023

Clarify the downgrade requirement

9305a54

jowlyzhang requested a review from pdillinger June 29, 2023 17:27

pdillinger approved these changes Jun 29, 2023

facebook-github-bot closed this in 15053f3 Jun 29, 2023

facebook-github-bot added the Merged label Jun 29, 2023

jowlyzhang mentioned this pull request Jun 30, 2023

Fix a unit test hole for recovering UDTs with WAL files #11577

Closed

jowlyzhang deleted the flush_strip branch July 11, 2023 18:36

igorcanadi mentioned this pull request Jan 17, 2024

[SYS-6913] Upgrade RocksDB-Cloud to 8.9.1 rockset/rocksdb-cloud#315

Closed
