-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revise APIs related to user-defined timestamp #8946
Conversation
Personally, I prefer the WriteOptions::timestamp and think we should stick with the original implementation/proposal:
With all of these points in mind, I believe the WriteOptions::timestamp is a better approach. |
SstFileWriter uses What bugs me about all these options is that non-timestamp users have to deal with all the timestamp-specific overloads when writing their code, and vice-versa. If someone forgets a timestamp (or less likely, accidentally provides one), they only find out at run time. If we had DbWithTimestamp and DB, only containing applicable functions, that would reduce the mixing of disjoint functionality in the API. Obviously we would want a common base type like DbBase for functions and declarations in common. In addition or instead, we could have wrappers that allow putting more useful types than Slice on keys, timestamps (when applicable), and values. Perhaps call them TypedDb and TypedDbWithTimestamp. |
I doubt. For non-timestamp users, they do not have to worry about timestamp-specific overloads. If they accidentally call these overloads, we can return I am concerned about differentiating between <user_key> and <user_key, ts> internally, but this cannot be solved by wrapping |
For |
Would a DB (without timestamp) expose a similar requirement: it must be configured to use a Comparator that is not timestamp-aware? Also I think comparator is a ColumnFamilyOption |
The way I see it, there are two types of information passed to the write APIs: the key-value data that tells RocksDB what to write, and the write options (e.g. sync mode, disable WAL etc.) that tell RocksDB how to write the said key-values. In this classification, timestamps clearly belong in the what bucket. Also, I'm not concerned about adding new parameters/overloads for data (like timestamps) because our data model gets extended once in a blue moon. |
Yes, it is a column family option because we allow different column families to have different comparators. That said, it is allowed for some column families to enable timestamp, while others do not. Note that we can write to all these column families in ONE write batch. |
Furthermore, if an application just wants to implement own Comparator without concerning about timestamp can simply do so because the base class |
Oh right, so we need both in
Let's hope :) |
@@ -353,10 +353,17 @@ class DB { | |||
virtual Status Put(const WriteOptions& options, | |||
ColumnFamilyHandle* column_family, const Slice& key, | |||
const Slice& value) = 0; | |||
virtual Status Put(const WriteOptions& options, | |||
ColumnFamilyHandle* column_family, const Slice& key, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SstFileWriter uses the order Put(key, ts, value)
(re-iterating to make sure this isn't lost in the discussion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will address these comments once we agree on the overall direction. :)
Currently, this is not a blocker for us, since we are working on adding timestamp support to WriteCommittedTxn transactions. I hope we can get this resolved before
- release of RocksDB 7.0
- people start adding timestamp support for Merge, DeleteRange, etc.
- declaring (subset of) feature complete for customers
whichever comes first.
4b5dd0a
to
5a43ee6
Compare
3fa6d4e
to
114277e
Compare
114277e
to
06d0c1c
Compare
06d0c1c
to
418bf4a
Compare
418bf4a
to
04d671b
Compare
543ba27
to
a8698c7
Compare
Summary: Rename argument 'checker' to `ts_sz_func`.
115e226
to
0af39d1
Compare
@riversand963 has updated the pull request. You must reimport the pull request before landing. |
@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: ajkr reminded me that we have a rule of not including per-kv related data in `WriteOptions`. Namely, `WriteOptions` should not include information about "what-to-write", but should just include information about "how-to-write". According to this rule, `WriteOptions::timestamp` (experimental) is clearly a violation. Therefore, this PR removes `WriteOptions::timestamp` for compliance. After the removal, we need to pass timestamp info via another set of APIs. This PR proposes a set of overloaded functions `Put(write_opts, key, value, ts)`, `Delete(write_opts, key, ts)`, and `SingleDelete(write_opts, key, ts)`. Planned to add `Write(write_opts, batch, ts)`, but its complexity made me reconsider doing it in another PR (maybe). For better checking and returning error early, we also add a new set of APIs to `WriteBatch` that take extra `timestamp` information when writing to `WriteBatch`es. These set of APIs in `WriteBatchWithIndex` are currently not supported, and are on our TODO list. Removed `WriteBatch::AssignTimestamps()` and renamed `WriteBatch::AssignTimestamp()` to `WriteBatch::UpdateTimestamps()` since this method require that all keys have space for timestamps allocated already and multiple timestamps can be updated. The constructor of `WriteBatch` now takes a fourth argument `default_cf_ts_sz` which is the timestamp size of the default column family. This will be used to allocate space when calling APIs that do not specify a column family handle. Also, updated `DB::Get()`, `DB::MultiGet()`, `DB::NewIterator()`, `DB::NewIterators()` methods, replacing some assertions about timestamp to returning Status code. Pull Request resolved: facebook#8946 Test Plan: make check ./db_bench -benchmarks=fillseq,fillrandom,readrandom,readseq,deleterandom -user_timestamp_size=8 ./db_stress --user_timestamp_size=8 -nooverwritepercent=0 -test_secondary=0 -secondary_catch_up_one_in=0 -continuous_verification_interval=0 Make sure there is no perf regression by running the following ``` ./db_bench_opt -db=/dev/shm/rocksdb -use_existing_db=0 -level0_stop_writes_trigger=256 -level0_slowdown_writes_trigger=256 -level0_file_num_compaction_trigger=256 -disable_wal=1 -duration=10 -benchmarks=fillrandom ``` Before this PR ``` DB path: [/dev/shm/rocksdb] fillrandom : 1.831 micros/op 546235 ops/sec; 60.4 MB/s ``` After this PR ``` DB path: [/dev/shm/rocksdb] fillrandom : 1.820 micros/op 549404 ops/sec; 60.8 MB/s ``` Reviewed By: ltamasi Differential Revision: D33721359 Pulled By: riversand963 fbshipit-source-id: c131561534272c120ffb80711d42748d21badf09 Signed-off-by: v01dstar <yang.zhang@pingcap.com>
Summary: ajkr reminded me that we have a rule of not including per-kv related data in `WriteOptions`. Namely, `WriteOptions` should not include information about "what-to-write", but should just include information about "how-to-write". According to this rule, `WriteOptions::timestamp` (experimental) is clearly a violation. Therefore, this PR removes `WriteOptions::timestamp` for compliance. After the removal, we need to pass timestamp info via another set of APIs. This PR proposes a set of overloaded functions `Put(write_opts, key, value, ts)`, `Delete(write_opts, key, ts)`, and `SingleDelete(write_opts, key, ts)`. Planned to add `Write(write_opts, batch, ts)`, but its complexity made me reconsider doing it in another PR (maybe). For better checking and returning error early, we also add a new set of APIs to `WriteBatch` that take extra `timestamp` information when writing to `WriteBatch`es. These set of APIs in `WriteBatchWithIndex` are currently not supported, and are on our TODO list. Removed `WriteBatch::AssignTimestamps()` and renamed `WriteBatch::AssignTimestamp()` to `WriteBatch::UpdateTimestamps()` since this method require that all keys have space for timestamps allocated already and multiple timestamps can be updated. The constructor of `WriteBatch` now takes a fourth argument `default_cf_ts_sz` which is the timestamp size of the default column family. This will be used to allocate space when calling APIs that do not specify a column family handle. Also, updated `DB::Get()`, `DB::MultiGet()`, `DB::NewIterator()`, `DB::NewIterators()` methods, replacing some assertions about timestamp to returning Status code. Pull Request resolved: facebook#8946 Test Plan: make check ./db_bench -benchmarks=fillseq,fillrandom,readrandom,readseq,deleterandom -user_timestamp_size=8 ./db_stress --user_timestamp_size=8 -nooverwritepercent=0 -test_secondary=0 -secondary_catch_up_one_in=0 -continuous_verification_interval=0 Make sure there is no perf regression by running the following ``` ./db_bench_opt -db=/dev/shm/rocksdb -use_existing_db=0 -level0_stop_writes_trigger=256 -level0_slowdown_writes_trigger=256 -level0_file_num_compaction_trigger=256 -disable_wal=1 -duration=10 -benchmarks=fillrandom ``` Before this PR ``` DB path: [/dev/shm/rocksdb] fillrandom : 1.831 micros/op 546235 ops/sec; 60.4 MB/s ``` After this PR ``` DB path: [/dev/shm/rocksdb] fillrandom : 1.820 micros/op 549404 ops/sec; 60.8 MB/s ``` Reviewed By: ltamasi Differential Revision: D33721359 Pulled By: riversand963 fbshipit-source-id: c131561534272c120ffb80711d42748d21badf09 Signed-off-by: v01dstar <yang.zhang@pingcap.com>
@ajkr reminded me that we have a rule of not including per-kv related data in
WriteOptions
.Namely,
WriteOptions
should not include information about "what-to-write", but should justinclude information about "how-to-write".
According to this rule,
WriteOptions::timestamp
(experimental) is clearly a violation. Therefore,this PR removes
WriteOptions::timestamp
for compliance.After the removal, we need to pass timestamp info via another set of APIs. This PR proposes a set
of overloaded functions
Put(write_opts, key, value, ts)
,Delete(write_opts, key, ts)
, andSingleDelete(write_opts, key, ts)
. Planned to addWrite(write_opts, batch, ts)
, but its complexitymade me reconsider doing it in another PR (maybe).
For better checking and returning error early, we also add a new set of APIs to
WriteBatch
that takeextra
timestamp
information when writing toWriteBatch
es.These set of APIs in
WriteBatchWithIndex
are currently not supported, and are on our TODO list.Removed
WriteBatch::AssignTimestamps()
and renamedWriteBatch::AssignTimestamp()
toWriteBatch::UpdateTimestamps()
since this method require that all keys have space for timestampsallocated already and multiple timestamps can be updated.
The constructor of
WriteBatch
now takes a fourth argumentdefault_cf_ts_sz
which is the timestampsize of the default column family. This will be used to allocate space when calling APIs that do not
specify a column family handle.
Also, updated
DB::Get()
,DB::MultiGet()
,DB::NewIterator()
,DB::NewIterators()
methods, replacingsome assertions about timestamp to returning Status code.
Test plan:
make check
./db_bench -benchmarks=fillseq,fillrandom,readrandom,readseq,deleterandom -user_timestamp_size=8
./db_stress --user_timestamp_size=8 -nooverwritepercent=0 -test_secondary=0 -secondary_catch_up_one_in=0 -continuous_verification_interval=0
Make sure there is no perf regression by running the following
Before this PR
After this PR