-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
save a key comparison in block seeks #6646
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
edit: this problem is fixed. huh, after ~30 minutes stress test failed. not sure yet if related.
|
d1534c0
to
a3881b2
Compare
@ajkr has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
a3881b2
to
430f3bc
Compare
@ajkr has updated the pull request. Re-import the pull request |
430f3bc
to
fcd0a44
Compare
@ajkr has updated the pull request. Re-import the pull request |
alright, known problems are fixed so ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
fcd0a44
to
262a7e2
Compare
@ajkr has updated the pull request. Re-import the pull request |
262a7e2
to
92af65f
Compare
@ajkr has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
92af65f
to
4e67eb7
Compare
@ajkr has updated the pull request. Re-import the pull request |
4e67eb7
to
397469b
Compare
@ajkr has updated the pull request. Re-import the pull request |
397469b
to
efbc981
Compare
@ajkr has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
efbc981
to
1db34e7
Compare
3302eeb
to
50f890b
Compare
@ajkr has updated the pull request. Re-import the pull request |
table/block_based/block.cc
Outdated
ParseNextDataKey<DecodeEntry>(); | ||
} | ||
} else { | ||
SeekToRestartPoint(index - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I also redid the BinarySeek()
to avoid changing the binary search. The number of key comparisons saved lessened as a result, but I am fine with leaving those for a future PR..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple explanatory comments
50f890b
to
c55640a
Compare
@ajkr has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this. Some suggestions for readability, but good.
Wouldn't hurt to have another set of eyes on this for correctness.
MacOS compilation failure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MacOS compilation failure
Fixed (I think).
Thanks for the review!
c55640a
to
7f6c018
Compare
@ajkr has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a few nits and suggestions.
const Comparator* comp) { | ||
// Linear search (within restart block) for first key >= target | ||
SeekToRestartPoint(index); | ||
Next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment here might help. It took me a little while to figure out that this Next()
actually doesn't advance to the next key, but just ends up decoding the key at the restart point we seek'd to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, done.
table/block_based/block.cc
Outdated
*index = left; | ||
if (*index == 0) { | ||
// Special case as we land at zero as long as restart key at index 1 is >= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We land at 0 if restart key at index 1 > "target", right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, done.
table/block_based/block.cc
Outdated
@@ -393,6 +386,7 @@ void IndexBlockIter::Seek(const Slice& target) { | |||
} | |||
status_ = Status::OK(); | |||
uint32_t index = 0; | |||
bool is_index_key_result = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like skip_linear_scan
might be clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Also renamed "ScanAfterBinarySeek" -> "FindKeyAfterBinarySeek" to make it fit in better.
max_offset = port::kMaxUint32; | ||
} | ||
while (true) { | ||
Next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just use ParseNextDataKey
instead of Next
and Valid
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's making use of polymorphism here -- IndexBlockIter::Next()
calls ParseNextIndexKey()
while DataBlockIter::Next()
calls ParseNextDataKey()
.
This saves up to two key comparisons in block seeks. The first key comparison saved is a redundant key comparison against the restart key where the linear scan starts. This comparison is saved in all cases except when the found key is in the first restart interval. The second key comparison saved is a redundant key comparison against the restart key where the linear scan ends. This is only saved in cases where all keys in the restart interval are less than the target (probability roughly `1/restart_interval`). Test Plan: ran readrandom with mostly default settings and counted key comparisons using `PerfContext`. before: `user_key_comparison_count = 30370310` after: `user_key_comparison_count = 28881965` setup command: ``` $ TEST_TMPDIR=/dev/shm/dbbench ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -max_background_jobs=12 -level_compaction_dynamic_level_bytes=true -num=10000000 ``` benchmark command: ``` $ TEST_TMPDIR=/dev/shm/dbbench/ ./db_bench -use_existing_db=true -benchmarks=readrandom -disable_auto_compactions=true -num=10000000 -compression_type=none -reads=1000000 -perf_level=3 ```
7f6c018
to
463f274
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! Addressed the comments.
table/block_based/block.cc
Outdated
@@ -393,6 +386,7 @@ void IndexBlockIter::Seek(const Slice& target) { | |||
} | |||
status_ = Status::OK(); | |||
uint32_t index = 0; | |||
bool is_index_key_result = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Also renamed "ScanAfterBinarySeek" -> "FindKeyAfterBinarySeek" to make it fit in better.
const Comparator* comp) { | ||
// Linear search (within restart block) for first key >= target | ||
SeekToRestartPoint(index); | ||
Next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, done.
max_offset = port::kMaxUint32; | ||
} | ||
while (true) { | ||
Next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's making use of polymorphism here -- IndexBlockIter::Next()
calls ParseNextIndexKey()
while DataBlockIter::Next()
calls ParseNextDataKey()
.
table/block_based/block.cc
Outdated
*index = left; | ||
if (*index == 0) { | ||
// Special case as we land at zero as long as restart key at index 1 is >= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, done.
@ajkr has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: This is a followup to #6646. In that PR, for simplicity I just appended a comparison against the 0th restart key in case `BinarySeek()`'s binary search landed at index 0. As a result there were `2/(N+1) + log_2(N)` key comparisons. This PR does it differently. Now we expand the binary search range by one so it also covers the case where target is at or before the restart key at index 0. As a result, it involves `log_2(N+1)` key comparisons. Pull Request resolved: #7068 Test Plan: ran readrandom with mostly default settings and counted key comparisons using `PerfContext`. before: `user_key_comparison_count = 28881965` after: `user_key_comparison_count = 27823245` setup command: ``` $ TEST_TMPDIR=/dev/shm/dbbench ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -max_background_jobs=12 -level_compaction_dynamic_level_bytes=true -num=10000000 ``` benchmark command: ``` $ TEST_TMPDIR=/dev/shm/dbbench/ ./db_bench -use_existing_db=true -benchmarks=readrandom -disable_auto_compactions=true -num=10000000 -compression_type=none -reads=1000000 -perf_level=3 ``` Reviewed By: anand1976 Differential Revision: D22357032 Pulled By: ajkr fbshipit-source-id: 8b01e9c1c2a4e9d02fc9dfe16c1cc0327f8bdf24
Summary: This is a followup to facebook#6646. In that PR, for simplicity I just appended a comparison against the 0th restart key in case `BinarySeek()`'s binary search landed at index 0. As a result there were `2/(N+1) + log_2(N)` key comparisons. This PR does it differently. Now we expand the binary search range by one so it also covers the case where target is at or before the restart key at index 0. As a result, it involves `log_2(N+1)` key comparisons. Pull Request resolved: facebook#7068 Test Plan: ran readrandom with mostly default settings and counted key comparisons using `PerfContext`. before: `user_key_comparison_count = 28881965` after: `user_key_comparison_count = 27823245` setup command: ``` $ TEST_TMPDIR=/dev/shm/dbbench ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -max_background_jobs=12 -level_compaction_dynamic_level_bytes=true -num=10000000 ``` benchmark command: ``` $ TEST_TMPDIR=/dev/shm/dbbench/ ./db_bench -use_existing_db=true -benchmarks=readrandom -disable_auto_compactions=true -num=10000000 -compression_type=none -reads=1000000 -perf_level=3 ``` Reviewed By: anand1976 Differential Revision: D22357032 Pulled By: ajkr fbshipit-source-id: 8b01e9c1c2a4e9d02fc9dfe16c1cc0327f8bdf24
This saves up to two key comparisons in block seeks. The first key
comparison saved is a redundant key comparison against the restart key
where the linear scan starts. This comparison is saved in all cases
except when the found key is in the first restart interval. The
second key comparison saved is a redundant key comparison against the
restart key where the linear scan ends. This is only saved in cases
where all keys in the restart interval are less than the target
(probability roughly
1/restart_interval
).Test Plan:
ran readrandom with mostly default settings and counted key comparisons
using
PerfContext
.before:
user_key_comparison_count = 30370310
after:
user_key_comparison_count = 28881965
setup command:
benchmark command: