Index value delta encoding #3983
Conversation
@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@maysamyabandeh has updated the pull request. View: changes, changes since last import
table/block.cc
Outdated
@@ -283,9 +291,33 @@ bool BlockIter::ParseNextKey() {
value_ = Slice(p + non_shared, value_length);
while (restart_index_ + 1 < num_restarts_ &&
-      GetRestartPoint(restart_index_ + 1) < current_) {
+      GetRestartPoint(restart_index_ + 1) <= current_) {
This was a bug in the existing code. The bug did not do any harm though, and the restart point would eventually be set correctly.
I read the code a little bit. Maybe I'm wrong, but before your new code restart_index_ seems to only be useful when calling Prev(). In that case GetRestartPoint(restart_index_) == current_ is not very useful, as the current entry is not interesting and we are going to do restart_index_-- for it anyway, so we end up doing an unnecessary ++ and --, as well as an extra GetRestartPoint() in Prev().
First I would say that the existing code is confusing, as restart_index_ does not reflect what its definition says, which makes it likely to create bugs. But if the performance overhead of the extra GetRestartPoint is the concern, I can revert it back to the previous implementation and do the additional call only if value_delta_encoded_ is enabled.
update: this is reverted now.
Switching from format 3 to 4 reduces the index size from 345949 to 197100 (with index_block_restart_interval=16)
@maysamyabandeh has updated the pull request. View: changes, changes since last import
@maysamyabandeh has updated the pull request. View: changes, changes since last import
table/block.cc
Outdated
assert(decode_s.ok());
value_ = Slice(p + non_shared, updated_value.data() - value_.data());
}
}
Can we move it to another function?
Sure.
table/block.h
Outdated
// the first value in that restart interval.
bool value_delta_encoded_;
mutable std::string value_buf_;
BlockHandle decoded_value_;
Having so much index-block-specific logic in BlockIter can make the code harder to understand. Is there a way to generalize it? For example, is it possible to define an interface like ValueDeltaEncoder (by default null) and call that interface to delta-decode the value?
It could be a static method of a helper class as well.
How about making a separate IndexBlockIter, moving all the index-related logic there, and leaving the data-block-only logic behind? The common binary search can be done with a helper class or a subclass. I do worry that we are making the already very complicated BlockIter even more complicated.
update: after the rebase, the logic is moved to IndexBlockIter.
table/block.cc
Outdated
uint32_t* value_length,
const bool value_delta_encoded) {
// We need 2 bytes for shared and non_shared size. We also need one more byte
// either for value size or the actual value in case of value delta encoding.
Is there a reason we don't always do value delta encoding inside one restart block?
Any delta encoding needs one initial full encoding. This implementation uses the first entry in the restart block to have the full encoding. What is the alternative that you would recommend?
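To make this concrete, here is a minimal sketch of why each restart interval needs exactly one full encoding; it is not the PR's code, the helper name and the exact trailer constant are assumptions (RocksDB's block trailer is 1 byte of compression type plus a 4-byte checksum), and it only reconstructs handles from a head handle plus the sizes of the following blocks:

#include <cstdint>
#include <utility>
#include <vector>

constexpr uint64_t kBlockTrailerSize = 5;  // 1-byte compression type + 4-byte checksum

// Rebuild the full (offset, size) handles of a restart interval from the head
// handle and the sizes of the subsequent blocks: each next offset is simply
// previous offset + previous size + trailer.
std::vector<std::pair<uint64_t, uint64_t>> ReconstructHandles(
    uint64_t head_offset, uint64_t head_size,
    const std::vector<uint64_t>& following_sizes) {
  std::vector<std::pair<uint64_t, uint64_t>> handles;
  handles.emplace_back(head_offset, head_size);
  for (uint64_t size : following_sizes) {
    const uint64_t prev_offset = handles.back().first;
    const uint64_t prev_size = handles.back().second;
    handles.emplace_back(prev_offset + prev_size + kBlockTrailerSize, size);
  }
  return handles;
}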
table/block.cc
Outdated
value_ = Slice(p + non_shared, updated_value.data() - value_.data());
} else {
const size_t kHandleMaxSize = 8;
auto v = Slice(p + non_shared, kHandleMaxSize);
How do we make sure Slice is within the bytes of the block?
The slice would exceed the block limits only in the case of corruption. A corruption could be detected at any point during reading. Without delta encoding we had the opportunity to detect such corruption when reading the value. With delta encoding we would rely either on past checks (checksum) or on future checks in the code to detect a value that does not make sense.
table/block.cc
Outdated
const size_t kHandleMaxSize = 8;
auto v = Slice(p + non_shared, kHandleMaxSize);
auto next_value_base =
decoded_value_.offset() + decoded_value_.size() + kBlockTrailerSize;
In Google C++ Style, using auto in cases like this is not encouraged: https://google.github.io/styleguide/cppguide.html#auto "Sometimes code is clearer when types are manifest, especially when a variable's initialization depends on things that were declared far away". In this case, the type of next_value_base is not very clear to me.
Sure.
table/block.cc
Outdated
BlockHandle handle;
Status decode_s = handle.DecodeSizeFrom(next_value_base, &v);
decoded_value_ = handle;
auto& updated_value = v;
This auto also makes the code hard for me to understand. I suggest we make it explicit.
Sure.
table/block.cc
Outdated
auto& updated_value = v;
assert(decode_s.ok());
value_ = Slice(p + non_shared, updated_value.data() - value_.data());
}
Can you document the format in the comment? I haven't managed to understand it yet.
Oh I forgot to do that.
It is now documented in IndexBlockIter::DecodeCurrentValue.
@maysamyabandeh has updated the pull request. View: changes, changes since last import
Is there a succinct description of the new format anywhere? Are you still encoding the payload (block handle) length? If so, it could be avoided because the following varints are uniquely decodable.
table/block.cc
Outdated
return true;
}
}

// The fomrat:
Typo
table/block.cc
Outdated
// restart_point 1: k, v (offset, size), k, v (size), ..., k, v (size)
// ...
// restart_point n-1: k, v (offset, size), k, v (size), ..., k, v (size)
// where, k is key, v is value, and its encoding is in pranthesis.
Also here "pranthesis"
@ot yeah you found the description, and as it says:
The format of each key is (shared_size, non_shared_size, shared, non_shared)
so the block handle size (i.e., the value size) is not encoded in the new format.
table/block.cc
Outdated
return true;
}
}

// The fomrat:
// restart_point 0: k, v (offset, size), k, v (size), ..., k, v (size)
Do we consider:
k, v (offset, size1), k, v (size2-size1), ..., k, v (size3-size2)
as what @ot proposed before?
I also hope we can consider:
k, v (offset, size1), k, v [padding_size](size2-size1), ..., k, v [padding_size](size3-size2)
I think the main factor in whether to take this approach is whether it is common for the block size difference to be within 64 bytes. That sounds very unlikely though, so maybe just the size is good.
If you're referring to https://www.prod.facebook.com/groups/rocksdb.dev/permalink/815533521878497/, I did not suggest that; I was just proposing to use offset, size, size, size, ... like here, and if we have no other way to distinguish restart points from delta handles, to use a negative size (with zigzag encoding) so that the sign bit signals that.
Other than that, I also still believe that we should try to drop the payload size byte here (is it done in this PR?) and switch to sub-varint for prefix coding.
@siying it is an interesting idea. Since size_(i+1) - size_i could be positive or negative, we need most of them to be between -127 and 127 to significantly benefit from this. With 4k blocks, the compressed block will be 2k and the [-127, 127] range is likely to hold for the difference between two index sizes.
I think we can leave padding_size for future work.
@ot the payload size byte is dropped in the new format.
@siying let me correct myself. To benefit from this optimization the changes in the block size should be within [-63, 64] since in varint encoding numbers larger than 127 will take the 2nd byte.
Or we use 0xFF to indicate a fallback to full size.
@maysamyabandeh has updated the pull request. Re-import the pull request
1 similar comment
@maysamyabandeh has updated the pull request. Re-import the pull request
Verified that the critical functions are properly inlined.
@maysamyabandeh have you also experimented with the prefix encoding I suggested in the post?
@maysamyabandeh has updated the pull request. Re-import the pull request
@ot I guess you refer to this by "prefix encoding":
The sample sst files that I have been using as reference almost always encode the shared size in one byte. For that not to be the case the key prefix would have to be > 127 bytes, which was not the case in my reference db. I do agree that this is a good idea, but not of immediate impact.
@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@maysamyabandeh I'm talking about using 1 byte for shared+nonshared instead of 2. With the rear coding optimization, it is very likely that both will be < 15 and can be encoded in 2 nibbles.
@ot oh yeah I was discussing this idea with @siying. Since there will be a tradeoff between memory space and decoding CPU usage, I am not sure this would always be a win. We would need proper benchmarking to see if it is worth it. The current decoding of a 1-byte varint is very simple and efficient.
@maysamyabandeh why do you think this would be slower than varint? The encoding I'm suggesting is used in LZ4 which is quite fast :)
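For concreteness, a sketch of the kind of single-byte token being suggested (in the spirit of LZ4's length token); this is illustrative only, not code from this PR, and the function names and the escape convention are assumptions:

#include <cstdint>
#include <string>

// Minimal varint writer, used only for the escape path below.
static void PutVarint32(std::string* dst, uint32_t v) {
  while (v >= 128) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Pack shared and non_shared key lengths into the two nibbles of one byte when
// both are < 15; a nibble of 15 means "the remainder follows as a varint".
static void EncodeSharedNonShared(std::string* dst, uint32_t shared,
                                  uint32_t non_shared) {
  const uint32_t s = shared < 15 ? shared : 15;
  const uint32_t n = non_shared < 15 ? non_shared : 15;
  dst->push_back(static_cast<char>((s << 4) | n));
  if (s == 15) PutVarint32(dst, shared - 15);
  if (n == 15) PutVarint32(dst, non_shared - 15);
}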
@maysamyabandeh I don't buy the argument that it is less CPU efficient. It is simple:
It's better to use
Also note that shared_len should really be len - shared_len
(which is common in some databases, think of long column names) then
@maysamyabandeh has updated the pull request. Re-import the pull request
@ot "Also note that shared_len should really be len - shared_len", do you mean the len from the previous one?
@maysamyabandeh has updated the pull request. Re-import the pull request
maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
util/coding.h
Outdated
inline const char* GetVarsignedint64Ptr(const char* p, const char* limit,
int64_t* value) {
uint64_t u = 0;
auto ret = GetVarint64Ptr(p, limit, &u);
Minor: I still feel it's better to name the type explicitly here.
util/coding.h
Outdated
@@ -254,6 +261,15 @@ inline void PutVarint64(std::string* dst, uint64_t v) {
dst->append(buf, static_cast<size_t>(ptr - buf));
}

inline void PutVarsignedint64(std::string* dst, int64_t v) {
char buf[10];
Can we use kMaxVarint64Length instead?
table/block.h
Outdated
// offset of delta encoded BlockHandles is computed by adding the size of
// previous delta encoded values in the same restart interval to the offset of
// the first value in that restart interval.
bool value_delta_encoded_;
Minor: Can we pad value_delta_encoded_ together with key_includes_seq_? Now it takes 8 bytes.
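For illustration, hypothetical member layouts (not the actual BlockIter fields) showing why grouping the two bools matters on a typical 64-bit ABI:

#include <cstdint>

struct Scattered {                // bools separated by wider members
  const char* data;               // offset 0, 8 bytes
  bool key_includes_seq;          // offset 8, 1 byte (+3 bytes padding)
  uint32_t restart_index;         // offset 12, 4 bytes
  bool value_delta_encoded;       // offset 16, 1 byte (+7 bytes tail padding)
};                                // typically 24 bytes

struct Grouped {                  // bools packed next to each other
  const char* data;               // offset 0, 8 bytes
  uint32_t restart_index;         // offset 8, 4 bytes
  bool key_includes_seq;          // offset 12, 1 byte
  bool value_delta_encoded;       // offset 13, 1 byte (+2 bytes tail padding)
};                                // typically 16 bytes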
PutVarint32Varint32Varint32(&buffer_, static_cast<uint32_t>(shared),
static_cast<uint32_t>(non_shared),
static_cast<uint32_t>(value.size()));
}
It's OK for now but maybe we should later refactor so that index and data block builder logic is separated.
table/block_test.cc
Outdated
@@ -68,6 +68,28 @@ void GenerateRandomKVs(std::vector<std::string> *keys,
}
}

void GenerateRandomKVs(std::vector<std::string> *keys,
V is actually a block handle. Can you rename the function so that it is clearer?
@@ -109,12 +119,14 @@ PartitionedFilterBlockReader::PartitionedFilterBlockReader(
const SliceTransform* prefix_extractor, bool _whole_key_filtering,
BlockContents&& contents, FilterBitsReader* /*filter_bits_reader*/,
Statistics* stats, const InternalKeyComparator comparator,
- const BlockBasedTable* table, const bool index_key_includes_seq)
+ const BlockBasedTable* table, const bool index_key_includes_seq,
+ const bool index_value_is_full)
I saw you renamed it to "index_value_is_delta_encoded" in many places. Why not keep it consistent?
That was the case before and it had caused programming bugs. The new name is consistent with index_key_includes_seq, in that the default of both of them is true.
I had a hard time understanding what "index value full" means. You meant it was not delta encoded, right? I didn't think of it that way the first time I saw it.
Any suggestion for a rename is welcome (without changing the default).
@maysamyabandeh has updated the pull request. Re-import the pull request
maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
HISTORY.md
Outdated
@@ -2,6 +2,7 @@
## Unreleased
### Public API Change
### New Features
* Changes the format of index blocks by delta encoding the index values, which are the block handles. This saves the encoding of BlockHandle::offset of the non-head index entries in each restart interval. The feature is backward compatbile but not forward compatible. It is disabled by default unless format_version 4 or above is used.
nit: typo compatbile
table/block.cc
Outdated
// restart_point n-1: k, v (off,sz), k, v (delta-sz), ..., k, v (delta-sz)
// where, k is key, v is value, and its encoding is in parenthesis.
// The format of each key is (shared_size, non_shared_size, shared, non_shared)
// The format of each value, i.e., block hanlde, is (offset,size) whenever the
nit: typo hanlde, also add a space in all tuples after the comma? (offset, size)
table/block_builder.cc
Outdated
static_cast<uint32_t>(non_shared),
static_cast<uint32_t>(value.size()));
if (use_value_delta_encoding_) {
// Add "<shared><non_shared><value_size>" to buffer_
This comment is incorrect, there is no <value_size>
table/block_test.cc
Outdated

// generate different prefix
for (int i = from; i < from + len; i += step) {
// generating keys that shares the prefix
Why is one comment in present tense while the other uses the -ing form?
util/coding.h
Outdated
// Shift the absolute number right for one bit. Then use the least significant
// bit to indicate the sign.
char* ptr =
v >= 0 ? EncodeVarint64(buf, v * 2) : EncodeVarint64(buf, v * -1 * 2 + 1);
Wouldn't EncodeVarint64(buf, v >= 0 ? uint64_t(v) * 2 : -uint64_t(v) * 2 + 1) be simpler?
Also be careful with the casts and their order, you're doing signed overflow here (note that -INT_MIN overflows, so you have to negate after switching to unsigned).
Also, have you considered using the branch-free routines used in Thrift? They're probably much faster if the sign is unpredictable:
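For reference, a sketch of the branch-free zigzag mapping being alluded to (the same idea Thrift and protobuf use for signed varints); it is illustrative rather than this PR's code and assumes arithmetic right shift of signed values, as those libraries do:

#include <cstdint>

// Fold the sign into the low bit so small magnitudes stay small:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
// Deltas in [-64, 63] therefore land in 0..127 and fit in one varint byte.
inline uint64_t ZigzagEncode(int64_t v) {
  // v >> 63 is 0 for v >= 0 and all ones for v < 0 (arithmetic shift).
  return (static_cast<uint64_t>(v) << 1) ^ static_cast<uint64_t>(v >> 63);
}

inline int64_t ZigzagDecode(uint64_t u) {
  // (0 - (u & 1)) is 0 or all ones, undoing the sign fold without a branch.
  return static_cast<int64_t>((u >> 1) ^ (0ULL - (u & 1)));
}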
@maysamyabandeh has updated the pull request. Re-import the pull request
maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@maysamyabandeh has updated the pull request. Re-import the pull request
maysamyabandeh is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
The previous report about the CPU regression with this PR was most probably an experiment error, as I could not reproduce it.
WHY THIS PULL ACCEPT ? if need InternalIterator with other value type
then will not get such a big diff and this is
@mm304321141 As I mentioned in a WeChat thread, we don't treat the SST and memtable interfaces as public interfaces. They were under include/rocksdb just because we were lazy to resolve some shared_ptr class definition requirement. They can't be implemented just depending on the public API anyway. We haven't and won't promise any API compatibility for those memtable and SST file plug-ins, including anything related to InternalIterator; the name of the class indicates what it is, an internal implementation. Although I don't see it breaking anything, it is possible that we have a better way to reorganize the code than what is done in this PR. If you have a better suggestion, you can submit a pull request, and we can discuss from there. We can merge it if it is better than the way it is. By the way, we in the same community should respect each other and trust each other to have tried our best to make the best judgements. I accepted the pull request, and fortunately I don't feel offended by the way you challenge me in all capital terms and three question marks. If it were another contributor, the way you wrote your comment might be totally counter-productive for you to solve your problem.
Summary: With #3983 the size of IndexBlockIter was increased. This had resulted in a regression on P50 latencies in one of our benchmarks. The patch reduces IndexBlockIter size by eliminating active_comparator_ field from the class. Pull Request resolved: #4358 Differential Revision: D9781737 Pulled By: maysamyabandeh fbshipit-source-id: 71e2b28d90ff0813db9e04b737ae73e185583c52
Summary: Given that index value is a BlockHandle, which is basically an <offset, size> pair we can apply delta encoding on the values. The first value at each index restart interval encoded the full BlockHandle but the rest encode only the size. Refer to IndexBlockIter::DecodeCurrentValue for the detail of the encoding. This reduces the index size which helps using the block cache more efficiently. The feature is enabled with using format_version 4. The feature comes with a bit of cpu overhead which should be paid back by the higher cache hits due to smaller index block size. Results with sysbench read-only using 4k blocks and using 16 index restart interval: Format 2: 19585 rocksdb read-only range=100 Format 3: 19569 rocksdb read-only range=100 Format 4: 19352 rocksdb read-only range=100 Pull Request resolved: facebook#3983 Differential Revision: D8361343 Pulled By: maysamyabandeh fbshipit-source-id: f882ee082322acac32b0072e2bdbb0b5f854e651
Summary: Fix the compile error in "make unity_test" caused by facebook#3983. Pull Request resolved: facebook#4257 Differential Revision: D9271740 Pulled By: maysamyabandeh fbshipit-source-id: 94e56d1675bf8bdc0e94439467eb4f40dd107517
Summary: With facebook#3983 the size of IndexBlockIter was increased. This had resulted in a regression on P50 latencies in one of our benchmarks. The patch reduces IndexBlockIter size by eliminating active_comparator_ field from the class. Pull Request resolved: facebook#4358 Differential Revision: D9781737 Pulled By: maysamyabandeh fbshipit-source-id: 71e2b28d90ff0813db9e04b737ae73e185583c52
buffer_.append(delta_value->data(), delta_value->size());
} else {
buffer_.append(value.data(), value.size());
}
What if shared == 0 && use_value_delta_encoding_ == true? According to the logic from line 135 to line 144, value.size() is not written into the buffer.
The inline doc is indeed confusing. Thanks for bringing it up. When use_value_delta_encoding_ is true it makes two changes to the encoding: the value size is no longer encoded, and the value itself is fully encoded only if shared == 0 (otherwise a delta of it is encoded).
We should update the inline docs to clarify that.
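A simplified sketch of that rule (an approximation under assumptions, not the actual BlockBuilder code; std::string_view stands in for Slice):

#include <string>
#include <string_view>

// With value delta encoding no value length is written, since block-handle
// varints are self-delimiting; the full value appears only at restart points
// (shared == 0), otherwise the precomputed delta is appended.
void AppendIndexValue(std::string* buffer, bool use_value_delta_encoding,
                      uint32_t shared, std::string_view value,
                      std::string_view delta_value) {
  if (use_value_delta_encoding && shared != 0) {
    buffer->append(delta_value.data(), delta_value.size());
  } else {
    buffer->append(value.data(), value.size());
  }
}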
Given that an index value is a BlockHandle, which is basically an <offset, size> pair, we can apply delta encoding to the values. The first value at each index restart interval encodes the full BlockHandle while the rest encode only the size. Refer to IndexBlockIter::DecodeCurrentValue for the details of the encoding. This reduces the index size, which helps use the block cache more efficiently. The feature is enabled by using format_version 4.
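As a usage sketch (the path and the restart interval value are just examples), enabling the new index format only requires asking for format_version 4 in BlockBasedTableOptions:

#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  rocksdb::BlockBasedTableOptions table_options;
  table_options.format_version = 4;                 // enables index value delta encoding
  table_options.index_block_restart_interval = 16;  // interval used in the numbers above
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/format_v4_example", &db);
  delete db;
  return s.ok() ? 0 : 1;
}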