[docdb] optimize rocksdb's delta encoding for YugabyteDB key format #10094
Description
The delta encoding scheme in RocksDB does simple prefix compression. This works well when the common part of two adjacent keys is at the prefix, and the non-common part is at the end.
The YugabyteDB key format is something like:
<hashcode>:<hash-key-column(s)>:<range-key-column(s)>:<column-id>:<hybrid-TS>:<uniq-write-id>
For a row with, say, N non-primary-key columns, N such keys are written that differ only in the column-id (and perhaps the uniq-write-id). Also note that in cases where the column values themselves are small, such as integers or small text fields, the key portion can dominate the space usage.
A simple prefix-compression based delta encoding scheme is only able to save the bytes to the left of the <column-id>, and is therefore not as effective as a more "custom" delta encoding scheme that we can implement.
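To make the limitation concrete, here is a minimal sketch that compares how many bytes of a key must still be stored under prefix-only sharing versus a hypothetical scheme that also shares the trailing bytes. The byte layout in `make_key`, the component widths, and the prefix-plus-suffix scheme itself are illustrative assumptions, not the real DocDB key encoding or the design proposed here. It also assumes the hybrid-TS and uniq-write-id are identical across the two keys, which may not hold in practice.

```python
def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that two keys have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def shared_suffix_len(a: bytes, b: bytes, skip: int) -> int:
    """Number of trailing bytes in common, not overlapping the first `skip` bytes."""
    limit = min(len(a), len(b)) - skip
    n = 0
    while n < limit and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def make_key(column_id: int) -> bytes:
    # Hypothetical fixed-width layout, for illustration only.
    return b"".join([
        (0x1234).to_bytes(2, "big"),            # <hashcode>
        b"user-0001",                           # <hash-key-column(s)>
        b"2021-09-01",                          # <range-key-column(s)>
        column_id.to_bytes(1, "big"),           # <column-id>
        (1632000000000000).to_bytes(8, "big"),  # <hybrid-TS>
        (0).to_bytes(4, "big"),                 # <uniq-write-id> (assumed equal)
    ])


k1, k2 = make_key(1), make_key(2)
prefix = shared_prefix_len(k1, k2)
suffix = shared_suffix_len(k1, k2, prefix)

# Bytes of k2 that still have to be written after k1.
stored_prefix_only = len(k2) - prefix
stored_prefix_and_suffix = len(k2) - prefix - suffix
print(len(k2), stored_prefix_only, stored_prefix_and_suffix)
```

With this made-up layout, prefix-only sharing still has to store the column-id plus everything after it, while sharing the trailing bytes as well leaves only the column-id.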
For example, if we are able to save 90% of space used for the keys instead of 70%, it might seem like a small incremental benefit but in the overall equation (especially for cases where the value portion is small) the space savings can be significant (2-3x even).
In one workload, we noticed that 220MB of an SSTable was for keys and 20MB was for values, and prefix compression was giving an effective savings of 70% on the keys.
So logical space used with simple prefix compression based delta-encoding is:
(100-70)% of 220MB (for keys) + 20MB (for values) = 86MB
We can easily implement a more custom delta encoding scheme that can give nearly 90% savings on keys. So the math will then be:
(100-90)% of 220MB (for keys) + 20MB (for values) = 42MB
Nearly 2x wins!
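The arithmetic above can be checked with a short sketch. The function name `logical_size_mb` is made up for illustration; the inputs are the figures quoted for the workload above.

```python
def logical_size_mb(key_mb: float, value_mb: float, key_savings_pct: int) -> float:
    """Logical size after delta encoding saves `key_savings_pct` percent of key bytes."""
    return key_mb * (100 - key_savings_pct) / 100 + value_mb


prefix_only = logical_size_mb(220, 20, 70)  # simple prefix compression
custom = logical_size_mb(220, 20, 90)       # assumed ~90% savings on keys
ratio = prefix_only / custom

print(prefix_only, custom, round(ratio, 2))
```

The ratio comes out slightly above 2, and it grows as the value portion shrinks relative to the keys.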
The nice thing is that the change can easily be done in an upgrade-safe manner: SSTable files written by newer versions of the software use the new delta encoding format, while the read path supports both formats. A metadata bit in the SSTable file can indicate which delta encoding format is in use.
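A rough sketch of how a read path might dispatch on such a format flag is shown below. Everything here is hypothetical: the `KeyDeltaFormat` enum, the entry tuples, and the prefix-plus-suffix layout are assumptions made for illustration and do not reflect RocksDB's actual block format or any existing YugabyteDB API.

```python
from enum import IntEnum
from typing import List, Tuple


class KeyDeltaFormat(IntEnum):
    # Hypothetical values for the per-file metadata flag.
    SHARED_PREFIX = 0             # existing format: (shared_prefix_len, rest)
    SHARED_PREFIX_AND_SUFFIX = 1  # assumed new format: (prefix_len, suffix_len, middle)


def decode_key(fmt: KeyDeltaFormat, prev_key: bytes, entry: Tuple) -> bytes:
    """Rebuild a full key from the previous key and a delta-encoded entry."""
    if fmt == KeyDeltaFormat.SHARED_PREFIX:
        shared, rest = entry
        return prev_key[:shared] + rest
    if fmt == KeyDeltaFormat.SHARED_PREFIX_AND_SUFFIX:
        shared_prefix, shared_suffix, middle = entry
        # Slice from an explicit start index so that shared_suffix == 0 yields b"".
        suffix = prev_key[len(prev_key) - shared_suffix:]
        return prev_key[:shared_prefix] + middle + suffix
    raise ValueError(f"unknown delta encoding format: {fmt}")


def decode_block(fmt: KeyDeltaFormat, entries: List[Tuple]) -> List[bytes]:
    """Decode all keys in a block; the format comes from the file's metadata."""
    keys: List[bytes] = []
    prev = b""
    for entry in entries:
        prev = decode_key(fmt, prev, entry)
        keys.append(prev)
    return keys


# The same two keys, encoded in the old and the assumed new format.
old_block = [(0, b"row1|c1|ts9"), (6, b"2|ts9")]
new_block = [(0, 0, b"row1|c1|ts9"), (6, 4, b"2")]

old_keys = decode_block(KeyDeltaFormat.SHARED_PREFIX, old_block)
new_keys = decode_block(KeyDeltaFormat.SHARED_PREFIX_AND_SUFFIX, new_block)
```

Because the flag is read once per file, older files keep decoding through the existing branch, and only files written by newer software take the new one.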