[docdb] optimize rocksdb's delta encoding for YugabyteDB key format #10094

Closed
@kmuthukk

Description

The delta encoding scheme in RocksDB does simple prefix compression. This works well when the part shared by two adjacent keys is at the front and the part that differs is at the end.
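To make the idea concrete, here is a minimal sketch of prefix-based delta encoding: each key is stored as the number of bytes it shares with the previous key plus the remaining suffix. It is a simplification for illustration (for example, it ignores RocksDB's restart points and varint framing), and the function names are made up for this example.

```python
def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def prefix_encode(keys):
    """Encode sorted keys as (shared_len, suffix) relative to the previous key."""
    entries, prev = [], b""
    for key in keys:
        shared = shared_prefix_len(prev, key)
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def prefix_decode(entries):
    """Rebuild the original keys from (shared_len, suffix) entries."""
    keys, prev = [], b""
    for shared, suffix in entries:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys


sample_keys = [b"user:1001:name", b"user:1001:zip", b"user:1002:name"]
sample_entries = prefix_encode(sample_keys)
```

Only the bytes after the first differing position are stored, so anything that matches to the right of that position is written out again in full.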

The YugabyteDB key format is something like:

<hashcode>:<hash-key-column(s)>:<range-key-column(s)>:<column-id>:<hybrid-TS>:<uniq-write-id>

For a row with, say, N non-primary-key columns, N such keys are written that differ only in the column-id (and perhaps the uniq-write-id). Also note that when the column values themselves are small, such as integers or short text fields, the key portion can dominate the space usage.

A simple prefix-compression-based delta encoding scheme can only save the bytes to the left of the <column-id>, and is therefore not as effective as a more "custom" delta encoding scheme that we can implement.
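One possible shape for such a custom scheme, sketched below, is to record a shared suffix in addition to the shared prefix, so that only the differing middle (here the column-id) is stored. This is an illustration under assumptions, not the scheme the issue settled on: the byte layout in `make_key` is hypothetical and is not DocDB's real encoding, and the example assumes all columns of the row carry the same hybrid-TS and write-id.

```python
import struct


def make_key(hash_code, hash_col, range_col, column_id, hybrid_ts, write_id):
    # Hypothetical fixed layout, for illustration only (not DocDB's real format):
    # 2-byte hash, hash column, range column, 1-byte column id, 8-byte TS, 4-byte write id.
    return (struct.pack(">H", hash_code) + hash_col + b"\x00" + range_col + b"\x00"
            + struct.pack(">B", column_id) + struct.pack(">QI", hybrid_ts, write_id))


def common_prefix_len(a, b, limit):
    i = 0
    while i < limit and a[i] == b[i]:
        i += 1
    return i


def common_suffix_len(a, b, limit):
    i = 0
    while i < limit and a[-1 - i] == b[-1 - i]:
        i += 1
    return i


def prefix_only_bytes(keys):
    """Key bytes stored when only the shared prefix is elided."""
    total, prev = 0, b""
    for key in keys:
        total += len(key) - common_prefix_len(prev, key, min(len(prev), len(key)))
        prev = key
    return total


def encode_prefix_suffix(keys):
    """Encode each key as (prefix_len, suffix_len, middle) relative to the previous key."""
    entries, prev = [], b""
    for key in keys:
        n = min(len(prev), len(key))
        p = common_prefix_len(prev, key, n)
        s = common_suffix_len(prev, key, n - p)  # suffix must not overlap the prefix
        entries.append((p, s, key[p:len(key) - s]))
        prev = key
    return entries


def decode_prefix_suffix(entries):
    keys, prev = [], b""
    for p, s, middle in entries:
        prev = prev[:p] + middle + prev[len(prev) - s:]
        keys.append(prev)
    return keys


# Five columns of one row, written with the same timestamp and write id (an assumption).
row_keys = [make_key(0x1A2B, b"user42", b"2021-09", col, 1_600_000_000, 7)
            for col in range(1, 6)]
row_entries = encode_prefix_suffix(row_keys)
middle_bytes = sum(len(middle) for _, _, middle in row_entries)
```

With these synthetic keys, the prefix-only scheme has to rewrite the column-id, timestamp, and write-id for every column, while the prefix-plus-suffix variant stores a single byte per additional column. If the uniq-write-id differs between columns, the shared suffix disappears and a field-aware encoding would be needed instead.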

For example, if we are able to save 90% of space used for the keys instead of 70%, it might seem like a small incremental benefit but in the overall equation (especially for cases where the value portion is small) the space savings can be significant (2-3x even).

In one workload, we noticed that 220MB of an SSTable was for keys and 20MB was for values, and prefix compression was giving an effective savings of 70%.

So the logical space used with simple prefix-compression-based delta encoding is:

(100-70)% of 220MB (for keys) + 20MB (for values) = 86MB

We can easily implement a more custom delta encoding scheme that can give nearly 90% savings on keys. So the math will then be:

(100-90)% of 220MB (for keys) + 20MB (for values) = 42MB

Roughly a 2x win!
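The arithmetic above can be checked with a few lines. The helper name is made up for this example; the inputs are the figures quoted in this issue.

```python
def logical_size_mb(key_mb, value_mb, key_savings_pct):
    """Space used after eliding key_savings_pct percent of the key bytes."""
    return key_mb * (100 - key_savings_pct) / 100 + value_mb


prefix_only_mb = logical_size_mb(220, 20, 70)  # 66 MB of keys + 20 MB of values
custom_mb = logical_size_mb(220, 20, 90)       # 22 MB of keys + 20 MB of values
improvement = prefix_only_mb / custom_mb       # a little over 2x
```

Because the values are only 20MB, the key savings dominate the total: moving from 70% to 90% cuts the key bytes from 66MB to 22MB.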

The nice thing is that the change can easily be made in an upgrade-safe manner: SSTable files written by newer versions of the software use the new delta encoding format, while the read path supports both formats. A metadata bit in the SSTable file can indicate which delta encoding format is in use.
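A minimal sketch of what that read-path dispatch could look like follows. Everything here is hypothetical: the `SSTable` class, the `delta_format` field, and the format ids stand in for whatever metadata the real file format would carry, and the two decoders are simplified in-memory versions.

```python
from dataclasses import dataclass

FORMAT_PREFIX = 0         # hypothetical id: existing prefix-only encoding
FORMAT_PREFIX_SUFFIX = 1  # hypothetical id: newer custom encoding


@dataclass
class SSTable:
    delta_format: int  # stands in for the metadata bit in the file
    entries: list


def decode_prefix(entries):
    keys, prev = [], b""
    for shared, suffix in entries:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys


def decode_prefix_suffix(entries):
    keys, prev = [], b""
    for p, s, middle in entries:
        prev = prev[:p] + middle + prev[len(prev) - s:]
        keys.append(prev)
    return keys


DECODERS = {
    FORMAT_PREFIX: decode_prefix,
    FORMAT_PREFIX_SUFFIX: decode_prefix_suffix,
}


def read_keys(table: SSTable):
    """Pick the decoder from the file's format flag; reject unknown formats."""
    decoder = DECODERS.get(table.delta_format)
    if decoder is None:
        raise ValueError(f"unknown delta format {table.delta_format}")
    return decoder(table.entries)


# The same two keys, as an older file and a newer file would store them.
old_file = SSTable(FORMAT_PREFIX, [(0, b"row1:c1:ts"), (6, b"2:ts")])
new_file = SSTable(FORMAT_PREFIX_SUFFIX, [(0, 0, b"row1:c1:ts"), (6, 3, b"2")])
```

Old files keep decoding through the existing path, so no rewrite is needed at upgrade time; files in the new format appear only as newer software writes them.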

Labels

area/docdb (YugabyteDB core features)
