Users commonly store the same, or nearly the same, value many times in one column. For example, in TPC-H, we have a CHAR(1) column. With 1000k rows, such a column must contain duplicated data, so we can compress it with run-length encoding (RLE).
Guide-level Introduction
RLE is a compression scheme rather than a typical encoding, and it can be applied to any column. For example, given an int column:
RLE encoding checks whether the current value is the same as the previous one, and generates an RLE map:
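To make the idea concrete, here is a minimal sketch of the compare-with-previous step. The helpers `rle_encode` and `rle_decode` and the `(value, run length)` pair representation are assumptions made for illustration; they are not names or layouts taken from this proposal.

```rust
/// Run-length encode a slice into (value, run length) pairs.
/// Hypothetical helper for illustration only.
pub fn rle_encode<T: PartialEq + Clone>(values: &[T]) -> Vec<(T, u32)> {
    let mut runs: Vec<(T, u32)> = Vec::new();
    for v in values {
        // Same as the previous value: extend the current run.
        if let Some((last, count)) = runs.last_mut() {
            if *last == *v {
                *count += 1;
                continue;
            }
        }
        // First value, or a value that differs from the previous one:
        // start a new run of length 1.
        runs.push((v.clone(), 1));
    }
    runs
}

/// Expand (value, run length) pairs back into the original sequence.
pub fn rle_decode<T: Clone>(runs: &[(T, u32)]) -> Vec<T> {
    let mut out = Vec::new();
    for (v, count) in runs {
        for _ in 0..*count {
            out.push(v.clone());
        }
    }
    out
}
```

With this sketch, the int column `1, 1, 1, 2, 2, 3` would be represented by the runs `(1, 3), (2, 2), (3, 1)`, and decoding those runs yields the original column again.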
Implementation-level Introduction
Builder and Iterator
Unlike the primitive builders and iterators, the RLE block builder and iterator are simply wrappers around the primitive ones. They will look as follows:
When finishing a block, RLEBlockBuilder simply appends the RLE count data to the encoded block.
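The wrapper idea could be sketched roughly as below. Everything here is an assumption for illustration: the `BlockBuilder` trait, the toy `PlainI32BlockBuilder`, the `RleBlockBuilder` name, and in particular the byte layout (inner block, then one little-endian `u32` count per run, then the number of runs as a trailing `u32`). The actual interfaces and on-disk format may differ.

```rust
/// Hypothetical stand-in for a block builder interface.
pub trait BlockBuilder {
    type Item;
    fn append(&mut self, item: &Self::Item);
    fn finish(self) -> Vec<u8>;
}

/// Toy plain builder: stores each i32 as 4 little-endian bytes.
pub struct PlainI32BlockBuilder {
    data: Vec<u8>,
}

impl PlainI32BlockBuilder {
    pub fn new() -> Self {
        Self { data: Vec::new() }
    }
}

impl BlockBuilder for PlainI32BlockBuilder {
    type Item = i32;
    fn append(&mut self, item: &i32) {
        self.data.extend_from_slice(&item.to_le_bytes());
    }
    fn finish(self) -> Vec<u8> {
        self.data
    }
}

/// Wrapper that forwards only the first value of each run to the
/// inner builder and keeps the run lengths on the side.
pub struct RleBlockBuilder<B: BlockBuilder> {
    inner: B,
    last: Option<B::Item>,
    counts: Vec<u32>,
}

impl<B: BlockBuilder> RleBlockBuilder<B>
where
    B::Item: PartialEq + Clone,
{
    pub fn new(inner: B) -> Self {
        Self { inner, last: None, counts: Vec::new() }
    }

    pub fn append(&mut self, item: &B::Item) {
        if self.last.as_ref() == Some(item) {
            // Same as the previous value: only bump the current run length.
            *self.counts.last_mut().unwrap() += 1;
        } else {
            // New run: store the value once in the inner block.
            self.inner.append(item);
            self.last = Some(item.clone());
            self.counts.push(1);
        }
    }

    /// Finish the inner block, then append the RLE count data
    /// (assumed layout: counts, followed by the number of runs).
    pub fn finish(self) -> Vec<u8> {
        let mut block = self.inner.finish();
        for c in &self.counts {
            block.extend_from_slice(&c.to_le_bytes());
        }
        block.extend_from_slice(&(self.counts.len() as u32).to_le_bytes());
        block
    }
}
```

Putting the run count at the end is one possible choice that lets a reader locate the count section from the tail of the block without changing the inner block's layout.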
Column
L0 rowsets should not use RLE encoding. We can use distinct-value statistics to decide whether to apply RLE encoding to newly compacted rowsets.
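One way this decision might look is sketched below. The `ColumnStats` struct, the `BlockEncoding` enum, the `choose_encoding` function, and the ratio threshold are all hypothetical; the proposal does not specify a concrete rule. Note also that a low distinct count only suggests long runs; it does not guarantee them if equal values are interleaved.

```rust
/// Hypothetical per-column statistics collected during compaction.
pub struct ColumnStats {
    pub row_count: u64,
    pub distinct_count: u64,
}

/// Hypothetical encoding choice for a column in a rowset.
#[derive(Debug, PartialEq)]
pub enum BlockEncoding {
    Plain,
    Rle,
}

/// Pick an encoding for a rowset at the given level.
/// `max_distinct_ratio` is an assumed tuning knob: RLE is chosen only
/// when distinct_count / row_count is at or below it.
pub fn choose_encoding(
    level: u32,
    stats: &ColumnStats,
    max_distinct_ratio: f64,
) -> BlockEncoding {
    // L0 rowsets never use RLE; empty columns have nothing to compress.
    if level == 0 || stats.row_count == 0 {
        return BlockEncoding::Plain;
    }
    let ratio = stats.distinct_count as f64 / stats.row_count as f64;
    if ratio <= max_distinct_ratio {
        BlockEncoding::Rle
    } else {
        BlockEncoding::Plain
    }
}
```

For a CHAR(1) column with 1000k rows and only a handful of distinct values, a rule like this would select RLE for compacted rowsets while leaving L0 rowsets in plain encoding.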
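We should modify the factories in column correspondingly:

    pub(super) enum BlockIteratorImpl<T: PrimitiveFixedWidthEncode> {
        Plain(PlainPrimitiveBlockIterator<T>),
        PlainNullable(PlainPrimitiveNullableBlockIterator<T>),
        /* new */
        RLEPlainNullable(RLEBlockIterator<PlainPrimitiveNullableBlockIterator<T>>),
    }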
And in proto: