Skip to content

fast-pack/MaskedVByte

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

53 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MaskedVByte

Ubuntu

Fast, vectorized VByte decoding for 32‑bit integers in C, with optional differential (delta) coding.

  • Requires x86-64 with SSE4.1 (available on virtually all modern x64 CPUs)
  • C99 compatible

Build and test

make        # builds the library and the test binary
./unit      # runs a quick correctness test

CMake build (alternative)

mkdir -p build
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release \
      -DMASKEDVBYTE_BUILD_TESTS=ON \
      -DMASKEDVBYTE_BUILD_EXAMPLES=ON
cmake --build build -j
ctest --test-dir build --output-on-failure   # optional

# run the example built by CMake
./build/example

Install with CMake (optional):

cmake --install build --prefix /usr/local

Build and run the example

make example
./example

You should see something like:

Compressed 5000 integers down to 5000 bytes.

Embedded example, explained

The example allocates input/output buffers, encodes a flat array of integers with classic VByte, then decodes it back with the masked (vectorized) decoder and verifies the sizes match.

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

#include "varintencode.h"
#include "varintdecode.h"

int main() {
            int N = 5000;
            uint32_t * datain = malloc(N * sizeof(uint32_t));
            uint8_t * compressedbuffer = malloc(N * sizeof(uint32_t));
            uint32_t * recovdata = malloc(N * sizeof(uint32_t));
            for (int k = 0; k < N; ++k)
                        datain[k] = 120; // constant value fits in one VByte
            size_t compsize = vbyte_encode(datain, N, compressedbuffer); // encoding
            // result is stored in 'compressedbuffer' using 'compsize' bytes
            size_t compsize2 = masked_vbyte_decode(compressedbuffer, recovdata, N); // fast decoding
            assert(compsize == compsize2); // sanity check
            free(datain);
            free(compressedbuffer);
            free(recovdata);
            printf("Compressed %d integers down to %d bytes.\n", N, (int)compsize);
            return 0;
}

What’s happening:

  • VByte uses a continuation bit; small values like 120 encode to a single byte, so 5000 values compress to 5000 bytes.
  • masked_vbyte_decode is a vectorized decoder using SSE4.1 for speed.
  • Differential coding variants are available when your data is sorted or has small gaps.

API at a glance

Headers are in include/.

  • Encoding - size_t vbyte_encode(const uint32_t* in, size_t length, uint8_t* bout); - size_t vbyte_encode_delta(const uint32_t* in, size_t length, uint8_t* bout, uint32_t prev);

  • Decoding - size_t masked_vbyte_decode(const uint8_t* in, uint32_t* out, uint64_t length); - size_t masked_vbyte_decode_delta(const uint8_t* in, uint32_t* out, uint64_t length, uint32_t prev); - size_t masked_vbyte_decode_fromcompressedsize(const uint8_t* in, uint32_t* out, size_t inputsize); - size_t masked_vbyte_decode_fromcompressedsize_delta(const uint8_t* in, uint32_t* out, size_t inputsize, uint32_t prev); - Random access helpers for delta streams: - uint32_t masked_vbyte_select_delta(const uint8_t *in, uint64_t length, uint32_t prev, size_t slot); - int masked_vbyte_search_delta(const uint8_t *in, uint64_t length, uint32_t prev, uint32_t key, uint32_t *presult);

Tips

  • Prefer delta coding when your sequence is sorted or has small differences; it often reduces the number of bytes per integer.
  • If you know the compressed byte length, use the *_fromcompressedsize functions to decode exactly that many bytes.

Use from your CMake project

After installation (see above):

find_package(maskedvbyte CONFIG REQUIRED)
target_link_libraries(your_target PRIVATE maskedvbyte::maskedvbyte)

Or as a subdirectory (vendored):

add_subdirectory(path/to/MaskedVByte)
target_link_libraries(your_target PRIVATE maskedvbyte::maskedvbyte)

Interesting applications

Reference

  • Daniel Lemire, Nathan Kurz, Christoph Rupp, Stream VByte: Faster Byte-Oriented Integer Compression, Information Processing Letters 130, February 2018, Pages 1-6 https://arxiv.org/abs/1709.08990
  • Jeff Plaisance, Nathan Kurz, Daniel Lemire, Vectorized VByte Decoding, International Symposium on Web Algorithms 2015, 2015. http://arxiv.org/abs/1503.07387

See also

License

See LICENSE for details.

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •