Skip to content

vteromero/integer-compression-benchmarks

Repository files navigation

This is a repository to benchmark integer compression algorithms with sorted lists of integers.

Data sets

These are the data sets that have been tested so far:

  • ts.txt: a text file with a large list of timestamps. It can be downloaded from here.

  • gov2.sorted: a binary file containing a sequence of sorted lists of 32-bit integers. This file is part of the "Document identifier data set" created by D. Lemire. It can be downloaded from here.

Results

The following results have been obtained on a laptop Ubuntu Desktop 21.10 with a Core i7-6700HQ CPU @ 2.60GHz x 8.

  • ts.txt:
Algorithm Encoded Size Compression Ratio Encoding Speed Decoding Speed
VTEnc/1 (*) 21,679 26,622 102.28 G/s 7.32 G/s
Delta+FastPFor256 1,179,312 489.4 1.45 G/s 3.61 G/s
Delta+FastPFor128 2,306,544 250.2 1.58 G/s 3.88 G/s
Delta+BinaryPacking 4,552,280 126.8 6.76 G/s 4.31 G/s
Delta+VariableByte 144,285,504 4.0 3.89 G/s 3.43 G/s
Delta+VarIntGB 180,356,880 3.2 5.05 G/s 6.73 G/s
Copy 577,141,992 1.0 11.83 G/s -
  • gov2.sorted:
Algorithm Encoded Size Compression Ratio Encoding Speed Decoding Speed
VTEnc/1 2,885,170,285 8.29 154.76 M/s 210.99 M/s
VTEnc/2 2,930,842,147 8.16 213.90 M/s 282.39 M/s
VTEnc/4 3,144,764,347 7.61 306.40 M/s 411.75 M/s
VTEnc/8 3,482,164,190 6.87 443.06 M/s 594.89 M/s
Delta+FastPFor128 3,849,161,656 6.21 615.66 M/s 615.64 M/s
Delta+FastPFor256 3,899,341,376 6.13 621.15 M/s 621.17 M/s
VTEnc/16 3,922,307,294 6.10 639.35 M/s 829.11 M/s
Delta+BinaryPacking 4,329,919,808 5.52 2.30 G/s 2.33 G/s
VTEnc/32 4,447,792,753 5.38 901.11 M/s 1,124.46 M/s
VTEnc/64 5,044,828,934 4.74 1.15 G/s 1.42 G/s
VTEnc/128 5,719,368,620 4.18 1.42 G/s 1.68 G/s
VTEnc/256 6,561,323,112 3.65 1.60 G/s 1.83 G/s
Delta+VariableByte 6,572,084,696 3.64 1.62 G/s 1.52 G/s
Delta+VarIntGB 7,923,819,720 3.01 2.03 G/s 2.97 G/s
Copy 23,918,860,964 1.00 5.10 G/s -

(*) Number alongside VTEnc indicates value of min_cluster_length encoding parameter.

Included libraries

  • benchmark: benchmarking library created by Google.
  • SIMDCompressionAndIntersection: C++ library to encode/decode sorted lists of integers. It provides a series of integer algorithms optimised to work with sorted lists.
  • VTEnc: C library that implements VTEnc algorithm.

Building

To build this project, clone it in this way:

$ git clone --recurse-submodules https://github.com/vteromero/integer-compression-benchmarks

Then, build the included libraries. It's highly recommended to follow the build instructions for each library in their respective repositories. However, for convenience, here is a quick list of instructions:

Build benchmark

# Benchmark requires Google Test as a dependency. Add the source tree as a subdirectory.
$ git clone https://github.com/google/googletest.git benchmark/googletest
# Go to the library root directory
$ cd benchmark
# Make a build directory to place the build output.
$ mkdir build && cd build
# Generate a Makefile with cmake.
# Use cmake -G <generator> to generate a different file type.
$ cmake -DCMAKE_BUILD_TYPE=Release ../
# Build the library.
# Use make -j<number_of_parallel_jobs> to speed up the build process, e.g. make -j8 .
$ make

Build SIMDCompressionAndIntersection

$ cd SIMDCompressionAndIntersection
$ make

Build VTEnc

$ cd VTEnc
$ make

Once the included libraries have been built, run make in the root directory. That will generate the executable intbench.

Running

You can invoke intbench in a variety of ways. Here are some examples:

# '--data-dir' option is required and indicates the path to the directory that
# contains the data sets.
$ ./intbench --data-dir=/home/user/data

# With '--benchmark_filter' you can choose which benchmarks to run.
$ ./intbench --data-dir=/home/user/data --benchmark_filter=TimestampsDataSet

# '--benchmark_list_tests' lists all the available benchmarks.
$ ./intbench --data-dir=/home/user/data --benchmark_list_tests

# Use '--help' to get more information about the available options.
$ ./intbench --data-dir=/home/user/data --help

License

This code is licensed under MIT license.

About

Integer compression benchmarks for sorted lists

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published