
RoBin

RoBin is a Robustness Benchmark for range indexes (especially for updatable learned indexes).

A robin is also an insect-eating bird that offers great benefits to agriculture.

Notice

  1. We modify only the Face (fb) dataset to support experiments on ALEX. The fb dataset contains std::numeric_limits<uint64_t>::max(), which ALEX uses as a sentinel key to simplify its implementation. We therefore shift every fb key down by one to create the fb-1 dataset, which is used only when testing ALEX (see the sketch after this list).
  2. We modify LIPP's hyperparameter MAX_DEPTH so that it can successfully run all the test cases (otherwise it crashes due to a runtime assertion failure).
  3. We modify the bulkload process of the STX B+tree so that its nodes are half full (load factor = 0.5) after bulkloading, which aligns the bulkloaded node layout with the layout produced by insertions and splits and lets its performance robustness be observed.
  4. All other parameters of every index are the same as in its original implementation.
  5. All of the tested index implementations can be found in this repo; each branch corresponds to one index.
  6. We add profiling stats for ART, B+tree, ALEX, and LIPP, covering the depth distribution, the comparison count of leaf-node search, the root-node model, and so on, with only minor code changes.
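
As a rough illustration of the key shift described in item 1, the sketch below shows what the transformation amounts to. It assumes the fb dataset is stored in the SOSD binary layout (a uint64 key count followed by the uint64 keys) and uses illustrative file names; the authoritative logic lives in datasets/gen_linear_fb-1.py.

import numpy as np

def shift_fb_keys(src_path, dst_path):
    # Assumed layout: first uint64 is the key count, the rest are the keys.
    data = np.fromfile(src_path, dtype=np.uint64)
    count, keys = data[0], data[1:]
    # Subtract one from every key so that numeric_limits<uint64_t>::max()
    # never appears in the data and cannot collide with ALEX's sentinel key.
    shifted = keys - np.uint64(1)
    np.concatenate(([count], shifted)).tofile(dst_path)

shift_fb_keys("fb_200M_uint64", "fb-1_200M_uint64")  # illustrative file names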

Reproduction Steps

Prepare

RoBin depends on the TBB, jemalloc, and Boost libraries. You can install them with the following commands:

sudo apt update
sudo apt install -y libtbb-dev libjemalloc-dev libboost-dev

If the repository was not cloned with the --recursive option, run the following command to initialize the submodule:

git submodule update --init --recursive

Download the datasets from the remote source and construct the linear and fb-1 datasets:

cd datasets
bash download.sh
python3 gen_linear_fb-1.py
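
For reference, here is a minimal sketch of what the linear dataset construction might look like, assuming "linear" simply means uniformly spaced uint64 keys written in the same count-then-keys binary layout (this is an assumption; the actual construction is defined in gen_linear_fb-1.py):

import numpy as np

NUM_KEYS = 200_000_000  # illustrative dataset size
# Uniformly spaced keys give a perfectly linear key-to-position mapping.
keys = np.arange(1, NUM_KEYS + 1, dtype=np.uint64) * np.uint64(16)
out = np.concatenate(([np.uint64(NUM_KEYS)], keys))
out.tofile("linear_200M_uint64")  # illustrative file name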

Build

rm -rf build
mkdir -p build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
make -j

Reproduce

Benchmark all the competitors via RoBin with the following commands (this may take a while to finish):

export numanode=xxx # on a machine with multiple NUMA nodes, you must specify the NUMA node to avoid memory being allocated from different NUMA nodes
bash reproduce.sh

Use the Jupyter notebook to plot the performance results:

cd script
## open and run the following jupyter notebook to reproduce the figure in our paper
## result_plotter.ipynb

Profiling

Build with the PROFILING flag enabled:

rm -rf build
mkdir -p build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DPROFILING=ON
make -j

Note that our code modifications for profiling have no impact on index performance when the benchmark is built without this flag.

Run the profiling script:

bash run_profiling.sh

Use the Jupyter notebooks to plot the profiling results:

cd script
## open and run the following jupyter notebooks to reproduce the figures in our paper
## analysis_cmp.ipynb
## analysis_depth.ipynb
## analysis_overfit.ipynb

Run and Play

We also provide a script to run RoBin with custom parameters. Run the following command to see the help information:

python3 run.py --help

Reference

  1. We built this benchmark on top of GRE, a well-designed benchmark. The related paper is:

Wongkham, Chaichon, et al. "Are Updatable Learned Indexes Ready?" Proceedings of the VLDB Endowment 15.11 (2022): 3004-3017.