RoBin is a Robustness Benchmark for range indexes, especially updatable learned indexes.
A robin is also an insect-eating bird that offers great benefits to agriculture.
- We modify only the Face (fb) dataset to support experiments on ALEX. The fb dataset contains `std::numeric_limits<uint64_t>::max()`, which ALEX uses as a sentinel to simplify its implementation. We therefore shift all fb keys down by one, producing the fb-1 dataset, which is used for testing ALEX only.
- We modify LIPP's hyperparameter `MAX_DEPTH` to ensure it can run all the test cases successfully (otherwise it crashes on a runtime assertion).
- We modify the bulkload process of the STX B+tree to ensure its nodes are half filled (load factor = 0.5) after bulkloading, which aligns its subsequent insertions and splits and better reflects its performance robustness.
- All other parameters of the indexes are the same as in their original implementations.
- All of the index implementations we test can be found in this repository; each branch corresponds to one index.
- We add profiling statistics for ART, B+tree, ALEX, and LIPP (e.g., the depth distribution, the comparison count during leaf-node search, and the model of the root node) with only minimally invasive changes.
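The rationale behind the STX B+tree bulkload change above can be sketched as follows. This is an illustration, not RoBin code: a leaf bulkloaded completely full must split on its very first insert, while a half-filled leaf has the same headroom as a leaf freshly produced by a split, so post-bulkload behavior matches steady-state behavior.

```python
# Toy model of a B+tree leaf: count how many inserts a leaf absorbs
# before it overflows and must split. (Capacity value is illustrative.)
LEAF_CAPACITY = 8

def inserts_until_first_split(initial_fill: int, capacity: int = LEAF_CAPACITY) -> int:
    """Number of inserts a leaf with `initial_fill` keys absorbs before splitting."""
    return capacity - initial_fill

# Fully packed bulkload (load factor 1.0): the first insert triggers a split.
assert inserts_until_first_split(LEAF_CAPACITY) == 0

# Half-filled bulkload (load factor 0.5): same headroom as a post-split leaf.
assert inserts_until_first_split(LEAF_CAPACITY // 2) == LEAF_CAPACITY // 2
```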
RoBin depends on the TBB, jemalloc, and Boost libraries. You can install them with the following commands:
sudo apt update
sudo apt install -y libtbb-dev libjemalloc-dev libboost-dev
If the repository was not cloned with the `--recursive` option, you can run the following command to fetch the submodules:
git submodule update --init --recursive
Download the datasets and generate the linear and fb-1 datasets:
cd datasets
bash download.sh
python3 gen_linear_fb-1.py
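The fb-1 construction can be sketched as below. This is a minimal illustration assuming NumPy and in-memory keys; the actual `gen_linear_fb-1.py` reads and writes the dataset files and may differ in detail.

```python
# Sketch: deriving fb-1 from fb. ALEX reserves the maximum uint64 value as a
# sentinel, and the fb dataset contains that value, so every key is shifted
# down by one before benchmarking ALEX.
import numpy as np

UINT64_MAX = np.iinfo(np.uint64).max

def shift_fb_keys(keys: np.ndarray) -> np.ndarray:
    """Subtract 1 from every key so the uint64 sentinel never appears."""
    return keys - np.uint64(1)

# Example keys, including the problematic sentinel value.
keys = np.array([5, 100, UINT64_MAX], dtype=np.uint64)
shifted = shift_fb_keys(keys)

assert shifted[-1] == UINT64_MAX - 1   # sentinel removed
assert UINT64_MAX not in shifted
```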
rm -rf build
mkdir -p build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
make -j
Benchmark all the competitors via RoBin with the following commands (this may take a while):
export numanode=xxx # if your machine has multiple NUMA nodes, specify one to avoid memory allocations spanning different NUMA nodes
bash reproduce.sh
Use the Jupyter notebook to plot the performance results:
cd script
## open and run the following Jupyter notebook to reproduce the figures in our paper
## result_plotter.ipynb
Build with the `PROFILING` flag enabled:
rm -rf build
mkdir -p build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DPROFILING=ON
make -j
Note that our profiling-related code changes have no impact on index performance when the `PROFILING` flag is disabled for benchmark runs.
Run the profiling script:
bash run_profiling.sh
Use the Jupyter notebooks to plot the profiling results:
cd script
## open and run the following Jupyter notebooks to reproduce the figures in our paper
## analysis_cmp.ipynb
## analysis_depth.ipynb
## analysis_overfit.ipynb
We also provide a script to run RoBin with custom parameters. Run the following command to see the help message:
python3 run.py --help
- We build this benchmark on top of GRE, a well-designed benchmark. The related paper is:

Wongkham, Chaichon, et al. "Are Updatable Learned Indexes Ready?" Proceedings of the VLDB Endowment 15.11 (2022): 3004-3017.