Files in this directory:

sql/
.gitignore
README.md
analyze_microbenchmark.py
generate_tpch.sh
plot.py
plot_inkfuse.py
plot_split_y.py
postprocess_umbra.py
reproduce_all.sh
reproduce_duckdb.py
reproduce_hyperapi.py
reproduce_inkfuse.sh
reproduce_umbra.sh
requirements.txt


Benchmarking InkFuse

This directory contains the scripts required to benchmark InkFuse on TPC-H and to compare it against other systems.

Because InkFuse does not support every SQL construct used by the original TPC-H queries, the sql directory contains a simplified schema and slightly simplified queries that match the physical plans InkFuse executes.

Reproducing Everything

To reproduce the experiments in the paper, simply run ./reproduce_all.sh from this directory. This creates:

results_<configuration>.csv: result files for the different engines and scale factors.
plots/main.pdf: a figure summarizing the results.
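To post-process the result files yourself, a minimal sketch like the one below gathers every results_<configuration>.csv in a directory. It is not part of the repository: the helper name collect_results is hypothetical, and it assumes only that each file has a header row. The column names are whatever the reproduction scripts wrote and are deliberately not interpreted.

```python
import csv
import re
from pathlib import Path

# Matches file names of the form results_<configuration>.csv.
RESULT_FILE = re.compile(r"^results_(?P<config>.+)\.csv$")


def collect_results(directory: str = ".") -> dict[str, list[dict[str, str]]]:
    """Map each <configuration> to the rows of its results_<configuration>.csv.

    Hypothetical helper: assumes every results file starts with a header row.
    """
    results: dict[str, list[dict[str, str]]] = {}
    for path in sorted(Path(directory).glob("results_*.csv")):
        match = RESULT_FILE.match(path.name)
        if match is None:  # e.g. a file literally named "results_.csv"
            continue
        with path.open(newline="") as f:
            results[match.group("config")] = list(csv.DictReader(f))
    return results
```

For example, running collect_results() from this directory after ./reproduce_all.sh returns one entry per engine and scale factor configuration, which can then be fed into your own plotting code.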

Note: the open-source tpch-dbgen tool sometimes creates files with the wrong permissions, which prevents them from being cleaned up properly. In that case, reproduce_all.sh may ask whether you want to overwrite some files; always answer yes.

Reproducing Individual Systems

There are individual reproduction scripts for each system. These usually perform the following steps:

Download the TPC-H dbgen tool and create correctly formatted data at the target scale factor
Load the data into the target system
Run the queries and create result files
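The dbgen tool writes pipe-delimited .tbl files in which every row ends with a trailing "|", and some loaders read that as an extra, empty column. Assuming "correctly formatted" refers to this kind of cleanup (the scripts here may do more or something different), the step can be sketched as follows. The function names are hypothetical and not taken from the repository.

```python
from pathlib import Path


def strip_trailing_delimiter(line: str, delimiter: str = "|") -> str:
    """Remove the single trailing delimiter dbgen appends to each row.

    Only one delimiter is removed, so an empty last field ("a||") survives as "a|".
    """
    body = line.rstrip("\r\n")
    if body.endswith(delimiter):
        body = body[: -len(delimiter)]
    return body


def reformat_tbl(src: Path, dst: Path, delimiter: str = "|") -> int:
    """Copy a dbgen .tbl file to dst without trailing delimiters.

    Returns the number of rows written.
    """
    rows = 0
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            fout.write(strip_trailing_delimiter(line, delimiter) + "\n")
            rows += 1
    return rows
```

Streaming line by line keeps memory use constant, which matters at larger scale factors where lineitem.tbl grows to many gigabytes.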

InkFuse

To measure InkFuse at a given scale factor <sf>, simply run:

# Run at the given scale factor
./reproduce_inkfuse.sh <sf>

Note: this script requires clang-14 and clang++-14 to be installed. It compiles the inkfuse_bench target from the source tree in the parent directory. InkFuse currently has no SQL interface and only supports hard-coded physical plans. The inkfuse_bench binary runs all supported TPC-H queries and writes result files containing the measurements.
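Because a missing compiler only surfaces once the build starts, a small pre-flight check can save time. The sketch below is hypothetical and not part of the repository; it only checks whether clang-14 and clang++-14 are on PATH and does not verify how reproduce_inkfuse.sh actually locates them.

```python
import shutil

# Compilers the InkFuse reproduction script needs, per the note above.
REQUIRED_TOOLS = ("clang-14", "clang++-14")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the entries of `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Usage: if missing_tools() returns a non-empty list, install those compilers before running ./reproduce_inkfuse.sh <sf>.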

DuckDB

To measure DuckDB at a given scale factor <sf>, simply run:

# Install requirements
pip3 install -r requirements.txt
# Run at the given scale factor
./reproduce_duckdb.py <sf>
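The "run the queries and create result files" step can be sketched as a generic timing harness. This is an illustration under assumptions, not the logic of reproduce_duckdb.py: the function names, the warm-up and repetition counts, and the CSV column layout are all hypothetical, and the actual result files may be structured differently.

```python
import csv
import statistics
import time
from typing import Callable


def time_query(run: Callable[[], object], repetitions: int = 5, warmup: int = 1) -> list[float]:
    """Call `run` warmup + repetitions times; return wall-clock seconds of the measured calls."""
    for _ in range(warmup):
        run()  # warm-up runs are executed but not recorded
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return timings


def write_results(path: str, system: str, sf: float, measurements: dict[str, list[float]]) -> None:
    """Write one row per query with its median runtime (hypothetical column layout)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["system", "scale_factor", "query", "median_seconds"])
        for query, timings in measurements.items():
            writer.writerow([system, sf, query, statistics.median(timings)])
```

With DuckDB, run could be lambda: con.execute(sql).fetchall() for a connection obtained via duckdb.connect(...). Fetching the full result ensures the measured time includes materializing the query output. Reporting the median rather than the mean makes a single slow repetition less influential.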

Umbra

To measure Umbra at a given scale factor <sf>, simply run:

# Run at the given scale factor
./reproduce_umbra.sh <sf>

This sets up an umbra_data directory that holds all databases, binaries, etc. Huge thanks to the TUM database group for allowing us to share these reproduction scripts.
