Welcome to the online resources for our paper "Quantifying TPC-H Choke Points and Their Optimizations". In this repository, we share instructions and the code needed for reproducing the results presented in the paper. Additionally, we provide the raw benchmark results as generated by the hyriseBenchmarkTPCH binary.
For any questions about setting up the environment, reproducing the benchmarks, or just to discuss the paper as such, please feel free to contact us.
Hyrise can be retrieved from Github. Our Step-by-Step Guide will take you through the steps needed to set up Hyrise, starting the console, or running the TPC-H benchmark.
The patch files in this repository are based on the paper/tpch
tag. For better reproducibility, that tag will not be updated with more recent Hyrise developments. To use Hyrise beyond the experiments presented here, please use the current master
branch.
To execute a benchmark as presented in the paper, first generate baseline results using
./hyriseBenchmarkTPCH -s 10 -o baseline.json
Next, apply the provided patch file using git apply patchfile.diff
in the repositories root folder. After recompiling, re-run the benchmarks and store the results in a different file. You can then compare two runs using the ./scripts/compare_benchmarks.py
script.
We crawled Google Scholar using this script from Strobel and Hofmann. Raw data is contained in the downloadable repository.
-
The performance breakdown of Hyrise operators (Figure 2) can be obtained by running the benchmark with the
--visualize
parameter and calling theplot_performance_breakdown.py
script. -
For re-running the Hyrise TPC-H Comparison shown in Figure 3, execute hyriseBenchmarkTPCH. The average time for each benchmark query is given on the console.
-
Instructions for executing TPC-H on MonetDB can be found here. We used MonetDB 5 server 11.35.9, which was obtained using Ubuntu's package manager.
-
DuckDB was built from source (70c20f28f) using a patch to enable SF 10.
Please find the mentioned diff files and the raw results in the respective folder or linked below.
- Baseline: original_join_order.diff (raw data not linked here, see folder)
- (1) DPccp: unmodified Hyrise code
- (2) Greedy: greedy.diff
- Baseline: no_pushdown_no_ordering.diff
- (1) Pushdown without ordering: pushdown_no_ordering.diff
- (2) Pushdown and ordering: pushdown_ordering.diff
- Baseline: no_between_composition.diff
- (1) Enabled composition: unmodified Hyrise code
- Baseline: no_duplication.diff
- (1) Enabled duplication: unmodified Hyrise code
- Baseline: unsorted.diff
- (1) Shuffled lineitem and orders tables: shuffled.diff
- (2) Physical locality without DBMS optimizations: sorted_no_pruning.diff
- (3) Physical locality with DBMS optimizations but without usage of correlations: sorted_pruning_no_correlation.diff
- Baseline: same as 4.5 (3)
- (1) Exploitation of correlation: unmodified Hyrise code
- Baseline: no_flattening.diff
- (1) With subquery flattening: unmodified Hyrise code
- Baseline: no_reduction.diff
- (1) With indiscriminate reduction: indiscriminate_reduction.diff
- (2) With cardinality-based reduction: unmodified Hyrise code
- Baseline: no_reuse.diff
- (1) With subplan reuse and enabled column pruning: unmodified Hyrise code
- (2) With subplan reuse and disabled column pruning: no_column_pruning.diff
- Baseline: no_dependent_groupby_detection.diff
- Automatic group-by detection: unmodified Hyrise code
- Manual group-by detection for foreign keys: manual_groupby_detection.diff
- Baseline: no_in_reformulation.diff
- Reformulation of IN clauses: unmodified Hyrise code