
Online Resources for the Paper 'Quantifying TPC-H Choke Points and Their Optimizations'


ZhengtongYan/tpch_paper


Quantifying TPC-H Choke Points and Their Optimizations

Welcome to the online resources for our paper "Quantifying TPC-H Choke Points and Their Optimizations". In this repository, we share instructions and the code needed for reproducing the results presented in the paper. Additionally, we provide the raw benchmark results as generated by the hyriseBenchmarkTPCH binary.

For any questions about setting up the environment or reproducing the benchmarks, or simply to discuss the paper, please feel free to contact us.

Setting up Hyrise

Hyrise can be retrieved from GitHub. Our Step-by-Step Guide walks you through the steps needed to set up Hyrise, start the console, or run the TPC-H benchmark.

The patch files in this repository are based on the paper/tpch tag. For better reproducibility, that tag will not be updated with more recent Hyrise developments. To use Hyrise beyond the experiments presented here, please use the current master branch.

To execute a benchmark as presented in the paper, first generate baseline results using

./hyriseBenchmarkTPCH -s 10 -o baseline.json

Next, apply the provided patch file by running git apply patchfile.diff in the repository's root folder. After recompiling, re-run the benchmark and store the results in a different file. You can then compare the two runs using the ./scripts/compare_benchmarks.py script.
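The baseline / patch / rebuild / re-run / compare cycle above can be scripted. The following is a minimal sketch, not part of this repository: Step, build_plan, and run_plan are hypothetical helpers, and it assumes a Makefile-based CMake build in a directory named cmake-build-release (rebuilt with make hyriseBenchmarkTPCH), a second output file named patched.json, and that compare_benchmarks.py takes the two result files as positional arguments. Adjust these to your setup.

```python
from __future__ import annotations

import shlex
import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Step:
    cwd: Path              # directory the command is executed in
    argv: tuple[str, ...]  # command and its arguments

    def __str__(self) -> str:
        return f"(cd {self.cwd} && {shlex.join(self.argv)})"


def build_plan(repo: Path, patch: Path, scale_factor: int = 10,
               build_dir_name: str = "cmake-build-release") -> list[Step]:
    """Return the experiment steps. ASSUMPTION: build dir name and make target."""
    repo = repo.resolve()
    build = repo / build_dir_name  # hypothetical build directory
    sf = str(scale_factor)
    return [
        # 1. Baseline run on the unmodified paper/tpch checkout.
        Step(build, ("./hyriseBenchmarkTPCH", "-s", sf, "-o", "baseline.json")),
        # 2. Apply the diff in the repository root (relative patch paths resolve there).
        Step(repo, ("git", "apply", str(patch))),
        # 3. Recompile the benchmark binary (assumes a Makefile-based CMake build).
        Step(build, ("make", "hyriseBenchmarkTPCH")),
        # 4. Re-run with the patch applied, writing to a different file.
        Step(build, ("./hyriseBenchmarkTPCH", "-s", sf, "-o", "patched.json")),
        # 5. Compare both runs (assumed argument order: old file, then new file).
        Step(repo, ("./scripts/compare_benchmarks.py",
                    str(build / "baseline.json"), str(build / "patched.json"))),
    ]


def run_plan(plan: list[Step], dry_run: bool = True) -> None:
    """Print every step; only execute them when dry_run is False."""
    for step in plan:
        print(step)
        if not dry_run:
            subprocess.run(step.argv, cwd=step.cwd, check=True)


if __name__ == "__main__":
    run_plan(build_plan(Path("hyrise"), Path("patchfile.diff")))
```

Running it with the default dry_run=True only prints the commands, which makes it easy to check the paths before starting a long SF 10 run.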

Section 1. Motivation

We crawled Google Scholar using this script from Strobel and Hofmann. Raw data is contained in the downloadable repository.

Section 3. Experimental Setup

  • The performance breakdown of Hyrise operators (Figure 2) can be obtained by running the benchmark with the --visualize parameter and calling the plot_performance_breakdown.py script.

  • To re-run the Hyrise TPC-H comparison shown in Figure 3, execute hyriseBenchmarkTPCH. The average time for each benchmark query is printed to the console.

  • Instructions for executing TPC-H on MonetDB can be found here. We used MonetDB 5 server 11.35.9, which was obtained using Ubuntu's package manager.

  • DuckDB was built from source (commit 70c20f28f) using a patch to enable SF 10.
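The per-query averages can also be derived from the JSON files written with -o, for example to relate a patched run to the baseline. This is a minimal illustrative sketch, not the repository's compare_benchmarks.py: load_durations, average_ms, and speedups are hypothetical helpers, and the JSON layout (a top-level "benchmarks" list whose entries carry "name" and "successful_runs", each run with a "duration" in nanoseconds) is an assumption that should be checked against the files your Hyrise build actually produces.

```python
from __future__ import annotations

import json
from pathlib import Path
from statistics import mean


def load_durations(path: Path) -> dict[str, list[float]]:
    """Map query name -> run durations in ns.

    ASSUMPTION: layout {"benchmarks": [{"name": ..., "successful_runs":
    [{"duration": ...}, ...]}, ...]}; verify against your result files.
    """
    with open(path) as f:
        data = json.load(f)
    return {b["name"]: [float(r["duration"]) for r in b["successful_runs"]]
            for b in data["benchmarks"]}


def average_ms(durations: dict[str, list[float]]) -> dict[str, float]:
    """Mean latency per query in milliseconds; queries without runs are skipped."""
    return {q: mean(runs) / 1e6 for q, runs in durations.items() if runs}


def speedups(baseline: dict[str, float], patched: dict[str, float]) -> dict[str, float]:
    """baseline / patched for queries present in both; > 1 means the patch is faster."""
    return {q: baseline[q] / patched[q]
            for q in baseline if q in patched and patched[q] > 0}
```

For example, speedups(average_ms(load_durations(Path("baseline.json"))), average_ms(load_durations(Path("patched.json")))) yields one ratio per TPC-H query. For the figures in the paper, the provided scripts remain the reference.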

Section 4. Plan-Level Choke Points

The diff files mentioned in the paper and the raw results can be found in the respective folders or are linked below.

4.1 Join Ordering

4.2 Predicate Placement and Ordering

4.3 Between Composition

4.4 Join-Dependent Predicate Duplication

4.5 Physical Locality

4.6 Correlated Columns

  • Baseline: same as 4.5 (3)
  • (1) Exploitation of correlation: unmodified Hyrise code

4.7 Flattening Subqueries

4.8 Semi Join Reduction

4.9 Subplan Reuse

Section 5. Logical Operator Choke Points

5.1 Dependent Group-By Keys

5.2 Large IN Clauses
