ClickBench: DataFusion / DuckDB comparison scripts

This benchmark compares DataFusion and DuckDB performance on the ClickBench queries, run against the unmodified ClickBench parquet files.

Results

Result Chart

Versions

  • DataFusion 27.0.0
  • DataFusion 28.0.0
  • DuckDB 0.8.1

Scenarios

  • Single parquet file (hits.parquet)

Download Data:

bash setup.sh

Install DataFusion-CLI

Install from crates.io:

cargo install datafusion-cli --version 28.0.0

Or build from source:

git clone https://github.com/apache/arrow-datafusion.git
cd arrow-datafusion
cargo install --path datafusion-cli

Install DuckDB

python3 -m venv `pwd`/venv
source venv/bin/activate
pip install duckdb psutil

Run queries

Queries are run with run-datafusion.sh or run-duckdb.sh.

DuckDB:

CREATE=create-single-duckdb.sql bash run-duckdb.sh

DataFusion:

DATAFUSION_CLI=./datafusion-cli.413eba1 CREATE=create-single-datafusion.sql bash run-datafusion.sh

More examples are in benchmark.sh.
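
For a quick sanity check outside the shell scripts, a single query can also be timed by hand. The following is a minimal sketch, not the logic of run-duckdb.sh: the helper names time_query and time_duckdb_query are hypothetical, and it assumes the duckdb package from the install step and a hits.parquet file in the current directory.

```python
import time
from typing import Callable, List


def time_query(run: Callable[[], object], tries: int = 3) -> List[float]:
    """Call `run` `tries` times and return the wall-clock seconds of each call."""
    timings = []
    for _ in range(tries):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return timings


def time_duckdb_query(sql: str, tries: int = 3) -> List[float]:
    """Time `sql` on an in-memory DuckDB connection (hypothetical helper)."""
    import duckdb  # imported lazily so time_query works without duckdb installed

    con = duckdb.connect()
    # fetchall() forces the full result to be materialized inside the timed region
    return time_query(lambda: con.execute(sql).fetchall(), tries)
```

Calling time_duckdb_query("SELECT COUNT(*) FROM 'hits.parquet'") returns one timing per run. The first run usually includes cold-cache effects, so it is worth looking at the individual timings rather than only an average.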

Results

Results are written into result.csv
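
The column layout of result.csv is not described here, so the following is only a sketch of how per-engine totals could be computed from such a file. It assumes a header row, and the column names "engine" and "seconds" are hypothetical placeholders that should be replaced with whatever the file actually contains.

```python
import csv
from collections import defaultdict
from typing import Dict, TextIO


def total_seconds_by_engine(
    f: TextIO,
    engine_col: str = "engine",   # hypothetical column name
    time_col: str = "seconds",    # hypothetical column name
) -> Dict[str, float]:
    """Sum the timing column per engine from a CSV file that has a header row."""
    totals: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(f):
        totals[row[engine_col]] += float(row[time_col])
    return dict(totals)
```

Usage would look like total_seconds_by_engine(open("result.csv", newline="")), passing engine_col and time_col to match the real header.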

Python Example

The example Python script is hash.py:

python3 hash.py