diff --git a/benchmarks/README.md b/benchmarks/README.md
index b5c767cc5cc6..b18231197716 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -24,11 +24,10 @@ open source benchmark suites, to help with performance and scalability
 testing of DataFusion.

-# Benchmarks Against Other Engines
-
-This crate is used to benchmark changes to DataFusion itself, rather
-than benchmarking against another engine.
+## Other engines
+These benchmarks measure the performance impact of changes to DataFusion
+itself, rather than comparing it against other engines.

 For competitive benchmarking, DataFusion is included in the benchmark setups for several popular
 benchmarks that compare performance with other engines. For example:
@@ -40,7 +39,7 @@ benchmarks that compare performance with other engines. For example:

 # Running the benchmarks

-## Running Benchmarks
+## `bench.sh`

 The easiest way to run benchmarks is the [bench.sh](bench.sh)
 script. Usage instructions can be found with:
@@ -50,7 +49,7 @@ script. Usage instructions can be found with:
 ```
 ./bench.sh
 ```

-## Generating Data
+## Generating data

 You can create / download the data for these benchmarks using the [bench.sh](bench.sh) script:

@@ -68,7 +67,7 @@ Create / download a specific dataset (TPCH)

 Data is placed in the `data` subdirectory.
-## Comparing peformance on main to a branch
+## Comparing performance of main and a branch

 ```shell
 git checkout main
@@ -154,17 +153,13 @@ Benchmark tpch_mem.json

 ### Running Benchmarks Manually

-Assuming the data created in the `data` directory, the `tpch` benchmark can be run like
+Assuming data in the `data` directory, the `tpch` benchmark can be run with a command like this:

 ```bash
 cargo run --release --bin dfbench -- tpch --iterations 3 --path ./data --format tbl --query 1 --batch-size 4096
 ```

-If you omit `--query=` argument, then all 22 queries will be run
-
-```bash
-cargo run --release --bin dfbench -- tpch --iterations 1 --path ./data --format tbl --batch-size 4096
-```
+See `cargo run --release --bin dfbench -- tpch --help` for the full list of options.

 ### Different features

@@ -258,14 +253,11 @@ Query 1 avg time: 1956.11 ms

 The `dfbench` program contains subcommands to run various benchmarks.
 Full help can be found in the relevant sub command. For example to get help for tpch,
-run `cargo run --bin dfbench tpch --help`
+run `cargo run --release --bin dfbench -- tpch --help`

 ```shell
-cargo run --bin dfbench --help
-
-cargo run --bin dfbench -- --help
-    Finished dev [unoptimized + debuginfo] target(s) in 0.29s
-     Running `/Users/alamb/Software/target-df2/debug/dfbench --help`
+cargo run --release --bin dfbench -- --help
+...
 datafusion-benchmarks 27.0.0
 benchmark command

@@ -280,9 +272,6 @@ SUBCOMMANDS:

 ```

-
-
-
 ## NYC Taxi Benchmark

 These benchmarks are based on the [New York Taxi and Limousine Commission][2] data set.
diff --git a/benchmarks/bench.sh b/benchmarks/bench.sh
index 392935d937e7..ca58e49f6043 100755
--- a/benchmarks/bench.sh
+++ b/benchmarks/bench.sh
@@ -35,8 +35,7 @@ BENCHMARK=all
 DATAFUSION_DIR=${DATAFUSION_DIR:-$SCRIPT_DIR/..}
 DATA_DIR=${DATA_DIR:-$SCRIPT_DIR/data}
 #CARGO_COMMAND=${CARGO_COMMAND:-"cargo run --release"}
-#CARGO_COMMAND=${CARGO_COMMAND:-"cargo run --profile release-nonlto"} # TEMP: for faster iterations
-CARGO_COMMAND=${CARGO_COMMAND:-"cargo run "} # TEMP: for faster iterations
+CARGO_COMMAND=${CARGO_COMMAND:-"cargo run --profile release-nonlto"} # for faster iterations

 usage() {
 echo "
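With this change, `bench.sh` falls back to the `release-nonlto` profile only when `CARGO_COMMAND` is unset or empty, because the assignment uses the shell's `${VAR:-default}` expansion. The sketch below isolates that pattern in a hypothetical `default_cargo_command` helper (it is not part of `bench.sh`) to show how a caller-supplied value takes precedence:

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of bench.sh) reproducing the defaulting
# logic of the changed line: ${VAR:-default} yields the default when
# CARGO_COMMAND is unset or set to the empty string.
default_cargo_command() {
  echo "${CARGO_COMMAND:-cargo run --profile release-nonlto}"
}

# Unset: falls back to the non-LTO release profile.
unset CARGO_COMMAND
default_cargo_command    # prints: cargo run --profile release-nonlto

# The ":-" form treats an empty string like an unset variable.
CARGO_COMMAND="" default_cargo_command    # prints: cargo run --profile release-nonlto

# A non-empty value from the caller wins, e.g. a full LTO release build.
CARGO_COMMAND="cargo run --release" default_cargo_command    # prints: cargo run --release
```

The same override should apply to the script itself for a one-off full LTO run, for example `CARGO_COMMAND="cargo run --release" ./bench.sh run tpch`, assuming the `run` subcommand listed in the usage text that `./bench.sh` prints.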