stwaks

apache · Jul 23, 2023 · 063d0c9
1 parent 7a0581c
commit 063d0c9

Showing 2 changed files with 12 additions and 24 deletions.
diff --git a/benchmarks/README.md b/benchmarks/README.md
@@ -24,11 +24,10 @@ open source benchmark suites, to help with performance and scalability
 testing of DataFusion.
 
 
-# Benchmarks Against Other Engines
-
-This crate is used to benchmark changes to DataFusion itself, rather
-than benchmarking against another engine.
+## Other engines
 
+The benchmarks measure changes to DataFusion itself, rather than
+its performance against other engines. For competitive benchmarking,
 DataFusion is included in the benchmark setups for several popular
 benchmarks that compare performance with other engines. For example:
 
@@ -40,7 +39,7 @@ benchmarks that compare performance with other engines. For example:
 
 # Running the benchmarks
 
-## Running Benchmarks
+## `bench.sh`
 
 The easiest way to run benchmarks is the [bench.sh](bench.sh)
 script. Usage instructions can be found with:
@@ -50,7 +49,7 @@ script. Usage instructions can be found with:
 ./bench.sh
 ```
 
-## Generating Data
+## Generating data
 
 You can create / download the data for these benchmarks using the [bench.sh](bench.sh) script:
 
@@ -68,7 +67,7 @@ Create / download a specific dataset (TPCH)
 
 Data is placed in the `data` subdirectory.
 
-## Comparing peformance on main to a branch
+## Comparing performance of main and a branch
 
 ```shell
 git checkout main
@@ -154,17 +153,13 @@ Benchmark tpch_mem.json
 
 ### Running Benchmarks Manually
 
-Assuming the data created in the `data` directory, the `tpch` benchmark can be run like
+Assuming data in the `data` directory, the `tpch` benchmark can be run with a command like this
 
 ```bash
 cargo run --release --bin dfbench -- tpch --iterations 3 --path ./data --format tbl --query 1 --batch-size 4096
 ```
 
-If you omit `--query=<query_id>` argument, then all 22 queries will be run
-
-```bash
-cargo run --release --bin dfbench -- tpch --iterations 1 --path ./data --format tbl --batch-size 4096
-```
+See the help for more details
 
 ### Different features
 
@@ -258,14 +253,11 @@ Query 1 avg time: 1956.11 ms
 The `dfbench` program contains subcommands to run various benchmarks.
 
 Full help can be found in the relevant sub command. For example to get help for tpch,
-run `cargo run  --bin dfbench tpch --help`
+run `cargo run --release  --bin dfbench tpch --help`
 
 ```shell
-cargo run  --bin dfbench  --help
-
-cargo run  --bin dfbench  -- --help
-    Finished dev [unoptimized + debuginfo] target(s) in 0.29s
-     Running `/Users/alamb/Software/target-df2/debug/dfbench --help`
+cargo run --release --bin dfbench  --help
+...
 datafusion-benchmarks 27.0.0
 benchmark command
 
@@ -280,9 +272,6 @@ SUBCOMMANDS:
 
 ```
 
-
-
-
 ## NYC Taxi Benchmark
 
 These benchmarks are based on the [New York Taxi and Limousine Commission][2] data set.
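
Note on the `bench.sh` change in this commit: the `CARGO_COMMAND` line uses the shell's `${VAR:-default}` expansion, so the new `release-nonlto` default should apply only when the caller has not already set `CARGO_COMMAND`. The sketch below illustrates that behaviour in isolation. `resolve_cargo_command`, `DEFAULT_CMD`, and `OVERRIDE_CMD` are hypothetical names used for illustration and are not part of `bench.sh`; the default string is copied from the diff.

```shell
# Minimal sketch of the ${VAR:-default} expansion used for CARGO_COMMAND.
# resolve_cargo_command is a hypothetical helper, not part of bench.sh.
resolve_cargo_command() {
    # Use the caller's CARGO_COMMAND if it is set and non-empty,
    # otherwise fall back to the default from this commit.
    echo "${CARGO_COMMAND:-cargo run --profile release-nonlto}"
}

# Case 1: nothing set in the environment, so the default is used.
unset CARGO_COMMAND
DEFAULT_CMD=$(resolve_cargo_command)

# Case 2: the caller overrides the command for a single invocation.
OVERRIDE_CMD=$(CARGO_COMMAND="cargo run --release" resolve_cargo_command)

echo "default:  $DEFAULT_CMD"
echo "override: $OVERRIDE_CMD"
```

Assuming `bench.sh` reads `CARGO_COMMAND` only through this expansion, exporting the variable before invoking the script should be enough to select a different build profile without editing the file.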

diff --git a/benchmarks/bench.sh b/benchmarks/bench.sh
@@ -35,8 +35,7 @@ BENCHMARK=all
 DATAFUSION_DIR=${DATAFUSION_DIR:-$SCRIPT_DIR/..}
 DATA_DIR=${DATA_DIR:-$SCRIPT_DIR/data}
 #CARGO_COMMAND=${CARGO_COMMAND:-"cargo run --release"}
-#CARGO_COMMAND=${CARGO_COMMAND:-"cargo run --profile release-nonlto"}  # TEMP: for faster iterations
-CARGO_COMMAND=${CARGO_COMMAND:-"cargo run "}  # TEMP: for faster iterations
+CARGO_COMMAND=${CARGO_COMMAND:-"cargo run --profile release-nonlto"}  # for faster iterations
 
 usage() {
     echo "