stwaks
alamb committed Jul 23, 2023
1 parent 7a0581c commit 063d0c9
Showing 2 changed files with 12 additions and 24 deletions.
33 changes: 11 additions & 22 deletions benchmarks/README.md
@@ -24,11 +24,10 @@ open source benchmark suites, to help with performance and scalability
testing of DataFusion.


# Benchmarks Against Other Engines

This crate is used to benchmark changes to DataFusion itself, rather
than benchmarking against another engine.
## Other engines

The benchmarks measure changes to DataFusion itself, rather than
its performance against other engines. For competitive benchmarking,
DataFusion is included in the benchmark setups for several popular
benchmarks that compare performance with other engines. For example:

@@ -40,7 +39,7 @@ benchmarks that compare performance with other engines. For example:

# Running the benchmarks

## Running Benchmarks
## `bench.sh`

The easiest way to run benchmarks is the [bench.sh](bench.sh)
script. Usage instructions can be found with:
@@ -50,7 +49,7 @@ script. Usage instructions can be found with:
./bench.sh
```

## Generating Data
## Generating data

You can create / download the data for these benchmarks using the [bench.sh](bench.sh) script:

@@ -68,7 +67,7 @@ Create / download a specific dataset (TPCH)

Data is placed in the `data` subdirectory.

## Comparing peformance on main to a branch
## Comparing performance of main and a branch

```shell
git checkout main
@@ -154,17 +153,13 @@ Benchmark tpch_mem.json

### Running Benchmarks Manually

Assuming the data created in the `data` directory, the `tpch` benchmark can be run like
Assuming data in the `data` directory, the `tpch` benchmark can be run with a command like this

```bash
cargo run --release --bin dfbench -- tpch --iterations 3 --path ./data --format tbl --query 1 --batch-size 4096
```

If you omit `--query=<query_id>` argument, then all 22 queries will be run

```bash
cargo run --release --bin dfbench -- tpch --iterations 1 --path ./data --format tbl --batch-size 4096
```
See the help for more details

### Different features

@@ -258,14 +253,11 @@ Query 1 avg time: 1956.11 ms
The `dfbench` program contains subcommands to run various benchmarks.

Full help can be found in the relevant sub command. For example to get help for tpch,
run `cargo run --bin dfbench tpch --help`
run `cargo run --release --bin dfbench tpch --help`

```shell
cargo run --bin dfbench --help

cargo run --bin dfbench -- --help
Finished dev [unoptimized + debuginfo] target(s) in 0.29s
Running `/Users/alamb/Software/target-df2/debug/dfbench --help`
cargo run --release --bin dfbench --help
...
datafusion-benchmarks 27.0.0
benchmark command

@@ -280,9 +272,6 @@ SUBCOMMANDS:

```




## NYC Taxi Benchmark

These benchmarks are based on the [New York Taxi and Limousine Commission][2] data set.
3 changes: 1 addition & 2 deletions benchmarks/bench.sh
@@ -35,8 +35,7 @@ BENCHMARK=all
DATAFUSION_DIR=${DATAFUSION_DIR:-$SCRIPT_DIR/..}
DATA_DIR=${DATA_DIR:-$SCRIPT_DIR/data}
#CARGO_COMMAND=${CARGO_COMMAND:-"cargo run --release"}
#CARGO_COMMAND=${CARGO_COMMAND:-"cargo run --profile release-nonlto"} # TEMP: for faster iterations
CARGO_COMMAND=${CARGO_COMMAND:-"cargo run "} # TEMP: for faster iterations
CARGO_COMMAND=${CARGO_COMMAND:-"cargo run --profile release-nonlto"} # for faster iterations

usage() {
echo "
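
The `CARGO_COMMAND` line in the `bench.sh` hunk relies on shell default expansion, so a value already set in the environment should take precedence over the `release-nonlto` default. The sketch below illustrates that behavior in isolation; `resolve_cargo_command` is a hypothetical helper written for this example and is not part of `bench.sh`.

```shell
#!/usr/bin/env bash
# Illustrative sketch, not part of bench.sh: resolve_cargo_command is a
# hypothetical helper that mirrors the CARGO_COMMAND default in the diff.
resolve_cargo_command() {
    # ${VAR:-default} expands to the default when VAR is unset or empty
    echo "${CARGO_COMMAND:-cargo run --profile release-nonlto}"
}

# No override: falls back to the non-LTO release profile
(unset CARGO_COMMAND; resolve_cargo_command)

# Override from the environment, e.g. to use the full release build
(CARGO_COMMAND="cargo run --release"; resolve_cargo_command)
```

Under this pattern, setting `CARGO_COMMAND` before invoking the script would presumably select a different build profile without editing the script itself.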
