Describe the bug
On my machine
OS: Fedora Linux 43 (KDE Plasma Desktop Edition) x86_64
Kernel: Linux 6.19.11-200.fc43.x86_64
CPU: AMD Ryzen AI 9 HX 370 (24) @ 5.16 GHz
GPU: AMD Radeon 890M Graphics [Integrated]
Memory: 11.96 GiB / 86.02 GiB (14%)
Swap: 0 B / 8.00 GiB (0%)
with an up-to-date Rust toolchain:
$ rustup show
Default host: x86_64-unknown-linux-gnu
rustup home: /home/bruce/.rustup
installed toolchains
--------------------
stable-x86_64-unknown-linux-gnu (default)
nightly-x86_64-unknown-linux-gnu
1.91.0-x86_64-unknown-linux-gnu
1.92.0-x86_64-unknown-linux-gnu
1.93.0-x86_64-unknown-linux-gnu
1.94.0-x86_64-unknown-linux-gnu (active)
active toolchain
----------------
name: 1.94.0-x86_64-unknown-linux-gnu
active because: overridden by '/home/bruce/dev/datafusion2/rust-toolchain.toml'
installed targets:
x86_64-unknown-linux-gnu
Running the following against main hangs:
cd benchmarks; ./bench.sh data tpch; ./bench.sh run tpch 18
$ ./bench.sh run tpch 18
***************************
DataFusion Benchmark Script
COMMAND: run
BENCHMARK: tpch
QUERY: 18
DATAFUSION_DIR: /home/bruce/dev/datafusion2/benchmarks/..
BRANCH_NAME: HEAD
DATA_DIR: /home/bruce/dev/datafusion2/benchmarks/data
RESULTS_DIR: /home/bruce/dev/datafusion2/benchmarks/results/HEAD
CARGO_COMMAND: cargo run --release
PREFER_HASH_JOIN: true
SIMULATE_LATENCY: false
***************************
RESULTS_FILE: /home/bruce/dev/datafusion2/benchmarks/results/HEAD/tpch_sf1.json
Running tpch benchmark...
+ cargo run --release --bin dfbench -- tpch --iterations 5 --path /home/bruce/dev/datafusion2/benchmarks/data/tpch_sf1 --prefer_hash_join true --format parquet -o /home/bruce/dev/datafusion2/benchmarks/results/HEAD/tpch_sf1.json --query 18
Finished `release` profile [optimized] target(s) in 0.11s
Running `/home/bruce/dev/datafusion2/target/release/dfbench tpch --iterations 5 --path /home/bruce/dev/datafusion2/benchmarks/data/tpch_sf1 --prefer_hash_join true --format parquet -o /home/bruce/dev/datafusion2/benchmarks/results/HEAD/tpch_sf1.json --query 18`
Running benchmarks with the following options: RunOpt { query: Some(18), common: CommonOpt { iterations: 5, partitions: None, batch_size: None, mem_pool_type: "fair", memory_limit: None, sort_spill_reservation_bytes: None, debug: false, simulate_latency: false }, path: "/home/bruce/dev/datafusion2/benchmarks/data/tpch_sf1", file_format: "parquet", mem_table: false, output_path: Some("/home/bruce/dev/datafusion2/benchmarks/results/HEAD/tpch_sf1.json"), disable_statistics: false, prefer_hash_join: true, enable_piecewise_merge_join: false, sorted: false, hash_join_buffering_capacity: 0 }
The run prints nothing further after this point. git bisect points to this commit as the cause: the run succeeds at the commit immediately before it and fails at that commit.
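Because the failure is a hang rather than a crash, a bisect like this needs a time limit to tell good commits from bad ones. The following is a minimal sketch of how that could be scripted, not the procedure actually used for this report. The function names `check_hang` and `bisect_step` are hypothetical. It relies on coreutils `timeout` exiting with status 124 when the limit is reached, and on the exit codes that `git bisect run` interprets (0 for good, 1 for bad, 125 for skip).

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command under a time limit and classify the result.
# Prints "ok" if it finished successfully, "hang" if the limit was hit
# (coreutils `timeout` exits with 124 in that case), and "error" otherwise.
check_hang() {
    local limit="$1"
    shift
    local status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -eq 124 ]; then
        echo "hang"
    else
        echo "error"
    fi
}

# Hypothetical helper: translate the classification into the exit codes
# that `git bisect run` understands: 0 = good, 1 = bad, 125 = skip commit.
bisect_step() {
    case "$(check_hang "$@")" in
        ok)   return 0 ;;
        hang) return 1 ;;
        *)    return 125 ;;
    esac
}
```

A script passed to `git bisect run` could source these functions and end with something like `cd benchmarks && bisect_step 120 ./bench.sh run tpch 18`, where the 120-second limit is an arbitrary assumption that should comfortably exceed a normal build plus run.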
With prefer_hash_join disabled, the query runs as expected:
$ PREFER_HASH_JOIN=false ./bench.sh run tpch 18
***************************
DataFusion Benchmark Script
COMMAND: run
BENCHMARK: tpch
QUERY: 18
DATAFUSION_DIR: /home/bruce/dev/datafusion2/benchmarks/..
BRANCH_NAME: HEAD
DATA_DIR: /home/bruce/dev/datafusion2/benchmarks/data
RESULTS_DIR: /home/bruce/dev/datafusion2/benchmarks/results/HEAD
CARGO_COMMAND: cargo run --release
PREFER_HASH_JOIN: false
SIMULATE_LATENCY: false
***************************
RESULTS_FILE: /home/bruce/dev/datafusion2/benchmarks/results/HEAD/tpch_sf1.json
Running tpch benchmark...
+ cargo run --release --bin dfbench -- tpch --iterations 5 --path /home/bruce/dev/datafusion2/benchmarks/data/tpch_sf1 --prefer_hash_join false --format parquet -o /home/bruce/dev/datafusion2/benchmarks/results/HEAD/tpch_sf1.json --query 18
Finished `release` profile [optimized] target(s) in 0.15s
Running `/home/bruce/dev/datafusion2/target/release/dfbench tpch --iterations 5 --path /home/bruce/dev/datafusion2/benchmarks/data/tpch_sf1 --prefer_hash_join false --format parquet -o /home/bruce/dev/datafusion2/benchmarks/results/HEAD/tpch_sf1.json --query 18`
Running benchmarks with the following options: RunOpt { query: Some(18), common: CommonOpt { iterations: 5, partitions: None, batch_size: None, mem_pool_type: "fair", memory_limit: None, sort_spill_reservation_bytes: None, debug: false, simulate_latency: false }, path: "/home/bruce/dev/datafusion2/benchmarks/data/tpch_sf1", file_format: "parquet", mem_table: false, output_path: Some("/home/bruce/dev/datafusion2/benchmarks/results/HEAD/tpch_sf1.json"), disable_statistics: false, prefer_hash_join: false, enable_piecewise_merge_join: false, sorted: false, hash_join_buffering_capacity: 0 }
Query 18 iteration 0 took 206.5 ms and returned 57 rows
Query 18 iteration 1 took 190.0 ms and returned 57 rows
Query 18 iteration 2 took 188.5 ms and returned 57 rows
Query 18 iteration 3 took 185.0 ms and returned 57 rows
Query 18 iteration 4 took 192.1 ms and returned 57 rows
Query 18 avg time: 192.42 ms
+ set +x
Done
To Reproduce
This seems to be machine- or OS-specific. I've been unable to reproduce it on other machines.
Expected behavior
No response
Additional context
No response