Update tpch, clickbench, sort_tpch to mark failed queries #16182
Conversation
Pull Request Overview
The PR updates several benchmark modules to mark failed queries and continue executing subsequent queries when a failure occurs. Key changes include:
- Adding a new "success" field to track query failure status.
- Updating benchmark execution in tpch, sort_tpch, imdb, and clickbench to mark failures instead of aborting.
- Modifying compare.py to exclude failed queries from average time calculations and reporting the list of failed queries.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| benchmarks/src/util/run.rs | Added "success" field and mark_failed method to track query results. |
| benchmarks/src/util/mod.rs | Re-exported QueryResult to support new benchmark result structure. |
| benchmarks/src/tpch/run.rs | Updated query execution loop to handle failures and record failed query IDs. |
| benchmarks/src/sort_tpch.rs | Adjusted benchmark loop to mark failed queries and print failure details. |
| benchmarks/src/imdb/run.rs | Applied similar failure handling as in tpch and sort_tpch modules. |
| benchmarks/src/clickbench.rs | Refactored benchmark_query logic to include failure handling and iterations abstraction. |
| benchmarks/compare.py | Updated average time calculations and display to exclude failed queries. |
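The compare.py change can be illustrated with a small sketch (written in Rust here to match the rest of the codebase; the struct and function names are illustrative assumptions, not the PR's exact code): averages are computed only over queries whose `success` flag is set, so a failed query no longer skews or aborts the comparison.

```rust
// Hypothetical, simplified mirror of the benchmark's per-query result;
// the PR's real type lives in benchmarks/src/util/run.rs.
#[derive(Debug)]
struct QueryRun {
    query: String,
    elapsed_ms: f64,
    success: bool,
}

// Average elapsed time over successful queries only, mirroring the
// compare.py change: failed queries are excluded from the average.
fn avg_successful_ms(runs: &[QueryRun]) -> Option<f64> {
    let ok: Vec<f64> = runs
        .iter()
        .filter(|r| r.success)
        .map(|r| r.elapsed_ms)
        .collect();
    if ok.is_empty() {
        None
    } else {
        Some(ok.iter().sum::<f64>() / ok.len() as f64)
    }
}

fn main() {
    let runs = vec![
        QueryRun { query: "Q1".into(), elapsed_ms: 100.0, success: true },
        QueryRun { query: "Q2".into(), elapsed_ms: 0.0, success: false },
        QueryRun { query: "Q3".into(), elapsed_ms: 300.0, success: true },
    ];
    // Q2 failed, so the average covers only Q1 and Q3: (100 + 300) / 2.
    assert_eq!(avg_successful_ms(&runs), Some(200.0));
}
```

Returning `Option<f64>` keeps the "every query failed" case explicit instead of dividing by zero.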
Thank you, this is quite helpful for working towards robust memory-limited execution. We need this feature because many queries currently fail under a memory limit; if they can finish stably in the future, we can consider making this a CI test.
I tested tpch/clickbench/sort_tpch. The first two run as expected, but the sort_tpch benchmark seems to need a fix in the runner 🤔
Benchmark commands:
```shell
cargo run --profile release-nonlto --bin tpch -- benchmark datafusion --iterations 5 --path /Users/yongting/Code/datafusion/benchmarks/data/tpch_sf10 --prefer_hash_join true --format parquet -o /Users/yongting/Code/datafusion/benchmarks/results/pr-16182/tpch_sf1.json --memory-limit 1G

cargo run --profile release-nonlto --bin dfbench -- clickbench --iterations 5 --path /Users/yongting/Code/datafusion/benchmarks/data/hits.parquet --queries-path /Users/yongting/Code/datafusion/benchmarks/queries/clickbench/queries.sql -o /Users/yongting/Code/datafusion/benchmarks/results/pr-16182/clickbench_1.json --memory-limit 1G

cargo run --profile release-nonlto --bin dfbench -- sort-tpch --iterations 5 --path /Users/yongting/Code/datafusion/benchmarks/data/tpch_sf10 -o /Users/yongting/Code/datafusion/benchmarks/results/pr-16182/sort_tpch.json --memory-limit 2G
```
sort_tpch's error message:
```text
ongting@Mac ~/C/d/benchmarks (pr-16182 *)> cargo run --profile release-nonlto --bin dfbench -- sort-tpch --iterations 5 --path /Users/yongting/Code/datafusion/benchmarks/data/tpch_sf10 -o /Users/yongting/Code/datafusion/benchmarks/results/pr-16182/sort_tpch.json --memory-limit 2G
    Finished `release-nonlto` profile [optimized] target(s) in 0.12s
     Running `/Users/yongting/Code/datafusion/target/release-nonlto/dfbench sort-tpch --iterations 5 --path /Users/yongting/Code/datafusion/benchmarks/data/tpch_sf10 -o /Users/yongting/Code/datafusion/benchmarks/results/pr-16182/sort_tpch.json --memory-limit 2G`
Q1 iteration 0 took 1645.0 ms and returned 59986052 rows
Q1 iteration 1 took 1674.9 ms and returned 59986052 rows
Q1 iteration 2 took 1635.0 ms and returned 59986052 rows
Q1 iteration 3 took 1672.7 ms and returned 59986052 rows
Q1 iteration 4 took 1604.2 ms and returned 59986052 rows
Q1 avg time: 1646.32 ms
Q2 iteration 0 took 1282.9 ms and returned 59986052 rows
Q2 iteration 1 took 1286.9 ms and returned 59986052 rows
Q2 iteration 2 took 1313.8 ms and returned 59986052 rows
Q2 iteration 3 took 1300.1 ms and returned 59986052 rows
Q2 iteration 4 took 1290.9 ms and returned 59986052 rows
Q2 avg time: 1294.90 ms
Q3 iteration 0 took 6326.5 ms and returned 59986052 rows
Q3 iteration 1 took 6289.8 ms and returned 59986052 rows
Q3 iteration 2 took 6223.8 ms and returned 59986052 rows
Q3 iteration 3 took 6329.6 ms and returned 59986052 rows
Q3 iteration 4 took 6333.3 ms and returned 59986052 rows
Q3 avg time: 6300.60 ms
Q4 iteration 0 took 1574.2 ms and returned 59986052 rows
Q4 iteration 1 took 1578.4 ms and returned 59986052 rows
Q4 iteration 2 took 1586.4 ms and returned 59986052 rows
Q4 iteration 3 took 1608.0 ms and returned 59986052 rows
Q4 iteration 4 took 1578.5 ms and returned 59986052 rows
Q4 avg time: 1585.13 ms
thread 'main' panicked at /Users/yongting/Code/datafusion/benchmarks/src/sort_tpch.rs:312:32:
called `Result::unwrap()` on an `Err` value: ResourcesExhausted("Additional allocation failed with top memory consumers (across reservations) as:\n  ExternalSorterMerge[2]#586(can spill: false) consumed 182.5 MB,\n  ExternalSorterMerge[10]#602(can spill: false) consumed 179.3 MB,\n  ExternalSorterMerge[5]#592(can spill: false) consumed 176.1 MB,\n  ExternalSorterMerge[4]#590(can spill: false) consumed 175.3 MB,\n  ExternalSorterMerge[0]#582(can spill: false) consumed 173.0 MB.\nError: Failed to allocate additional 248.1 KB for ExternalSorterMerge[8] with 0.0 B already allocated for this reservation - 37.8 KB remain available for the total pool")
```
Other than that I left some minor suggestions, looking forward to your feedback!
```diff
 let mut stream = execute_stream(physical_plan.clone(), state.task_ctx())?;
 while let Some(batch) = stream.next().await {
-    row_count += batch.unwrap().num_rows();
+    row_count += batch?.num_rows();
```
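The diff above is the fix for the panic in the log: each item the stream yields is a `Result`, and `unwrap()` on an `Err` aborts the whole benchmark binary. A minimal sketch (types are illustrative stand-ins, not DataFusion's) of the difference:

```rust
// Illustrative stand-ins for the stream items; the real code yields
// Result<RecordBatch, DataFusionError> from execute_stream.
#[derive(Debug, PartialEq)]
struct Error(String);

struct Batch {
    rows: usize,
}

// With `?`, the first Err is returned to the caller instead of
// panicking, so the runner can mark the query failed and move on.
fn count_rows(stream: Vec<Result<Batch, Error>>) -> Result<usize, Error> {
    let mut row_count = 0;
    for batch in stream {
        row_count += batch?.rows; // propagate the error instead of panicking
    }
    Ok(row_count)
}

fn main() {
    assert_eq!(count_rows(vec![Ok(Batch { rows: 2 }), Ok(Batch { rows: 3 })]), Ok(5));
    assert!(count_rows(vec![Err(Error("resources exhausted".into()))]).is_err());
}
```

The caller can then record the failure (e.g. via the PR's `mark_failed`) rather than having the process die mid-benchmark.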
I replaced unwrap() with ? so that sort_tpch does not terminate early. Thanks for catching that!

Thank you for the review. I've applied your suggestions, and I'll run it a few more times and check the output display once more before requesting a final review. If there are any other benchmarks I didn't touch that would benefit from similar changes, please let me know.
```rust
pub fn maybe_print_failures(&self) {
    let failed_queries: Vec<&str> = self
        .queries
        .iter()
        .filter_map(|q| (!q.success).then_some(q.query.as_str()))
        .collect();

    if !failed_queries.is_empty() {
        println!("Failed Queries: {}", failed_queries.join(", "));
    }
}
```
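The `filter_map` line above combines two steps: `bool::then_some` turns the "did it fail?" condition into an `Option`, and `filter_map` keeps only the `Some` values. A self-contained sketch of the same pattern (the free function and tuple shape are illustrative, not the PR's code):

```rust
// Keep only the labels of failed entries, using the same
// filter_map + bool::then_some idiom as maybe_print_failures.
fn failed_labels(results: &[(String, bool)]) -> Vec<&str> {
    results
        .iter()
        .filter_map(|(name, success)| (!*success).then_some(name.as_str()))
        .collect()
}

fn main() {
    let results = vec![
        ("Q1".to_string(), true),
        ("Q2".to_string(), false),
        ("Q3".to_string(), false),
    ];
    assert_eq!(failed_labels(&results).join(", "), "Q2, Q3");
}
```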
Since we call q.query.as_str() here, the expected output varies between benchmarks.
For example:
```text
// In sort_tpch
Failed Queries: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
// In clickbench
Failed Queries: Query 4, Query 5, Query 7
```
2010YOUY01 left a comment:
Great! Thanks again.
benchmarks/src/util/run.rs (outdated)
```rust
if let Some(idx) = self.current_case {
    self.queries[idx].success = false;
} else {
    panic!("Cannot mark failure: no current case");
}
```
Suggested change:
```diff
-    panic!("Cannot mark failure: no current case");
+    unreachable!("Cannot mark failure: no current case");
```
panic! is usually used for errors that are possible but unrecoverable; unreachable! is better for this case (it is just an assertion about something logically impossible).
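The invariant behind the suggestion can be sketched as follows (a simplified stand-in; the field names are assumptions, not the PR's exact struct): `current_case` is always `Some` while a query is running, so the `None` arm is reachable only through an internal logic bug, which is exactly what `unreachable!` documents.

```rust
// Simplified stand-in for the runner state in benchmarks/src/util/run.rs.
struct BenchmarkRun {
    success_flags: Vec<bool>,
    current_case: Option<usize>,
}

impl BenchmarkRun {
    // current_case is set before any query executes, so the else arm
    // encodes an invariant violation rather than a recoverable error.
    fn mark_failed(&mut self) {
        if let Some(idx) = self.current_case {
            self.success_flags[idx] = false;
        } else {
            unreachable!("Cannot mark failure: no current case");
        }
    }
}

fn main() {
    let mut run = BenchmarkRun { success_flags: vec![true, true], current_case: Some(1) };
    run.mark_failed();
    assert_eq!(run.success_flags, vec![true, false]);
}
```

Both macros abort the thread at runtime; the difference is intent, and `unreachable!` reads as "this branch cannot happen by construction" to future maintainers.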
Updated :) Thank you
* Move struct QueryResult to util/run.rs
* Modify benches to continue query execution even on failure
* Mark benchmark query success on output json
Which issue does this PR close?
Rationale for this change
To track whether certain queries fail under limited memory, we need to continue executing benchmark queries even when a previous one fails. It would also be better to mark and visualize the failures with compare.py.
What changes are included in this PR?
- tpch, clickbench, and sort_tpch execute the next query when the current query fails on any iteration. (The remaining iterations of a failed query are not executed.)
- A success field is added to the output JSON, and compare.py is updated to compare the elapsed times of successful results only, not failed ones.
Are these changes tested?
Since it's a benchmark suite, I tested it manually.
Are there any user-facing changes?