Description
Is your feature request related to a problem or challenge?
PR#16814 adds a new benchmark utility that retrieves memory statistics and prints a summary table.
The binary can be run directly:
cargo run --profile release-nonlto --bin mem_profile -- --bench-profile release-nonlto tpch --path benchmarks/data/tpch_sf1 --partitions 4 --format parquet --query 1
However, bench.sh has no integration yet for running individual benchmarks through mem_profile, and there is no utility for comparing results across branches.
Describe the solution you'd like
Side Note
The way mem_profile collects and prints its metrics is quite different from the other existing benchmark utilities.
For memory profiling, mem_profile spawns a new subprocess for each query execution. As a result, it does not generate a single output.json file for all bench queries like the other benchmarks do; instead, it prints a summary table to stdout. To compare results across branches, we should either capture this stdout or modify mem_profile.rs to also write the results to a JSON file or another structured format.
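As a rough illustration of the "capture this stdout" option, the sketch below runs a command, captures its stdout, and stores the raw text in a JSON file tagged with a branch label. The function name `capture_run`, the record fields, and the output path are hypothetical and not part of bench.sh or mem_profile. The summary table is stored verbatim rather than parsed, because its exact layout is not described here.

```python
import json
import subprocess
import sys
from pathlib import Path


def capture_run(cmd, label, out_path):
    """Run `cmd`, capture its stdout, and save it with a label as JSON.

    `label` is a free-form tag such as a branch name (an assumption of
    this sketch). The stdout is kept as raw text, not parsed.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    record = {"label": label, "command": list(cmd), "stdout": proc.stdout}
    Path(out_path).write_text(json.dumps(record, indent=2))
    return record


# Stand-in command so the sketch runs without a DataFusion checkout;
# in practice `cmd` would be the cargo invocation for mem_profile.
DEMO_CMD = [sys.executable, "-c", "print('summary table placeholder')"]
```

Calling `capture_run(DEMO_CMD, "main", "mem_profile_main.json")` once per branch would leave one file per branch that a later comparison step could read.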
Steps
- Navigate bench.sh and update the places where it uses an outdated entrypoint.
  e.g. replace --bin tpch with dfbench -- tpch
  (mem_profile passes the subcommand and args to dfbench, so this makes it easier to integrate)
- Add support for a memory profiling mode in bench.sh
  Modify bench.sh so that setting MEM_PROFILE=true runs each benchmark through the mem_profile binary instead of dfbench directly.
- Extend compare.py and mem_profile.rs to allow side-by-side comparison of memory usage across branches or runs
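The steps above could be sketched roughly as follows. This is only an illustration in Python, not the actual bench.sh or compare.py code. `build_command` mirrors the proposed MEM_PROFILE=true switch, using the two command shapes quoted in this issue. `compare_peaks` assumes a hypothetical result format, a mapping from query name to peak memory in bytes, which mem_profile does not currently emit.

```python
import os


def build_command(bench_args, profile="release-nonlto", env=None):
    """Pick the binary to run based on MEM_PROFILE (hypothetical helper)."""
    env = os.environ if env is None else env
    if env.get("MEM_PROFILE", "").lower() == "true":
        # mem_profile forwards the subcommand and its args to dfbench.
        return ["cargo", "run", "--profile", profile, "--bin", "mem_profile",
                "--", "--bench-profile", profile, *bench_args]
    return ["cargo", "run", "--profile", profile, "--bin", "dfbench",
            "--", *bench_args]


def compare_peaks(baseline, candidate):
    """Return (query, base, cand, percent change) for queries in both runs.

    Both arguments are assumed to map query name -> peak memory in bytes.
    """
    rows = []
    for query in sorted(baseline.keys() & candidate.keys()):
        base, cand = baseline[query], candidate[query]
        change = (cand - base) / base * 100.0 if base else float("nan")
        rows.append((query, base, cand, change))
    return rows
```

Queries present in only one run are skipped here; a real comparison might want to report them instead.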
Describe alternatives you've considered
No response
Additional context
No response