
Integrate mem_profile utility to bench.sh #16938

@ding-young


Is your feature request related to a problem or challenge?

PR #16814 adds a new benchmark utility, mem_profile, that collects memory statistics and prints a summary table.

We can run the binary directly:

    cargo run --profile release-nonlto --bin mem_profile -- --bench-profile release-nonlto tpch --path benchmarks/data/tpch_sf1 --partitions 4 --format parquet --query 1

However, bench.sh does not yet integrate it, so there is no easy way to run individual benchmarks through mem_profile, and there is no utility to compare results across different branches.

Describe the solution you'd like

Side Note

The way mem_profile collects metrics and prints them is quite different from the other existing benchmark utilities.
For memory profiling, mem_profile spawns a new subprocess for each query execution. As a result, it does not generate a single output.json file covering all bench queries as other benchmarks do; instead, it prints a summary table to stdout. To compare results across branches, we should either capture this stdout or modify mem_profile.rs to also write results to a JSON file or another structured format.
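The first option, capturing stdout, could be sketched as below. This is only an illustration under assumptions: the helper name capture_stdout and the results/<branch>/ output layout are hypothetical and are not part of bench.sh or mem_profile; the command is the one quoted earlier in this issue.

```python
import subprocess
from pathlib import Path

# Command quoted in this issue; mem_profile forwards everything after
# `--bench-profile release-nonlto` to dfbench.
MEM_PROFILE_CMD = [
    "cargo", "run", "--profile", "release-nonlto", "--bin", "mem_profile", "--",
    "--bench-profile", "release-nonlto", "tpch",
    "--path", "benchmarks/data/tpch_sf1", "--partitions", "4",
    "--format", "parquet", "--query", "1",
]


def capture_stdout(cmd, out_path):
    """Run `cmd`, save its stdout to `out_path`, and return the text.

    Hypothetical helper: it only stores the raw summary table so that two
    branches can be compared later; it does not parse the table.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(result.stdout)
    return result.stdout
```

Calling capture_stdout(MEM_PROFILE_CMD, "results/main/tpch_q1.txt") on each branch would leave one text file per branch to compare. Writing JSON from mem_profile.rs instead would avoid having to parse a human-oriented table afterwards.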

Steps

  1. Go through bench.sh and update the places where it still uses an outdated entrypoint.
    e.g. replace --bin tpch with --bin dfbench -- tpch
    (mem_profile passes the subcommand and its arguments through to dfbench, so using dfbench everywhere makes the integration easier)
  2. Add support for a memory profiling mode in bench.sh.
    Modify bench.sh so that setting MEM_PROFILE=true runs each benchmark through the mem_profile binary instead of dfbench directly.
  3. Extend compare.py and mem_profile.rs to allow side-by-side comparison of memory usage across branches or runs.
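
Steps 2 and 3 could look roughly like the following sketch. It is written in Python for brevity, although the real toggle would live in bench.sh. The function names, the {query: peak_mb} input shape, and the table layout are all assumptions for illustration; they are not the existing compare.py interface or the actual mem_profile output.

```python
import os


def build_command(bench_args, profile="release-nonlto", env=None):
    """Return the cargo argv for one benchmark run.

    With MEM_PROFILE=true the run goes through mem_profile, which forwards
    the subcommand and its arguments to dfbench; otherwise dfbench is
    invoked directly.
    """
    env = os.environ if env is None else env
    base = ["cargo", "run", "--profile", profile, "--bin"]
    if env.get("MEM_PROFILE", "false").lower() == "true":
        return base + ["mem_profile", "--", "--bench-profile", profile] + list(bench_args)
    return base + ["dfbench", "--"] + list(bench_args)


def compare_peaks(baseline, candidate):
    """Build (query, baseline_mb, candidate_mb, change_pct) rows.

    Both inputs are hypothetical {query: peak_mb} mappings, e.g. loaded
    from JSON files written on two branches. Only queries present in both
    mappings are compared.
    """
    rows = []
    for query in sorted(baseline.keys() & candidate.keys()):
        old, new = baseline[query], candidate[query]
        change = (new - old) / old * 100 if old else float("nan")
        rows.append((query, old, new, change))
    return rows


def format_table(rows):
    """Render the comparison rows as a fixed-width text table."""
    lines = [f"{'Query':<8}{'Base MB':>12}{'New MB':>12}{'Change':>10}"]
    for query, old, new, change in rows:
        lines.append(f"{query:<8}{old:>12.1f}{new:>12.1f}{change:>+9.1f}%")
    return "\n".join(lines)
```

For example, print(format_table(compare_peaks(main_peaks, branch_peaks))) would show one row per query that was measured on both branches, with the relative change in peak memory.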

Describe alternatives you've considered

No response

Additional context

No response

Labels: enhancement (New feature or request)
