Operator-level microbenchmarking #3154

Open
wants to merge 1 commit into base: main

Conversation

@SSYernar (Contributor) commented on Jul 2, 2025

Summary:
This change introduces microbenchmarking for individual PyTorch operators. Since we need to capture and measure each operator call (which happens under the hood in PyTorch), we use `torch.profiler.profile`; example operators are `aten::mm`, `aten::sigmoid`, `cudaLaunchKernel`, etc. (a minimal sketch follows the flag list below).
Use `--benchmark_operators` to enable operator-level benchmarking.
Use the `--limit_operator_results` argument to specify how many of the highest-runtime operators to report.
Use the `--target_operators` argument to list the PyTorch operators to benchmark.
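
For context, here is a minimal sketch of collecting and ranking per-operator runtimes with `torch.profiler.profile`. It is not this PR's implementation: the workload (`model`, `batch`) and the limit value are placeholders standing in for the pipeline being benchmarked.
```
import torch
from torch.profiler import profile, ProfilerActivity

# Placeholder workload; the PR profiles a real train pipeline instead.
model = torch.nn.Linear(512, 512)
batch = torch.randn(64, 512)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities) as prof:
    model(batch).sigmoid().sum().backward()

# key_averages() aggregates profiler events by operator name
# (e.g. aten::addmm, aten::sigmoid). Sorting by total CPU time and
# truncating mirrors the idea behind --limit_operator_results.
limit = 5
top_ops = sorted(prof.key_averages(), key=lambda e: e.cpu_time_total, reverse=True)
for evt in top_ops[:limit]:
    # cpu_time_total is reported in microseconds.
    print(f"{evt.key:<30} {evt.cpu_time_total / 1000.0:8.2f} ms")
```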

Example output:
```
TrainPipelineSparseDist             | Malloc retries (P50/P90/P100): 0.0 / 0.0 / 0.0 | Runtime (P90): 442.08 ms | Peak Memory alloc (P90): 24.23 GB | Peak Memory reserved (P90): 26.21 GB
operator_aten::copy_                | Malloc retries (P50/P90/P100): -1.0 / -1.0 / -1.0 | Runtime (P90): 39.21 ms | Peak Memory alloc (P90): 0.00 GB | Peak Memory reserved (P90): -0.00 GB
...
```
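
The P50/P90/P100 columns above are percentiles, presumably aggregated over the benchmark iterations. As a minimal sketch (the runtimes below are made up), such figures can be computed with `torch.quantile`:
```
import torch

# Hypothetical per-iteration runtimes (ms) for a single operator.
runtimes = torch.tensor([38.7, 39.0, 39.2, 39.5, 41.1])
p50, p90, p100 = (torch.quantile(runtimes, q).item() for q in (0.5, 0.9, 1.0))
print(f"Runtime (P50/P90/P100): {p50:.2f} / {p90:.2f} / {p100:.2f} ms")
```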

Differential Revision: D77676673
@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jul 2, 2025
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D77676673

Labels: CLA Signed, fb-exported