AORTA-24 : Add torch profiler for multi gpu workload run#106

Open
prosenjitdhole wants to merge 2 commits into main from prosenj_hw_q_eval_profiler_fix
Conversation

@prosenjitdhole
Collaborator

Fix for enabling the torch profiler for multi-GPU streams.

Copilot AI (Contributor) left a comment


Pull request overview

This PR updates the hw_queue_eval run CLI command to enable PyTorch profiling in a way that matches the harness’s multi-GPU stream distribution, avoiding single-GPU-only profiling behavior when multiple GPUs are available.

Changes:

  • Add a dedicated profiling phase that creates multi-GPU streams (round-robin across available GPUs) to mirror the harness behavior.
  • Synchronize all involved CUDA devices after each profiled iteration to ensure multi-GPU work is fully captured.
  • Add CLI output describing whether profiling is using single- or multi-GPU mode and the stream-to-device distribution.
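The profiling phase described above can be sketched roughly as follows. This is a hedged illustration, not the PR's actual code: the names `make_round_robin_streams`, `profile_workload`, and `step_fn` are hypothetical, and the CPU fallback is an assumption about how single-/no-GPU hosts might be handled.

```python
# Hypothetical sketch of a profiling phase that mirrors the harness's
# round-robin multi-GPU stream distribution. Not the PR's actual code.

def make_round_robin_streams(num_streams: int, num_gpus: int) -> list[int]:
    """Assign each stream index to a GPU id in round-robin order."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    return [i % num_gpus for i in range(num_streams)]


def profile_workload(step_fn, num_streams: int = 4, steps: int = 3):
    """Run `steps` profiled iterations, syncing every involved device."""
    import torch  # local import keeps the mapping helper torch-free

    num_gpus = torch.cuda.device_count()
    mode = "multi" if num_gpus > 1 else "single"
    print(f"profiling in {mode}-GPU mode")

    if num_gpus == 0:
        # CPU-only fallback (assumption): profile without CUDA activity.
        with torch.profiler.profile(
            activities=[torch.profiler.ProfilerActivity.CPU]
        ) as prof:
            for _ in range(steps):
                step_fn(torch.device("cpu"))
        return prof

    device_ids = make_round_robin_streams(num_streams, num_gpus)
    streams = [torch.cuda.Stream(device=d) for d in device_ids]
    print("stream -> device:", dict(enumerate(device_ids)))

    with torch.profiler.profile(
        activities=[
            torch.profiler.ProfilerActivity.CPU,
            torch.profiler.ProfilerActivity.CUDA,
        ]
    ) as prof:
        for _ in range(steps):
            # Launch one step per stream, each pinned to its device.
            for dev, stream in zip(device_ids, streams):
                with torch.cuda.stream(stream):
                    step_fn(torch.device("cuda", dev))
            # Synchronize all involved devices so each iteration's
            # multi-GPU work is fully captured in the trace.
            for dev in set(device_ids):
                torch.cuda.synchronize(dev)
    return prof
```

With 4 streams over 2 GPUs, `make_round_robin_streams(4, 2)` yields `[0, 1, 0, 1]`, i.e. streams alternate across the available devices just as the harness distributes work.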


Committing Copilot suggestion for calling setup twice.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
