[Log] Only Print Profiler Results on Rank 0 #23370
Conversation
Signed-off-by: yewentao256 <zhyanwentao@126.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run […]. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add […] 🚀
Code Review
This pull request aims to reduce log verbosity by ensuring that profiler results are only printed on rank 0. The changes correctly implement this by adding a conditional check for self.rank == 0 before printing the profiler summary table in both vllm/v1/worker/gpu_worker.py and vllm/worker/worker.py. This effectively suppresses redundant output from other ranks. The implementation is correct and achieves the intended purpose.
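A minimal sketch of the guard this review describes, assuming the worker holds its global rank in `self.rank` and a `torch.profiler.profile` instance in `self.profiler`; method and attribute names here are illustrative and may differ from the actual files:

```python
from torch.profiler import ProfilerActivity, profile


class WorkerSketch:
    """Illustrative stand-in for the vLLM worker; attribute names are assumptions."""

    def __init__(self, rank: int) -> None:
        self.rank = rank
        # vLLM also profiles CUDA activity; CPU-only keeps this sketch
        # runnable on any machine.
        self.profiler = profile(activities=[ProfilerActivity.CPU])

    def start_profile(self) -> None:
        self.profiler.start()

    def stop_profile(self) -> None:
        self.profiler.stop()
        # The guard described above: only rank 0 prints the summary table,
        # so the other data-parallel ranks stop the profiler silently.
        if self.rank == 0:
            print(self.profiler.key_averages().table(
                sort_by="self_cpu_time_total"))


# Usage: only the rank-0 worker prints anything.
worker = WorkerSketch(rank=0)
worker.start_profile()
sum(range(1000))  # some work to profile
worker.stop_profile()
```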
Maybe better to change this to print only if the local_rank is 0? That way at least every pod will print the profiler results, with no risk of the ranks clobbering each other.
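For contrast, this is how the suggested variant would look in the sketch above, assuming the worker also exposes a node-local `self.local_rank` (an assumption; the merged change may differ):

```python
    def stop_profile(self) -> None:
        self.profiler.stop()
        # Gating on local_rank rather than the global rank means one worker
        # per node/pod still prints the table, while workers that share a
        # node (and its log stream) cannot clobber each other's output.
        if self.local_rank == 0:
            print(self.profiler.key_averages().table(
                sort_by="self_cpu_time_total"))
```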
Did not see Tyler's comment; I agree with what he said.
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Fixed, thanks! @tlrmchlsmth
* 'main' of https://github.com/845473182/vllm: (457 commits)
  [BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models (vllm-project#24132)
  [Misc] Add check for dual_chunk_attention (vllm-project#24070)
  [Doc]: fix typos in Python comments (vllm-project#24115)
  [Doc]: fix typos in Python comments (vllm-project#24093)
  [Compile] Fix Compile Warning for `w4a8_mm_entry.cu` (vllm-project#23660)
  fix some typos (vllm-project#24071)
  [V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (vllm-project#23656)
  Upgrade xgrammar to 0.1.23 (vllm-project#22988)
  Update release pipeline post PyTorch 2.8.0 update (vllm-project#24073)
  [XPU] Fix the bug of LoRA logits on the XPU platform (vllm-project#24081)
  [CI/Build] Disable SiluMul NVFP4 quant fusion tests (vllm-project#24121)
  [Bug] R1 Accuracy: Fix `routed_scaling_factor` Double Mul Issue (vllm-project#24119)
  [AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault (vllm-project#23692)
  [CI] Enable all hf transformers baselines in test_hybrid (vllm-project#23936)
  [Log] Only Print Profiler Results on Rank 0 (vllm-project#23370)
  Fix weights loading for Apertus (vllm-project#24100)
  [Metrics] Deprecate TPOT in favor of ITL (vllm-project#24110)
  [Bugfix] Fix packed_factor missing attribute error (vllm-project#23902)
  Run ruff format on a few files. (vllm-project#24075)
  [Bugfix] Fix transform_config parsing in Compressed Tensors (vllm-project#23945)
  ...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Purpose
Only Print Profiler Results on Rank 0
Currently we print the profiler results on every DP rank, which is too verbose, e.g.: