[V1] Only print cudagraph tqdm on rank 0 with is_global_first_rank
#19516
base: main
Conversation
Signed-off-by: mgoin <mgoin64@gmail.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Summary of Changes
Hello @mgoin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses noisy console output in distributed vLLM runs by ensuring that the progress bar shown during CUDA graph capture is only displayed by the global rank 0 process. This is achieved by introducing a new helper function to check for the global first rank and using it to conditionally wrap the graph capture loop with `tqdm`.
Highlights
- Helper Function: Introduced a new helper function `is_global_first_rank` in `vllm/distributed/parallel_state.py` to determine if the current process is the global rank 0 across all parallelism dimensions (TP, PP, DP, etc.). This provides a robust way to identify the primary process for logging or UI updates.
- Conditional Progress Bar: Modified the CUDA graph capture logic in `vllm/v1/worker/gpu_model_runner.py` to only display the `tqdm` progress bar on the global first rank. This prevents duplicate progress bars from appearing when running with multiple workers/ranks, cleaning up the console output.
Code Review

This pull request introduces a useful change to prevent multiple progress bars during CUDA graph capturing in distributed settings, enhancing log readability. The core logic involves a new `is_global_first_rank` utility.

The primary suggestion focuses on improving the error handling in this new utility function by making the exception catching more specific and considering logging for better diagnosability. This will make the function more robust for potential future uses beyond controlling `tqdm` visibility.
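For context, the pattern under review can be sketched as follows. This is an illustrative reconstruction, not the exact vLLM code: the function name and the placeholder capture step are assumptions, but `tqdm`'s `disable` flag is the real mechanism for silencing the bar on non-first ranks.

```python
from tqdm import tqdm


def capture_graphs(batch_sizes, is_first_rank):
    """Iterate over capture sizes, showing a progress bar only on rank 0.

    tqdm's `disable` flag silences the bar entirely on other ranks, so a
    multi-rank run prints a single progress bar instead of one per process.
    """
    captured = []
    for size in tqdm(batch_sizes,
                     desc="Capturing CUDA graphs",
                     disable=not is_first_rank):
        captured.append(size)  # placeholder for the real graph capture
    return captured
```

All ranks still iterate (and capture) identically; only the console output differs.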
```python
    except Exception:
        # If anything goes wrong, assume this is the first rank
        return True
```
The use of a bare `except Exception:` can mask various underlying issues from `torch.distributed` or the `_WORLD` state. This might lead to unexpected behavior if, for example, multiple processes erroneously assume they are the global first rank due to an unrelated exception during rank determination.

To improve robustness and diagnosability, consider the following:

- Capture the exception instance: Change `except Exception:` to `except Exception as e:`. This allows access to the specific error.
- Log the exception: Logging the error `e` would be beneficial for debugging. This helps in understanding why rank determination might have failed. (Assuming a `logger` is available or can be set up in this module, similar to other `vllm.distributed` files.)
- Docstring update: The current docstring explains that the function returns `True` if distributed is not initialized. It would be helpful to also document the behavior of returning `True` in case of any other exception during rank checking, if this is the intended safe default for all scenarios.

If returning `True` on any error is a deliberate and safe design choice for all potential uses of this function, explicitly stating this rationale in a comment within the `except` block or the docstring would improve clarity.
```python
    except Exception as e:  # Capture the specific exception instance.
        # Consider logging 'e' for debugging to understand potential failures.
        # For example, if a logger is configured:
        # logger.warning("is_global_first_rank() encountered an error, defaulting to True: %s", e, exc_info=True)
        # If anything goes wrong, assume this is the first rank
        return True
```
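Concretely, the reviewer's suggestion of catching and logging the exception could look like this self-contained sketch. The rank lookup is again a simplified environment-variable stand-in for the real distributed query, and the logger name is illustrative:

```python
import logging
import os

logger = logging.getLogger("vllm.distributed.parallel_state")


def is_global_first_rank() -> bool:
    """Best-effort check for global rank 0, logging unexpected failures."""
    try:
        return int(os.environ.get("RANK", "0")) == 0
    except Exception as e:
        # Deliberate safe default: on any failure, behave as the first rank
        # so progress output is shown rather than silently lost.
        logger.warning(
            "is_global_first_rank() encountered an error, defaulting to "
            "True: %s", e, exc_info=True)
        return True
```

The warning makes a silent mis-detection (e.g. several ranks all printing progress bars) diagnosable from the logs instead of invisible.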
logging a debug message at least would be nice here
The trouble with using global first rank is that we now have cases where different ranks for the same deployment may run in different places, via separate CLI launches — for example multi-node DP. I'll try to think of how we can restrict this to running once per top-level. This is still probably better than having it run in every rank though.
seems like an improvement, even if there's more that could be done (based on nick's comment)
```python
    except Exception:
        # If anything goes wrong, assume this is the first rank
        return True
```
logging a debug message at least would be nice here
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: mgoin <mgoin64@gmail.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Purpose
#19501 (comment)
Test Plan
Test Result
Before:
After: