
[Bugfix] Fix faulty triton importing logic when using Ray for DP #19734


Merged

Conversation

@mgoin (Member) commented Jun 17, 2025

Purpose

FIX #19731

In the DPEngineCoreActor, Ray sets CUDA_VISIBLE_DEVICES to an empty string, and then the code deletes it to properly initialize data parallel groups.

vllm/vllm/v1/engine/core.py

Lines 922 to 925 in aed8468

# Ray sets CUDA_VISIBLE_DEVICES to empty string,
# we clean this up to be able to properly initialize
# data parallel groups.
del os.environ['CUDA_VISIBLE_DEVICES']

However, this happens after the Triton driver check has already run during the module import.
The key issue is the timing:

  1. When the Ray actor is created, the vllm.triton_utils.importing module is imported first
  2. During import, it runs the driver check while CUDA_VISIBLE_DEVICES is still set to the empty string that Ray assigned
  3. This causes Triton to see 0 active drivers, so the import logic disables Triton
  4. Only later does the DPEngineCoreActor.__init__() method delete the problematic CUDA_VISIBLE_DEVICES environment variable
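
A minimal, standalone sketch of this ordering problem (not vLLM code; the probe_gpus helper is hypothetical and simply stands in for any import-time GPU/driver check):

```python
import os

# What Ray sets for the DP actor before user code runs:
os.environ["CUDA_VISIBLE_DEVICES"] = ""

def probe_gpus() -> int:
    # Hypothetical stand-in for an import-time driver/device check.
    import torch
    return torch.cuda.device_count()

# The check runs while the placeholder is still in place.
print(probe_gpus())  # -> 0, so Triton would be marked unavailable

# Later, DPEngineCoreActor.__init__ removes the placeholder ...
del os.environ["CUDA_VISIBLE_DEVICES"]
# ... but the import-time decision to disable Triton has already been made.
```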

Test Plan

Manually verify that TP_SIZE=1 DP_SIZE=2 pytest -s -v "v1/test_async_llm_dp.py::test_load[ray-RequestOutputKind.DELTA]" now passes.

Test Result

Passed!

Signed-off-by: mgoin <mgoin64@gmail.com>
@mgoin mgoin added the bug Something isn't working label Jun 17, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot (Contributor) left a comment

Summary of Changes

Hello @mgoin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug that prevented Triton from being correctly initialized and utilized when vLLM is run in a data parallel setup using Ray. The fix introduces a more robust check for Triton driver availability, specifically accounting for how distributed environments like Ray temporarily modify CUDA_VISIBLE_DEVICES during actor initialization, ensuring Triton is properly enabled.

Highlights

  • Bugfix: Resolves a critical bug where Triton was incorrectly disabled during vLLM's initialization when running in a data parallel setup with Ray. This was caused by Ray temporarily setting CUDA_VISIBLE_DEVICES to an empty string, which the previous Triton import logic misinterpreted as a lack of active drivers.
  • Triton Import Logic Enhancement: The logic for checking Triton driver availability has been made more robust. It now specifically accounts for distributed environments (like Ray) where CUDA_VISIBLE_DEVICES might be temporarily empty during actor initialization, allowing Triton to correctly initialize in such scenarios.
  • Environment Variable Handling: Introduces a check for the CUDA_VISIBLE_DEVICES environment variable to identify if the process is operating within a distributed setup where temporary empty values are expected, thus preventing premature Triton disabling.
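
To make the change concrete, here is a minimal sketch of the guard described above. The two is_distributed_env lines mirror the diff quoted later in this thread; the surrounding helper (triton_should_stay_enabled) and the expectation of exactly one active driver are simplifications for illustration, not the verbatim vLLM code:

```python
import os

# An empty-but-set CUDA_VISIBLE_DEVICES is the signature of Ray's temporary
# placeholder in distributed setups.
cuda_visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES", None)
is_distributed_env = (cuda_visible_devices is not None
                      and len(cuda_visible_devices.strip()) == 0)

def triton_should_stay_enabled(active_driver_count: int) -> bool:
    # In the distributed case the actor restores GPU visibility later, so a
    # driver count of zero at import time is not conclusive.
    if is_distributed_env:
        return True
    # Outside that case, expect exactly one active driver (simplification).
    return active_driver_count == 1
```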

@gemini-code-assist bot (Contributor) left a comment


Code Review

The pull request addresses a bug where Triton was being incorrectly disabled in distributed environments due to the timing of the CUDA_VISIBLE_DEVICES environment variable being set by Ray. The changes introduce a check for this specific scenario and adjust the driver check accordingly. I've suggested a minor readability improvement and a more descriptive debug message.

@mgoin mgoin requested review from njhill and ruisearch42 June 17, 2025 08:54
@houseroad (Collaborator) left a comment


Thanks for fixing the CI breakage. It's good to unblock CI with this change.

But I feel the logic is a bit too complicated here. Wondering if we should have some global state to coordinate status between different actors?

@maxdebayser (Contributor) left a comment


The tests pass after I apply this change.

@ruisearch42 (Collaborator) left a comment


Thanks for the fix

Comment on lines +31 to +32
is_distributed_env = (cuda_visible_devices is not None
and len(cuda_visible_devices.strip()) == 0)
Member:

Would something like is_ray_env be better here? Since presumably this will be False in non-ray distributed cases?

Collaborator:


btw, an alternative is to check Ray directly:

def is_in_ray_actor():
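
For reference, a minimal sketch of that alternative, assuming Ray's public ray.is_initialized() and ray.get_runtime_context().get_actor_id() APIs; this is an illustration, not necessarily the exact helper in vLLM's utils:

```python
def is_in_ray_actor() -> bool:
    """Best-effort check for whether this process runs inside a Ray actor."""
    try:
        import ray

        # get_actor_id() returns None when the process is not a Ray actor.
        return (ray.is_initialized()
                and ray.get_runtime_context().get_actor_id() is not None)
    except ImportError:
        # Ray is not installed, so we cannot be inside a Ray actor.
        return False
```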

@yewentao256 (Contributor) left a comment


Thanks for figuring out and fixing this issue! It helps a lot

@WoosukKwon WoosukKwon added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 17, 2025
@simon-mo simon-mo enabled auto-merge (squash) June 17, 2025 18:49
@simon-mo simon-mo merged commit b447624 into vllm-project:main Jun 17, 2025
78 checks passed
yeqcharlotte pushed a commit to yeqcharlotte/vllm that referenced this pull request Jun 22, 2025
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Jun 24, 2025
gmarinho2 pushed a commit to gmarinho2/vllm that referenced this pull request Jun 26, 2025
Labels
bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed
Development

Successfully merging this pull request may close these issues.

[CI Failure]: Distributed Tests (2 GPUs) - v1/test_async_llm_dp.py::test_load
8 participants