[Bugfix] Allow CUDA_VISIBLE_DEVICES='' in Platform.device_id_to_physical_device_id #18979
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 5e494c3 to 2cdb3a9
@njhill your feedback on this would be greatly appreciated. Thanks!
@eicherseiji Let's add some unit tests that prevent this from getting changed again down the line. Basically, EngineCoreProc / AsyncLLMEngine should be instantiable on a CPU head node.
@LucasWilkinson can you take a look?
Thanks @eicherseiji! Would be good for @LucasWilkinson to check this too, and I agree with @kouroshHakha that having a test to cover this case would be good.
Sorry, still OOO so doing this on my phone (responses may be delayed / I may miss things). My main comment would be to check V0; this logic is shared between V0 and V1 (in the original implementation).
Force-pushed from 602030b to 7d4ccaa
Thanks @LucasWilkinson, @njhill, @kouroshHakha for the review! I added a test, and moved another recent change that introduced a platform dependency. The existing paths work fine for V0 engines on Ray, so I'm leaving those be, with TODOs to remove after V0 deprecation. Please let me know any comments or concerns. Thanks :)
Force-pushed from 7d4ccaa to b3e5367
Reviewing CI failures
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
…ix quantize config validation error
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Force-pushed from 3def0be to 66bd1fb
Rebased, but latest nightly is pretty red: https://buildkite.com/vllm/ci/builds/22564#_
…ysical_device_id` (vllm-project#18979) Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Since #15977, `EngineCoreProc` initiates the following callstack:

`VllmConfig.__post_init__` -> `CudaPlatformBase.check_and_update_config` -> `is_flashmla_supported` -> `NvmlCudaPlatform.get_device_capability`
This causes a crash when an `AsyncLLMEngine` is created on a CPU-only head node with GPU worker nodes, since `CUDA_VISIBLE_DEVICES=""` on the head. I think this is a regression, because in 0.8.5 it was not assumed that `EngineCoreProc` knew the device capability of its workers.

In a Ray setup, it's expected that the device control environment variable, e.g. `CUDA_VISIBLE_DEVICES`, is set to the empty string when launching the vLLM engine. The process will still be colocated with the GPU workers on a GPU node (so `current_platform` is set correctly), but the front-end process should not consume any GPU resources.
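The failure mode can be seen in miniature: splitting an empty `CUDA_VISIBLE_DEVICES` on commas yields `[""]`, and `int("")` raises. Below is a simplified standalone sketch of the pre-fix lookup (names modeled on `Platform.device_id_to_physical_device_id`; not the exact vLLM source):

```python
import os

def device_id_to_physical_device_id(device_id: int) -> int:
    # Simplified sketch of the pre-fix mapping: translate a logical device
    # id into a physical one via CUDA_VISIBLE_DEVICES, if the variable is set.
    if "CUDA_VISIBLE_DEVICES" in os.environ:
        device_ids = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
        return int(device_ids[device_id])  # int("") raises ValueError
    return device_id

os.environ["CUDA_VISIBLE_DEVICES"] = ""  # what Ray sets on the head process
try:
    device_id_to_physical_device_id(0)
    print("ok")
except ValueError:
    print("crash")  # "".split(",") == [""], and int("") fails
```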
Thus, if the CUDA platform was detected, we should make it possible to query general device capability, e.g. `current_platform.get_device_capability()` (which has the default kwarg `device_id=0`), even if the devices aren't exposed via the environment variable.
This addresses the FlashMLA capability-check regression because `current_platform.get_device_capability()` no longer returns `None` in `check_and_update_config`.
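One way to implement that tolerance is to treat an empty device-control variable the same as an unset one, so the logical id passes through unchanged. This is a hedged sketch of the idea, not necessarily the exact merged change:

```python
import os

def device_id_to_physical_device_id(device_id: int) -> int:
    # Sketch of a fixed mapping: an unset *or empty* CUDA_VISIBLE_DEVICES
    # falls back to the identity mapping instead of crashing on int("").
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible:  # truthy only when it names at least one device
        return int(visible.split(",")[device_id])
    return device_id

os.environ["CUDA_VISIBLE_DEVICES"] = ""
print(device_id_to_physical_device_id(0))  # 0: no crash on the head node
os.environ["CUDA_VISIBLE_DEVICES"] = "3,1"
print(device_id_to_physical_device_id(0))  # 3: normal remapping still works
```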
The change adds two tests:

- Ensure that `EngineCoreProc` can be instantiated with `CUDA_VISIBLE_DEVICES=''`
- Ensure that configs created with `CUDA_VISIBLE_DEVICES=''` and with `CUDA_VISIBLE_DEVICES` set normally are identical
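In spirit, the two tests look like the self-contained sketch below. The real tests would instantiate the engine and compare actual `VllmConfig` objects; `get_device_capability` and `make_config` here are hypothetical stand-ins:

```python
import os

def get_device_capability(device_id: int = 0) -> tuple:
    # Hypothetical stub: the real call queries NVML; the point is that it
    # must work whether or not CUDA_VISIBLE_DEVICES hides all devices.
    return (9, 0)

def make_config() -> dict:
    # Hypothetical stand-in for config construction
    # (VllmConfig.__post_init__ -> check_and_update_config).
    return {"device_capability": get_device_capability()}

# Case 1: construction succeeds with CUDA_VISIBLE_DEVICES=""
os.environ["CUDA_VISIBLE_DEVICES"] = ""
hidden = make_config()

# Case 2: the result is identical to a config built with devices visible
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
visible = make_config()
print(hidden == visible)  # True
```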
Reproducer: `python serve.py`

Logs: