[llm.serving] Fix using uni executor when world size == 1 #50849
Conversation
Signed-off-by: Gene Su <e870252314@gmail.com>
@kouroshHakha this should fix tp=1 throwing the "No CUDA GPUs are available" error
# uni processing executor when world_size is 1. This is a bug in vllm 0.7.2 and
# is fixed by https://github.com/vllm-project/vllm/pull/12934 which is shipped
# with vllm 0.7.3.
if engine_config.parallel_config.world_size == 1:
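For context, a rough sketch of the workaround being discussed. Note the hedging: distributed_executor_backend does exist on vLLM 0.7.x's parallel config, but the exact attribute and where this override lives inside Ray's vLLM wrapper are assumptions for illustration, not the literal diff of this PR.

def _force_ray_executor(engine_config):
    # Sketch only: with world_size == 1, vLLM 0.7.2 falls back to UniProcExecutor,
    # which under Ray Serve hits the "No CUDA GPUs are available" error described
    # in this PR. Overriding the backend keeps execution on Ray workers.
    if engine_config.parallel_config.world_size == 1:
        engine_config.parallel_config.distributed_executor_backend = "ray"
    return engine_config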
maybe check the version? we specify the version constraint as >=0.7.2, not ==0.7.2, so the user could be using 0.7.2, 0.7.3, or even some future version.
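For illustration, the version-gated alternative suggested above could look like the sketch below. It assumes the installed vllm distribution metadata is readable and uses packaging for the comparison; it is not the approach this PR ends up taking.

from importlib.metadata import version

from packaging.version import Version

def needs_uni_executor_workaround() -> bool:
    # The uni-executor fallback bug exists in vllm 0.7.2 and is fixed in 0.7.3
    # (vllm-project/vllm#12934), so only apply the override on older versions.
    return Version(version("vllm")) < Version("0.7.3")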
nevermind, I guess world_size is the golden condition to use here. :) Maybe describe a bit more clearly in the comment how the vllm version is related to world_size.
This is specifically for 0.7.2. It's fixed in 0.7.3, but vllm pinned Ray to 2.40.0, so that's not going to work, at least once Ray 2.43.0 comes out.
Why don't we always force using RayDistributedExecutor, @GeneDer?
Isn't that already the case for any num_worker > 1?
@kouroshHakha Doesn't that hurt performance based on our previous investigation?
Not sure about the performance part, but we've been using the Ray executor since this was private, unless the user specifies an executor to override it.
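As a usage example of that user-level override, a sketch using vLLM's own engine-args API; how the option is plumbed through Ray Serve's LLM config is not shown, and the model name is a placeholder.

from vllm.engine.arg_utils import AsyncEngineArgs

# Explicitly pin the executor backend instead of relying on vLLM's default
# selection; "ray" keeps execution on Ray workers even when tp=1.
engine_args = AsyncEngineArgs(
    model="facebook/opt-125m",           # placeholder model for illustration
    tensor_parallel_size=1,
    distributed_executor_backend="ray",
)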
So our previous investigation for tp=2 showed that using MQEngine, not necessarily RayDistributedExecutor, is the reason for the perf boost. tp=1 might have a different performance profile anyway, but since RayDistributedExecutor handles the placement group stuff internally, I think it has the most well-defined integration with Ray Serve. So using it seems more reasonable right now.
Signed-off-by: Gene Su <e870252314@gmail.com>
Why are these changes needed?
When using vllm 0.7.2 with world size 1 (tp=1 and pp=1), vllm will force the use of UniProcExecutor and cause a "No CUDA GPUs are available" error. This PR forces the use of RayDistributedExecutor in all cases so it is always compatible with Ray.
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If adding a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.