
[llm.serving] Fix using uni executor when world size == 1 #50849

Merged

Conversation

GeneDer (Contributor) commented Feb 24, 2025

Why are these changes needed?

When using vllm 0.7.2 with world size 1 (tp=1 and pp=1), vllm is forced to use UniProcExecutor, which causes a "No CUDA GPUs are available" error. This PR forces the use of RayDistributedExecutor in every case so it is always compatible with Ray.
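For illustration, the core idea is to request the Ray executor explicitly instead of letting vLLM pick an executor based on world size. Below is a minimal sketch of that idea, expressed through vLLM's public engine arguments rather than this PR's actual diff (the build_engine_args helper is a made-up name for illustration):

    # Sketch only: always request the Ray-based distributed executor so that a
    # world size of 1 does not make vLLM fall back to UniProcExecutor, which
    # cannot see GPUs when the engine runs inside a Ray actor.
    from vllm.engine.arg_utils import AsyncEngineArgs

    def build_engine_args(model: str, tensor_parallel_size: int = 1) -> AsyncEngineArgs:
        # "ray" routes execution through RayDistributedExecutor even for tp=1/pp=1.
        return AsyncEngineArgs(
            model=model,
            tensor_parallel_size=tensor_parallel_size,
            distributed_executor_backend="ray",
        )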

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Gene Su <e870252314@gmail.com>
GeneDer requested a review from a team as a code owner, February 24, 2025 06:16
GeneDer (Contributor, Author) commented Feb 24, 2025

@kouroshHakha this should fix tp=1 throwing the "No CUDA GPUs are available" error.

GeneDer added the go (add ONLY when ready to merge, run all tests) label, Feb 24, 2025
The review thread below is attached to this snippet from the diff:

    # uni processing executor when world_size is 1. This is a bug in vllm 0.7.2 and
    # is fixed by https://github.com/vllm-project/vllm/pull/12934 which is shipped
    # with vllm 0.7.3.
    if engine_config.parallel_config.world_size == 1:
Collaborator:

Maybe check the version? We specify the version constraint as >=0.7.2, not ==0.7.2, so the user could be running 0.7.2, 0.7.3, or even some future version.

Collaborator:

Never mind, I guess world_size is the golden condition to use here. :)

Maybe describe a bit more clearly in the comment how the vllm version is related to world_size.

Contributor Author:

This is specifically for 0.7.2; it's fixed in 0.7.3. But vllm pins ray to 2.40.0, so that's not going to work, at least once ray 2.43.0 comes out.
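(For illustration only: a version gate like the one floated above might look roughly like the sketch below. It assumes vllm exposes __version__ and uses the packaging library; the PR keys the behavior off world_size instead.)

    # Hypothetical sketch of gating the workaround on the installed vLLM version.
    # Not the approach taken in this PR; shown only to illustrate the suggestion.
    from packaging.version import Version

    import vllm

    def needs_uni_executor_workaround() -> bool:
        # vllm 0.7.2 forces the uni processing executor when world_size == 1;
        # the upstream fix (vllm-project/vllm#12934) ships with vllm 0.7.3.
        return Version(vllm.__version__) < Version("0.7.3")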

Contributor:

Why don't we always force using RayDistributedExecutor, @GeneDer?

Contributor:

isn't that the case for any num_worker > 1?

Collaborator:

@kouroshHakha Doesn't that hurt performance based on our previous investigation?

Contributor Author:

Not sure about the performance part, but we've been using the Ray executor since this was private, unless the user specifies an executor to override it.

Contributor:

Our previous investigation for tp=2 showed that using MQEngine is the reason for the perf boost, not necessarily RayDistributedExecutor. tp=1 might have a different performance profile anyway, but since RayDistributedExecutor handles the placement group stuff internally well, I think it has the most well-defined integration with Ray Serve. So using it seems more reasonable right now.
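(For illustration of the placement group point above: the sketch below shows the kind of placement group bookkeeping that RayDistributedExecutor handles internally for the engine's workers. The bundle shape is made up for this example and is not code from the PR.)

    # Sketch only: reserving resources with a Ray placement group by hand, the
    # sort of bookkeeping RayDistributedExecutor does internally for vLLM workers.
    import ray
    from ray.util.placement_group import placement_group

    ray.init(ignore_reinit_error=True)

    # One bundle per vLLM worker; with tp=1 and pp=1 that is a single bundle.
    # Requires a node with a GPU; drop the "GPU" entry to try it on CPU only.
    pg = placement_group(bundles=[{"CPU": 1, "GPU": 1}], strategy="PACK")
    ray.get(pg.ready())
    print(ray.util.placement_group_table(pg))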

Signed-off-by: Gene Su <e870252314@gmail.com>
kouroshHakha enabled auto-merge (squash) February 24, 2025 20:07
kouroshHakha merged commit 2325ed9 into ray-project:master Feb 24, 2025
6 checks passed
GeneDer added a commit to GeneDer/ray that referenced this pull request Feb 24, 2025
aslonnie pushed a commit that referenced this pull request Feb 24, 2025
…50863)

Cherry-pick: #50849

Signed-off-by: Gene Su <e870252314@gmail.com>
GeneDer deleted the fix-using-forced-to-use-uni-executor branch, February 24, 2025 20:22