[llm.serving] Fix using uni executor when world size == 1 #50849
Conversation
Signed-off-by: Gene Su <e870252314@gmail.com>
@kouroshHakha this should fix tp=1 throwing the "No CUDA GPUs are available" error
# uni processing executor when world_size is 1. This is a bug in vllm 0.7.2 and
# is fixed by https://github.com/vllm-project/vllm/pull/12934 which is shipped
# with vllm 0.7.3.
if engine_config.parallel_config.world_size == 1:
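For context, a rough sketch of the workaround being discussed. Note the hedging: distributed_executor_backend does exist on vLLM 0.7.x's parallel config, but the exact attribute and where this override lives inside Ray's vLLM wrapper are assumptions for illustration, not the literal diff of this PR.

def _force_ray_executor(engine_config):
    # Sketch only: with world_size == 1, vLLM 0.7.2 falls back to UniProcExecutor,
    # which under Ray Serve hits the "No CUDA GPUs are available" error described
    # in this PR. Overriding the backend keeps execution on Ray workers.
    if engine_config.parallel_config.world_size == 1:
        engine_config.parallel_config.distributed_executor_backend = "ray"
    return engine_config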
maybe check the version? we specify the version constraint as >=0.7.2, not ==0.7.2, so the user could be using 0.7.2, 0.7.3, or even some future version.
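For illustration, the version-gated alternative suggested above could look like the sketch below. It assumes the installed vllm distribution metadata is readable and uses packaging for the comparison; it is not the approach this PR ends up taking.

from importlib.metadata import version

from packaging.version import Version

def needs_uni_executor_workaround() -> bool:
    # The uni-executor fallback bug exists in vllm 0.7.2 and is fixed in 0.7.3
    # (vllm-project/vllm#12934), so only apply the override on older versions.
    return Version(version("vllm")) < Version("0.7.3")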
nevermind, I guess world_size is the golden condition to use here. :) Maybe describe a bit more clearly in the comment how the vllm version is related to world_size.
This is specifically for 0.7.2. It's fixed in 0.7.3, but vllm pinned Ray to 2.40.0, so that's not going to work, at least once Ray 2.43.0 comes out.
Why don't we always force using RayDistributedExecutor, @GeneDer?
Isn't that already the case for any num_worker > 1?
@kouroshHakha Doesn't that hurt performance based on our previous investigation?
Not sure about the performance part, but we've been using the Ray executor since this was private, unless the user specifies an executor to override it.
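As a usage example of that user-level override, a sketch using vLLM's own engine-args API; how the option is plumbed through Ray Serve's LLM config is not shown, and the model name is a placeholder.

from vllm.engine.arg_utils import AsyncEngineArgs

# Explicitly pin the executor backend instead of relying on vLLM's default
# selection; "ray" keeps execution on Ray workers even when tp=1.
engine_args = AsyncEngineArgs(
    model="facebook/opt-125m",           # placeholder model for illustration
    tensor_parallel_size=1,
    distributed_executor_backend="ray",
)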
So our previous investigation for tp=2 showed that using MQEngine, not necessarily RayDistributedExecutor, is the reason for the perf boost. tp=1 might have a different performance profile anyway, but since RayDistributedExecutor handles the placement group stuff internally, I think it has the most well-defined integration with Ray Serve. So using it seems more reasonable right now.
Signed-off-by: Gene Su <e870252314@gmail.com>
Why are these changes needed?
When using vllm 0.7.2 with world size 1 (tp=1 and pp=1), vllm will force the use of UniProcExecutor and cause a "No CUDA GPUs are available" error. This PR forces the use of RayDistributedExecutor in all cases so it is always compatible with Ray.
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If adding a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.