
[llm.serving] Fix using uni executor when world size == 1 #50849

Merged
@@ -191,9 +191,18 @@ def __init__(self, ipc_path, engine_args, engine_config):
         # Adapted from vllm.engine.multiprocessing.engine.MQLLMEngine.from_engine_args
         vllm.plugins.load_general_plugins()

-        executor_class = vllm.engine.llm_engine.LLMEngine._get_executor_cls(
-            engine_config
-        )
+        # Note (genesu): This is a temporary fix to avoid vllm 0.7.2 forcing the use
+        # of the uni-processing executor when world_size is 1. This is a bug in
+        # vllm 0.7.2 and is fixed by https://github.com/vllm-project/vllm/pull/12934,
+        # which ships with vllm 0.7.3.
+        if engine_config.parallel_config.world_size == 1:
Collaborator:

Maybe check the version? We specify the version constraint as >=0.7.2, not ==0.7.2, so the user could be using 0.7.2, 0.7.3, or even some future version.
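For illustration only, a version-gated check along the lines of this suggestion might look like the sketch below. It assumes vllm exposes `__version__`, relies on the third-party `packaging` library for version comparison, and the helper name is hypothetical; it is not what the PR actually does.

```python
# Rough sketch of the version-gated alternative suggested above; not the PR's code.
# Assumes vllm exposes __version__ and that the uni-executor bug is specific to 0.7.2.
from packaging.version import Version

import vllm


def needs_ray_executor_workaround(engine_config) -> bool:
    """Return True only when running the affected vllm release with a single worker."""
    affected_release = Version(vllm.__version__) == Version("0.7.2")
    single_worker = engine_config.parallel_config.world_size == 1
    return affected_release and single_worker
```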

Collaborator:

Never mind, I guess world_size is the golden condition to use here. :)

Maybe describe in the comment a bit more clearly how the vllm version is related to world_size.

Contributor (author):

This is specifically for 0.7.2. It's fixed in 0.7.3, but vllm pinned ray to 2.40.0, so that isn't going to work, at least once ray 2.43.0 comes out.

Contributor:

Why don't we always force using RayDistributedExecutor, @GeneDer?

Contributor:

Isn't that the case for any num_worker > 1?

Collaborator:

@kouroshHakha Doesn't that hurt performance, based on our previous investigation?

Contributor (author):

Not sure about the performance part, but we have been using the Ray executor since this was private, unless the user specifies an executor to override it.

Contributor:

So our previous investigation for tp=2 showed that using MQEngine is the reason for the perf boost, not necessarily RayDistributedExecutor. At tp=1 it might have a different performance profile anyway, but since RayDistributedExecutor handles the placement group details internally well, I think it has the most well-defined integration with Ray Serve, so using it seems more reasonable right now.
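For context, the placement-group bookkeeping that RayDistributedExecutor is said to handle internally looks roughly like the following when done by hand with Ray's public API. This is an illustrative sketch only; the bundle shapes, strategy, and `tp_size` value are assumptions, and none of this code comes from vllm or this PR.

```python
# Illustrative only: the kind of placement-group setup RayDistributedExecutor
# manages internally. Bundle shape and strategy here are assumptions.
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

ray.init(ignore_reinit_error=True)

tp_size = 2  # hypothetical tensor-parallel degree
pg = placement_group([{"GPU": 1}] * tp_size, strategy="PACK")
ray.get(pg.ready())  # block until the bundles are reserved

# Workers would then be scheduled into the reserved bundles.
strategy = PlacementGroupSchedulingStrategy(placement_group=pg)
```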

+            from vllm.executor.ray_distributed_executor import RayDistributedExecutor
+
+            executor_class = RayDistributedExecutor
+        else:
+            executor_class = vllm.engine.llm_engine.LLMEngine._get_executor_cls(
+                engine_config
+            )

         self.engine = MQLLMEngine(
             ipc_path=ipc_path,