-
-
Notifications
You must be signed in to change notification settings - Fork 11.2k
Fix v1 supported oracle for worker-cls and worker-extension-cls #15324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
48a4da8
effacb0
2d1e722
80626e5
df8a73e
55075bf
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1460,16 +1460,6 @@ def _is_v1_supported_oracle(self, model_config: ModelConfig) -> bool: | |
| recommend_to_remove=False) | ||
| return False | ||
|
|
||
| if self.worker_cls != EngineArgs.worker_cls: | ||
| _raise_or_fallback(feature_name="--worker-cls", | ||
| recommend_to_remove=False) | ||
| return False | ||
|
|
||
| if self.worker_extension_cls != EngineArgs.worker_extension_cls: | ||
| _raise_or_fallback(feature_name="--worker-extension-cls", | ||
| recommend_to_remove=False) | ||
| return False | ||
|
|
||
|
Comment on lines
-1463
to
-1472
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. cc @robertgshaw2-redhat for visibility. |
||
| if self.num_scheduler_steps != EngineArgs.num_scheduler_steps: | ||
| _raise_or_fallback(feature_name="--num-scheduler-steps", | ||
| recommend_to_remove=True) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2170,6 +2170,11 @@ def _maybe_force_spawn(): | |
| if cuda_is_initialized(): | ||
| reason = "CUDA is initialized" | ||
| elif is_in_ray_actor(): | ||
| # even if we choose to spawn, we need to pass the ray address | ||
| # to the subprocess so that it knows how to connect to the ray cluster. | ||
| # env vars are inherited by subprocesses, even if we use spawn. | ||
| import ray | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is guarded under
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah that's right, realized it afterwards |
||
| os.environ["RAY_ADDRESS"] = ray.get_runtime_context().gcs_address | ||
| reason = "In a Ray actor and can only be spawned" | ||
|
|
||
| if reason is not None: | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in v1,
LLM.collective_rpcis broken in the default case, becauseself.llm_engine.model_executoris in a different process.cc @njhill @robertgshaw2-redhat if you can help fix it. i'm disabling
VLLM_ENABLE_V1_MULTIPROCESSINGfor the test right now.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should fall back to this automatically if the user selects
--worker-cls?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to fall back.
worker-clsandrlhfcan be separate things.