[rlhf] support named placement group #14410

youkaichao · 2025-03-07T06:28:21Z

The V1 engine creates the engine in a separate process, which makes controlling process placement complicated.

As suggested by the Ray team, we can use a named placement group (https://docs.ray.io/en/latest/ray-core/scheduling/placement-group.html#advanced-named-placement-group) and pass the placement group name to the engine process through an environment variable.
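Here is a minimal sketch of the env-var handoff. It uses only the standard library so it runs without a Ray cluster. In the real flow the driver would first create the group with ray.util.placement_group(bundles, name=...). The sketch only shows how the name reaches a separate process through VLLM_RAY_PG_NAME. The helper launch_with_pg_name and the inline child script are hypothetical stand-ins for vLLM's engine process.

```python
import os
import subprocess
import sys

PG_ENV_VAR = "VLLM_RAY_PG_NAME"  # env var name used in this PR

# Hypothetical child entrypoint: a real engine process would pass the value it
# reads to ray.util.get_placement_group(name). Here it just echoes it back.
CHILD_CODE = "import os; print(os.environ['VLLM_RAY_PG_NAME'])"


def launch_with_pg_name(pg_name: str) -> str:
    """Start a child interpreter with the placement group name in its environment."""
    env = dict(os.environ)  # copy, so the parent's environment is left untouched
    env[PG_ENV_VAR] = pg_name
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(launch_with_pg_name("rlhf_pg"))
```

Passing the name rather than the PlacementGroup object means the child process only needs a string it can look up on its own.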

However, named placement groups do not work right now: looking up the placement group hangs.

Steps to reproduce the hang:

$ cd examples/offline_inference
$ VLLM_USE_V1=1 python3 rlhf.py

The hang occurs when we try to get the placement group, i.e. in the newly added line:

        current_placement_group = ray.util.get_placement_group(
            envs.VLLM_RAY_PG_NAME)
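While debugging, it can help to turn this silent hang into an explicit error. The sketch below uses only the standard library. call_with_timeout is a hypothetical helper that runs a call in a daemon thread and raises TimeoutError if the call does not return in time. It cannot cancel the stuck Ray call. It only stops the caller from blocking forever.

```python
import threading
from typing import Any, Callable


def call_with_timeout(fn: Callable[[], Any], timeout_s: float) -> Any:
    """Run fn() in a daemon thread; raise TimeoutError if it does not finish in time.

    Python cannot kill the stuck thread. Because it is a daemon thread, it will
    not keep the interpreter alive at exit.
    """
    result: dict = {}

    def target() -> None:
        try:
            result["value"] = fn()
        except Exception as exc:  # re-raised in the caller's thread below
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        raise TimeoutError(f"call did not finish within {timeout_s}s")
    if "error" in result:
        raise result["error"]
    return result.get("value")


# Hypothetical usage around the lookup from this PR (requires Ray, not run here):
#   bundles = call_with_timeout(
#       lambda: ray.util.get_placement_group(envs.VLLM_RAY_PG_NAME).bundle_specs,
#       30.0,
#   )
```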

Signed-off-by: youkaichao <youkaichao@gmail.com>

github-actions · 2025-03-07T06:28:30Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

youkaichao · 2025-03-07T06:39:15Z

Without this PR, VLLM_USE_V1=1 python3 rlhf.py hangs in vllm/vllm/executor/ray_utils.py, line 310 (at commit 8ca7a71):

        bundles = current_placement_group.bundle_specs

The underlying hang is in Ray:

https://github.com/ray-project/ray/blob/8b26f5c13b7c66fb4e19094f9870083261ed9672/python/ray/util/placement_group.py#L139

youkaichao · 2025-03-07T06:52:59Z

Until this problem is fixed, RLHF users need to disable V1 multiprocessing: VLLM_USE_V1=1 VLLM_ENABLE_V1_MULTIPROCESSING=0 python3 rlhf.py
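The same workaround can be applied from Python, for example in a launcher script. This is a sketch. workaround_env and run_rlhf are hypothetical helpers. Only the two environment variable names and their values come from this thread. Presumably, setting VLLM_ENABLE_V1_MULTIPROCESSING=0 avoids the separate engine process described at the top.

```python
import os
import subprocess
import sys
from typing import Dict, Optional

# Workaround from this thread: keep the V1 engine but disable V1 multiprocessing.
WORKAROUND_ENV: Dict[str, str] = {
    "VLLM_USE_V1": "1",
    "VLLM_ENABLE_V1_MULTIPROCESSING": "0",
}


def workaround_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with the workaround applied."""
    env = dict(os.environ if base is None else base)
    env.update(WORKAROUND_ENV)  # overrides any existing values for these keys
    return env


def run_rlhf(script: str = "rlhf.py") -> int:
    """Equivalent to: VLLM_USE_V1=1 VLLM_ENABLE_V1_MULTIPROCESSING=0 python3 rlhf.py"""
    return subprocess.run([sys.executable, script], env=workaround_env()).returncode
```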

youkaichao · 2025-03-27T14:38:37Z

Closing, as this is done in #14705.

youkaichao added 5 commits March 7, 2025 13:33

use named pg

f5bf505

Signed-off-by: youkaichao <youkaichao@gmail.com>

use named pg

238292c

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix?

8b91ef5

Signed-off-by: youkaichao <youkaichao@gmail.com>

log

7784d75

Signed-off-by: youkaichao <youkaichao@gmail.com>

try to fix

9fbe6ff

Signed-off-by: youkaichao <youkaichao@gmail.com>

mergify bot added the documentation Improvements or additions to documentation label Mar 7, 2025

youkaichao mentioned this pull request Mar 13, 2025

[Misc] Better RayExecutor and multiprocessing compatibility #14705

Merged

ruisearch42 mentioned this pull request Mar 13, 2025

[misc] Spawn process when vLLM is used as Ray actor #14768

Closed

youkaichao closed this Mar 27, 2025

youkaichao deleted the ray_named branch March 27, 2025 14:38
