[Misc] allow pulling vllm in Ray runtime environment #21143


Open · wants to merge 3 commits into main

Conversation


@eric-higgins-ai commented Jul 17, 2025

Purpose

The engine runs in a spawned subprocess, which Ray treats as a new job with its own runtime environment. As a result, vLLM can't be pulled in through the Ray runtime environment, because the original job's runtime env is not passed through to the subprocess.

This issue was reported here.
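
For context, a minimal sketch of what pulling vLLM through the Ray runtime environment looks like from the user side; the address value is a placeholder and no version pin is implied by this PR. Before this change, the package installed this way was visible to the submitting job but not to the engine subprocess, because the subprocess started with a fresh runtime env.

import ray

# Ask Ray to install vLLM into the job's runtime environment instead of
# baking it into the cluster image. The engine subprocess spawned by vLLM
# needs to inherit this env for the install to be visible there.
ray.init(
    address="auto",                 # connect to an existing cluster (placeholder)
    runtime_env={"pip": ["vllm"]},  # pip-based runtime env
)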

Test Plan

Ran a Ray job with the following code

import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

vision_processor_config = vLLMEngineProcessorConfig(
    model="Qwen/Qwen2.5-VL-32B-Instruct",
    engine_kwargs=dict(
        tensor_parallel_size=1,
        pipeline_parallel_size=NUMBER_OF_GPUS,
        max_model_len=4096,
        enable_chunked_prefill=True,
        max_num_batched_tokens=2048,
        distributed_executor_backend="ray",
        device="cuda",
    ),
    # Override Ray's runtime env to include the Hugging Face token. Ray Data
    # uses Ray under the hood to orchestrate the inference pipeline.
    runtime_env=dict(
        env_vars=dict(
            HF_TOKEN="<token>",
            VLLM_USE_V1="1",
        ),
    ),
    batch_size=1,
    concurrency=1,
    has_image=False,
)

# Build the processor.
processor = build_llm_processor(
    vision_processor_config,
    preprocess=lambda row: dict(
        messages=[
            {"role": "system", "content": "You are a bot that responds with haikus."},
            {"role": "user", "content": row["item"]},
        ],
        sampling_params=dict(
            temperature=0.3,
            max_tokens=250,
        ),
    ),
    postprocess=lambda row: dict(
        answer=row["generated_text"],
        **row,  # This returns all the original columns in the dataset.
    ),
)

# Create the dataset and run the pipeline.
ds = ray.data.from_items(["Start of the haiku is: Complete this for me..."])
ds = processor(ds)
ds.show(limit=1)

Test Result

I checked in the Ray dashboard that the launched job has the runtime env provided in the processor config above.


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀


@gemini-code-assist bot left a comment


Code Review

This pull request enables the propagation of a Ray runtime environment to vLLM's distributed workers. This is a useful feature when vLLM is used as a component within a larger Ray application that defines a specific runtime environment.

The changes are well-targeted:

  1. The ParallelConfig is extended to hold an optional runtime_env.
  2. When creating the engine configuration inside a Ray actor, the current runtime_env is fetched from the Ray context and stored in the ParallelConfig.
  3. When the Ray executor initializes the Ray cluster, it now passes this runtime_env to ray.init(), ensuring that subsequently created workers inherit the correct environment.

I've reviewed the implementation, and the logic appears sound and correctly handles the cases where Ray is already initialized versus when vLLM needs to initialize it. The changes are constrained to the Ray execution path and should not affect other backends. Overall, this is a good addition to improve vLLM's integration with the Ray ecosystem.
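
For illustration, a minimal sketch of the flow described above, assuming the field and call sites look roughly like this; vLLM's actual ParallelConfig and Ray executor carry many more fields, and the function bodies here are illustrative rather than the PR's diff.

from dataclasses import dataclass
from typing import Optional

import ray

@dataclass
class ParallelConfig:                   # heavily trimmed, illustrative only
    runtime_env: Optional[dict] = None  # the optional field from item 1

def capture_runtime_env(config: ParallelConfig) -> None:
    # Item 2: inside a Ray actor, record the current job's runtime env.
    if ray.is_initialized():
        config.runtime_env = dict(ray.get_runtime_context().runtime_env)

def initialize_ray_cluster(config: ParallelConfig) -> None:
    # Item 3: when vLLM has to call ray.init() itself, forward the env so
    # workers created afterwards inherit it.
    if not ray.is_initialized():
        ray.init(runtime_env=config.runtime_env)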

Signed-off-by: eric-higgins-ai <erichiggins@applied.co>
Signed-off-by: eric-higgins-ai <erichiggins@applied.co>
Signed-off-by: eric-higgins-ai <erichiggins@applied.co>