
Conversation

@comaniac
Contributor

Why are these changes needed?

Support vLLM v1 in Ray Serve LLM.
Note that we need to merge the following PRs in vLLM:

This means this PR cannot be merged until the following blockers are resolved:

  1. Wait for the next vLLM release (0.8.1), which includes the above PRs.
  2. Upgrade the vLLM version in Ray to that release.
  3. Add unit tests against the new vLLM release.

Example Scripts:

  • launch.py
from ray import serve
from ray.serve.llm import LLMConfig, LLMServer, LLMRouter

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            initial_replicas=1,
            min_replicas=1,
            max_replicas=2,
        )
    ),
    accelerator_type="L4",
    runtime_env=dict(
        env_vars=dict(
            # The v1 engine is required for Ray Serve; set it explicitly.
            VLLM_USE_V1="1",
        )
    ),
)

# Deploy the application
deployment = LLMServer.as_deployment(
    llm_config.get_serve_options(name_prefix="vLLM:")
).bind(llm_config)
llm_app = LLMRouter.as_deployment().bind([deployment])
serve.run(llm_app)
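The `VLLM_USE_V1` environment variable in `runtime_env.env_vars` above is how the v1 engine gets selected per replica. A minimal sketch of that kind of environment-variable gate is below; the `use_v1_engine` helper is illustrative only, not vLLM's actual internals:

```python
import os

# Illustrative sketch of an environment-variable engine gate like the
# VLLM_USE_V1 flag above; this is NOT vLLM's actual implementation.
def use_v1_engine() -> bool:
    """Return True when the v1 engine is requested via the environment."""
    return os.environ.get("VLLM_USE_V1", "0") == "1"

# runtime_env.env_vars in the LLMConfig sets this for every replica process:
os.environ["VLLM_USE_V1"] = "1"
print(use_v1_engine())  # True
```

Setting the flag through `runtime_env` (rather than in the driver's shell) ensures every Serve replica process sees it, regardless of which node it lands on.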
  • query.py
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

chat_completion = client.chat.completions.create(
    messages=[{
        "role": "system",
        "content": "You are a helpful assistant."
    }, {
        "role": "user",
        "content": "Who won the world series in 2020?"
    }, {
        "role":
        "assistant",
        "content":
        "The Los Angeles Dodgers won the World Series in 2020."
    }, {
        "role": "user",
        "content": "Where was it played?"
    }],
    model=model,
    max_tokens=150,
)

print("Chat completion results:")
print(chat_completion)

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
@comaniac comaniac requested a review from a team as a code owner March 18, 2025 23:56
Member

@GeneDer GeneDer left a comment

Looks great! Thanks Cody!

More just a note to self: we should follow up with updating the dependency (when the latest vLLM release comes out), adding release tests, and docs. I can take those on when that's ready.

@GeneDer GeneDer added the go add ONLY when ready to merge, run all tests label Mar 19, 2025
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
@richardliaw
Contributor

lmk when we can merge this

lk-chen added a commit to lk-chen/ray that referenced this pull request Mar 27, 2025
Signed-off-by: Linkun Chen <github@lkchen.net>
Member

@GeneDer GeneDer left a comment


:shipit:

@comaniac
Contributor Author

Note: If we merge #51726, which already includes the change in this PR, then we can close this one.

@richardliaw
Contributor

Looks like we can close this one


Labels

community-backlog, go (add ONLY when ready to merge, run all tests)
