
Conversation

@comaniac
Contributor

Why are these changes needed?

Support vLLM v1 in Ray Serve LLM.
Note that we need to merge the following PRs in vLLM:

This means this PR cannot be merged until the following blockers are resolved:

  1. Wait for the next vLLM release (0.8.1), which includes the above PRs.
  2. Upgrade the vLLM version in Ray to that release.
  3. Add unit tests against the new vLLM release.

Example Scripts:

  • launch.py
from ray import serve
from ray.serve.llm import LLMConfig, LLMServer, LLMRouter

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            initial_replicas=1,
            min_replicas=1,
            max_replicas=2,
        )
    ),
    accelerator_type="L4",
    runtime_env=dict(
        env_vars=dict(
            # The v1 engine is required for Ray Serve; set it explicitly.
            VLLM_USE_V1="1",
        )
    ),
)

# Deploy the application
deployment = LLMServer.as_deployment(
    llm_config.get_serve_options(name_prefix="vLLM:")
).bind(llm_config)
llm_app = LLMRouter.as_deployment().bind([deployment])
serve.run(llm_app)
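The `VLLM_USE_V1` environment variable in `runtime_env.env_vars` above is how the v1 engine gets selected per replica. A minimal sketch of that kind of environment-variable gate is below; the `use_v1_engine` helper is illustrative only, not vLLM's actual internals:

```python
import os

# Illustrative sketch of an environment-variable engine gate like the
# VLLM_USE_V1 flag above; this is NOT vLLM's actual implementation.
def use_v1_engine() -> bool:
    """Return True when the v1 engine is requested via the environment."""
    return os.environ.get("VLLM_USE_V1", "0") == "1"

# runtime_env.env_vars in the LLMConfig sets this for every replica process:
os.environ["VLLM_USE_V1"] = "1"
print(use_v1_engine())  # True
```

Setting the flag through `runtime_env` (rather than in the driver's shell) ensures every Serve replica process sees it, regardless of which node it lands on.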
  • query.py
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

chat_completion = client.chat.completions.create(
    messages=[{
        "role": "system",
        "content": "You are a helpful assistant."
    }, {
        "role": "user",
        "content": "Who won the world series in 2020?"
    }, {
        "role":
        "assistant",
        "content":
        "The Los Angeles Dodgers won the World Series in 2020."
    }, {
        "role": "user",
        "content": "Where was it played?"
    }],
    model=model,
    max_tokens=150,
)

print("Chat completion results:")
print(chat_completion)

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
@comaniac comaniac requested a review from a team as a code owner March 18, 2025 23:56
Member

@GeneDer GeneDer left a comment

Looks great! Thanks Cody!

More just a note to self: we should follow up with updating the dependency (when the latest vLLM release comes out), adding release tests, and docs. I can take those on when that's ready.

@GeneDer GeneDer added the go add ONLY when ready to merge, run all tests label Mar 19, 2025
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
@richardliaw
Contributor

lmk when we can merge this

lk-chen added a commit to lk-chen/ray that referenced this pull request Mar 27, 2025
Signed-off-by: Linkun Chen <github@lkchen.net>
Member

@GeneDer GeneDer left a comment


:shipit:

@comaniac
Contributor Author

Note: If we merge #51726, which already includes the change in this PR, then we can close this one.

@richardliaw
Contributor

Looks like we can close this one


Labels

community-backlog, go (add ONLY when ready to merge, run all tests)
