[llm] bump vllm to 0.9.0.1 #53443
Conversation
@lk-chen Can we change the import path here from … to …, and delete …?
Taking this over
Force-pushed from 44046e1 to db20f76
Workarounds in this PR:
Force-pushed from 96120c5 to 9d76dcd
Just a couple of qs before we merge.
python/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engine.py (outdated review thread, resolved)
@@ -162,6 +167,17 @@ def test_vllm_llama_lora():
    assert all("resp" in out for out in outs)


@ray.remote(num_gpus=1)
def delete_torch_compile_cache_on_worker():
Does this mean that if I run the script using compiled graph twice in a row, the second one will die because of this hysteresis effect?
Yeah; I think pytest isn't letting the engine/PyTorch clean up correctly between parameterizations, so this is the workaround.
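For context, here is a minimal sketch of what such a cleanup task could look like. The cache location and removal logic are assumptions for illustration, not the PR's actual implementation; the idea is simply to delete stale torch.compile artifacts on the GPU worker between test parameterizations so the next engine starts from a clean cache.

```python
import os
import shutil

import ray


@ray.remote(num_gpus=1)
def delete_torch_compile_cache_on_worker():
    # Assumed cache location: vLLM keeps torch.compile artifacts under
    # ~/.cache/vllm by default; the exact subdirectory is an assumption.
    cache_dir = os.path.expanduser("~/.cache/vllm/torch_compile_cache")
    shutil.rmtree(cache_dir, ignore_errors=True)


# Run the cleanup on a GPU worker before the next parameterization
# constructs a fresh engine.
ray.get(delete_torch_compile_cache_on_worker.remote())
```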
Signed-off-by: Linkun Chen <github@lkchen.net>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Force-pushed from 74cac5d to 2e162db
Why are these changes needed?
Many important reasons

Related issue number

Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.