
Upgrade vLLM to 0.17.0#61598

Open
jeffreywang-anyscale wants to merge 17 commits into master from vllm-0.17.0

Conversation

@jeffreywang-anyscale
Contributor

Description

Briefly describe what this PR accomplishes and why it's needed.

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request correctly upgrades vLLM to version 0.17.0 and updates its dependencies accordingly. The code changes are consistent with this upgrade. However, it seems a local configuration for a PyPI index has been accidentally included in many of the dependency lock files. This should be removed to avoid breaking builds.

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang-anyscale added the "go (add ONLY when ready to merge, run all tests)" label on Mar 9, 2026
@ray-gardener bot added the "serve (Ray Serve Related Issue)" and "llm" labels on Mar 9, 2026
opentelemetry-proto==1.39.0 \
--hash=sha256:1e086552ac79acb501485ff0ce75533f70f3382d43d0a30728eeee594f7bf818 \
--hash=sha256:c1fa48678ad1a1624258698e59be73f990b7fc1f39e73e16a9d08eef65dd838c
opentelemetry-proto==1.34.1 \
Collaborator


why is this downgraded?

Contributor Author


I think this is compiled from https://github.com/ray-project/ray/blob/15a473454084a739264ce66290d7d4fc1b3926b4/python/requirements/serve/tracing-reqs.txt, and all opentelemetry libraries should have the same version.
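The "same version" expectation can be captured with a pip-style constraints file (a hypothetical sketch; the package list is illustrative, not the repo's actual tracing-reqs.txt):

```
# constraints.txt (illustrative): keep the opentelemetry family on one release line
opentelemetry-api==1.39.0
opentelemetry-sdk==1.39.0
opentelemetry-proto==1.39.0
opentelemetry-exporter-otlp==1.39.0
```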

Collaborator


Hmm... that does not really make sense to me.

  1. How does tracing-reqs.txt get pulled in by this change?
  2. Should that be upgraded to 1.39.0 for consistency?

I think some additional dependency of vLLM 0.17 is pulling the version down.

Collaborator


@elliot-barn could you help investigate? For example, what happens if we enforce opentelemetry-proto>=1.39.0 as a constraint?

Contributor


opentelemetry-proto 1.40.0 depends on protobuf>=5.0,<7.0
opentelemetry-proto 1.39.1 depends on protobuf>=5.0,<7.0
opentelemetry-proto 1.39.0 depends on protobuf>=5.0,<7.0

We are still on protobuf 4.25.8; the py313 dependency upgrade initiative will bring us to 5.29.6.
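The constraint arithmetic above can be sketched with a minimal check (not uv's actual resolver; versions taken from the comment):

```python
def parse(version):
    """Parse a dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def satisfies_protobuf_range(version):
    """protobuf >=5.0,<7.0 -- the requirement shared by opentelemetry-proto 1.39.0+."""
    return parse("5.0") <= parse(version) < parse("7.0")

print(satisfies_protobuf_range("4.25.8"))  # False: the current pin predates the range
print(satisfies_protobuf_range("5.29.6"))  # True: the py313 upgrade target fits
```

This is why staying on protobuf 4.25.8 rules out opentelemetry-proto 1.39.0 and newer until the protobuf upgrade lands.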

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang-anyscale requested a review from a team as a code owner on March 9, 2026 23:21
Collaborator

@aslonnie left a comment


I would like to understand more about why opentelemetry-proto needs to be downgraded.

--hash=sha256:c1fa48678ad1a1624258698e59be73f990b7fc1f39e73e16a9d08eef65dd838c
opentelemetry-proto==1.34.1 \
--hash=sha256:16286214e405c211fc774187f3e4bbb1351290b8dfb88e8948af209ce85b719e \
--hash=sha256:eb4bb5ac27f2562df2d6857fc557b3a481b5e298bc04f94cc68041f00cebcbd2


Unintended opentelemetry-proto downgrade across non-LLM lock files

Medium Severity

opentelemetry-proto is downgraded from 1.39.0 to 1.34.1 not only in LLM lock files but also in unrelated base, slim, and serve lock files that have no dependency on vLLM. As the PR reviewers noted, a vLLM upgrade should not pull down opentelemetry-proto in non-LLM environments. This broad, unexplained regression may indicate a dependency resolution issue — possibly a transitive constraint leaking through shared constraint files — and risks incompatibilities with components expecting the newer proto version.

Additional Locations (2)
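Cross-lockfile drift like this can be spotted mechanically (a hypothetical helper, not part of the PR; it assumes the standard pip-compile `name==version \` lock layout):

```python
import re

def pinned_version(lock_text, package):
    """Return the version pinned for `package` in a pip-compile style lock, or None."""
    match = re.search(rf"^{re.escape(package)}==([0-9A-Za-z.]+)",
                      lock_text, re.MULTILINE)
    return match.group(1) if match else None

lock = """\
opentelemetry-proto==1.34.1 \\
    --hash=sha256:16286214e405c211fc774187f3e4bbb1351290b8dfb88e8948af209ce85b719e
"""
print(pinned_version(lock, "opentelemetry-proto"))  # 1.34.1
print(pinned_version(lock, "protobuf"))             # None: not pinned in this lock
```

Running such a check across base, slim, and serve lock files would flag any lock where the pin regressed below the expected 1.39.0.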

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
- --python-version=${PYTHON_VERSION_STR}
- --unsafe-package ray
- --python-platform=x86_64-manylinux_2_31
- --index https://download.pytorch.org/whl/${CUDA_CODE}

PyTorch CUDA index removed from LLM build configs

High Severity

The --index https://download.pytorch.org/whl/${CUDA_CODE} flag was removed from LLM depset configs that build for CUDA targets (cu128, cu130_py312). Other GPU configs in the same repo (ray_base_extra_testdeps_gpu, ray_ml_base_extra_testdeps_cuda, release_multimodal_inference_benchmarks_tests) still use --index https://download.pytorch.org/whl/cu128. vLLM 0.17.0 docs still recommend --extra-index-url for the PyTorch CUDA wheel index. Without this index, the uv resolver may not find CUDA-specific PyTorch wheels for cu128/cu130 builds, potentially resolving CPU-only PyTorch when lock files are regenerated.

Additional Locations (2)

1. Restore `--index https://download.pytorch.org/whl/${CUDA_CODE}` in
   rayllm.depsets.yaml. This was accidentally dropped, causing cu128
   lockfiles to resolve torchaudio from PyPI instead of the CUDA index.

2. Add numexpr>=2.10 to llm-test-requirements.txt. The CI base Docker
   image has numexpr compiled against NumPy 1.x, but the lockfile
   installs NumPy 2.x, causing a binary incompatibility crash. Including
   numexpr in the lockfile ensures a compatible version overwrites the
   base image's broken one.

3. Regenerate all 16 LLM lockfiles.
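Step 1 amounts to restoring the flag in the depset's append_flags (fragment reconstructed from the diff quoted earlier in this thread; surrounding keys may differ in the actual rayllm.depsets.yaml):

```yaml
append_flags:
  - --python-version=${PYTHON_VERSION_STR}
  - --unsafe-package ray
  - --python-platform=x86_64-manylinux_2_31
  - --index https://download.pytorch.org/whl/${CUDA_CODE}
```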

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Master also lacks numexpr in lockfiles with the same NumPy 2.2.6.
The CPU test numexpr failure is a base image issue, not caused by
this branch.

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

uv pip install --system --no-cache-dir --no-deps \
--index-strategy unsafe-best-match \
--index-strategy first-index \

Index strategy inconsistency may break Docker image builds

Medium Severity

The --index-strategy was changed from unsafe-best-match to first-index, inconsistent with every other Dockerfile in the project (base-deps, base-extra, base-slim), which all use unsafe-best-match with an explicit comment explaining its necessity. The lock file rayllm_py311_cu128.lock is compiled using the PyTorch index (via rayllm.depsets.yaml which still has --index https://download.pytorch.org/whl/${CUDA_CODE}), so hashes may originate from that index. With first-index, uv stops at PyPI for packages like torch==2.10.0 and torchvision==0.25.0 and never checks the PyTorch index — if any hashes don't match PyPI's wheels, the build will fail.

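The difference between the two strategies can be illustrated with a toy resolver (a simplified model of the documented uv semantics, not uv's actual code; index contents and versions are hypothetical):

```python
def resolve(package, indexes, strategy):
    """indexes: ordered list of (index_name, {package: version_tuple}) pairs.

    first-index: stop at the first index that carries the package at all.
    unsafe-best-match: consider candidates from every index, take the best version.
    """
    candidates = [(name, pkgs[package]) for name, pkgs in indexes if package in pkgs]
    if not candidates:
        return None
    if strategy == "first-index":
        return candidates[0]          # later indexes are never consulted
    return max(candidates, key=lambda c: c[1])

# PyPI is listed first; the PyTorch CUDA index carries a different build.
indexes = [
    ("pypi", {"torch": (2, 10, 0)}),
    ("pytorch-cu128", {"torch": (2, 10, 1)}),
]
print(resolve("torch", indexes, "first-index"))        # ('pypi', (2, 10, 0))
print(resolve("torch", indexes, "unsafe-best-match"))  # ('pytorch-cu128', (2, 10, 1))
```

Under first-index, artifacts (and therefore hashes) only ever come from the first index that lists the package, which is the mismatch risk described above when the lock was compiled against the PyTorch index.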

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang-anyscale requested a review from a team as a code owner on March 14, 2026 00:03
- ray_img_depset_${PYTHON_SHORT}
output: python/deplocks/base_extra_testdeps/ray-llm-base_extra_testdeps_py${PYTHON_VERSION}.lock
append_flags:
- --index https://download.pytorch.org/whl/${CUDA_CODE}

PyTorch CUDA index removed inconsistently from CUDA depsets

Medium Severity

The --index https://download.pytorch.org/whl/${CUDA_CODE} flag was removed from the ray_base_extra_testdeps_llm_cuda depset in rayimg.depsets.yaml and from the common settings in llm_release_tests.depsets.yaml, but similar CUDA depsets like ray_base_extra_testdeps_gpu and ray_ml_base_extra_testdeps_cuda in the same file still retain it. The LLM CUDA depset expands dependencies that include sentence-transformers, which depends on PyTorch — without the CUDA wheel index, the resolver may pull CPU-only PyTorch wheels for what's explicitly a CUDA build.

Additional Locations (1)

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

@cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


# exact versions, so integrity is maintained through version pinning.
uv pip install --system --no-cache-dir --no-deps \
--index-strategy unsafe-best-match \
--no-verify-hashes \

Disabling hash verification weakens supply chain security

Medium Severity

Adding --no-verify-hashes disables integrity checking for all packages installed from the lock file. The lock files still contain hashes, but they are completely ignored during installation. This means a compromised or tampered package on the CUDA index (or any alternate index used via unsafe-best-match) could be installed without detection. While version pinning provides some defense, hash verification is the primary protection against supply chain attacks where an index serves a modified binary for a pinned version. A more targeted fix — such as regenerating hashes from the actual CUDA index, or excluding only the mismatched packages — would preserve integrity checking for the majority of dependencies.

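What hash verification buys is easy to see in miniature (a minimal sketch of the check that --no-verify-hashes turns off; the artifact bytes and pin are illustrative):

```python
import hashlib

def verify(artifact: bytes, pinned_sha256: str) -> bool:
    """Accept an artifact only if its digest matches the lock file pin."""
    return hashlib.sha256(artifact).hexdigest() == pinned_sha256

wheel = b"example wheel contents"
pin = hashlib.sha256(wheel).hexdigest()   # what the lock file would record

print(verify(wheel, pin))         # True: untampered artifact
print(verify(wheel + b"!", pin))  # False: tampered artifact is rejected
```

With the flag set, version pins still constrain what is requested, but nothing checks that the bytes served for that version match what was locked.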



Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

4 participants