
Update dependency vllm to v0.7.2 [SECURITY] #16

Open · renovate[bot] wants to merge 1 commit into main from renovate/pypi-vllm-vulnerability

Conversation

renovate[bot]

@renovate renovate bot commented Jan 29, 2025

This PR contains the following updates:

Package: vllm
Change: ==0.6.1 -> ==0.7.2

GitHub Vulnerability Alerts

CVE-2025-24357

Description

vllm/model_executor/weight_utils.py implements hf_model_weights_iterator to load model checkpoints downloaded from Hugging Face. It uses the torch.load function with the weights_only parameter left at its default value of False. As the security warning at https://pytorch.org/docs/stable/generated/torch.load.html notes, torch.load will execute arbitrary code during unpickling if it loads malicious pickle data.
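
As a rough illustration (not vLLM's actual code), the unsafe pattern and two safer alternatives look like this; the checkpoint file names are placeholders:

```python
import torch
from safetensors.torch import load_file

# Vulnerable pattern: weights_only defaults to False, so a malicious pickle
# embedded in the checkpoint can execute arbitrary code during unpickling.
state_dict = torch.load("pytorch_model.bin")

# Mitigation: restrict unpickling to tensors and primitive containers.
state_dict = torch.load("pytorch_model.bin", weights_only=True)

# Preferred: the safetensors format stores raw tensors only and cannot
# carry an executable pickle payload.
state_dict = load_file("model.safetensors")
```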

Impact

This vulnerability can be exploited to execute arbitrary code and OS commands on the machine of a victim who fetches the pretrained repository remotely.

Note that most models now use the safetensors format, which is not vulnerable to this issue.

References

CVE-2025-25183

Summary

Maliciously constructed prompts can lead to hash collisions, resulting in prefix cache reuse, which can interfere with subsequent responses and cause unintended behavior.

Details

vLLM's prefix caching makes use of Python's built-in hash() function. As of Python 3.12, the behavior of hash(None) has changed to be a predictable constant value. This makes it more feasible that someone could try to exploit hash collisions.

Impact

The impact of a collision would be the reuse of cache entries that were generated from different content. Given knowledge of the prompts in use and predictable hashing behavior, someone could intentionally populate the cache using a prompt known to collide with another prompt in use.

Solution

We address this problem by initializing hashes in vLLM with a value that is no longer constant and predictable. It will be different each time vLLM runs. This restores the behavior seen in Python versions prior to 3.12.

Using a hashing algorithm that is less prone to collisions (sha256, for example) would be the best way to avoid the possibility of a collision. However, it would have an impact on both performance and memory footprint. Hash collisions may still occur, though they are no longer straightforward to predict.
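
A minimal sketch of the mitigation described above, assuming a simplified block-hashing scheme (the names here are illustrative, not vLLM's actual implementation):

```python
import secrets
from typing import Optional, Tuple

# Drawn once per process, so block hashes differ on every run instead of
# being anchored to the now-predictable hash(None) from Python 3.12.
PREFIX_HASH_SEED = secrets.randbits(64)

def hash_block(parent_hash: Optional[int], token_ids: Tuple[int, ...]) -> int:
    """Chain-hash one block of token IDs onto its parent block's hash."""
    prev = PREFIX_HASH_SEED if parent_hash is None else parent_hash
    return hash((prev, token_ids))
```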

To give an idea of the likelihood of a collision, for randomly generated hash values (assuming the hash generation built into Python is uniformly distributed), with a cache capacity of 50,000 messages and an average prompt length of 300, a collision will occur on average once every 1 trillion requests.
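
A back-of-the-envelope check of that figure, under two added assumptions (64-bit uniformly distributed hash values and an assumed block size of 16 tokens):

```python
cache_messages = 50_000
avg_prompt_tokens = 300
block_size = 16                    # assumed block size
hash_space = 2 ** 64               # assumed 64-bit uniform hash values

blocks_per_prompt = avg_prompt_tokens // block_size   # ~18 blocks per prompt
cached_blocks = cache_messages * blocks_per_prompt    # ~900,000 cached hashes

# Chance that any block hash of a new prompt matches a cached hash.
p_collision = blocks_per_prompt * cached_blocks / hash_space
print(f"~1 collision every {1 / p_collision:.1e} requests")  # on the order of 1e12
```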

References


Release Notes

vllm-project/vllm (vllm)

v0.7.2

Compare Source

Highlights

  • Qwen2.5-VL is now supported in vLLM. Please note that, at the moment, it requires installing the Hugging Face transformers library from source (#12604)
  • Add transformers backend support via --model-impl=transformers. This allows vLLM to be run with arbitrary Hugging Face text models (#11330, #12785, #12727); see the sketch after this list.
  • Performance enhancements for DeepSeek models:
    • Align KV cache entries to start on 256-byte boundaries, yielding a 43% throughput improvement (#12676)
    • Apply torch.compile to fused_moe/grouped_topk, yielding a 5% throughput improvement (#12637)
    • Enable MLA for DeepSeek VL2 (#12729)
    • Enable DeepSeek models on ROCm (#12662)
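
A hedged sketch of the transformers backend mentioned above; the model name is a placeholder, and this assumes the Python LLM entry point forwards the engine argument (the CLI equivalent would be vllm serve <model> --model-impl transformers):

```python
from vllm import LLM

# Force the generic Transformers model implementation instead of a native one.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", model_impl="transformers")

outputs = llm.generate("Hello, my name is")
print(outputs[0].outputs[0].text)
```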
Core Engine
  • Use VLLM_LOGITS_PROCESSOR_THREADS to speed up structured decoding in high batch size scenarios (#​12368)
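
As a small usage sketch (assuming the variable is read at engine startup; the thread count and model name are arbitrary):

```python
import os

# Enable threaded logits processing for structured decoding at high batch sizes.
os.environ["VLLM_LOGITS_PROCESSOR_THREADS"] = "8"

from vllm import LLM  # imported after setting the variable, before engine start

llm = LLM(model="facebook/opt-125m")  # placeholder model
```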
Security Update
  • Improve hash collision avoidance in prefix caching (#​12621)
  • Add SPDX-License-Identifier headers to python source files (#​12628)
Other
  • Enable FusedSDPA support for Intel Gaudi (HPU) (#​12359)

What's Changed

New Contributors

Full Changelog: vllm-project/vllm@v0.7.1...v0.7.2

v0.7.1

Compare Source

Highlights

This release features MLA optimizations for the DeepSeek family of models. Compared to v0.7.0, released this Monday, we offer ~3x the generation throughput, ~10x the memory capacity for tokens, and horizontal context scalability with pipeline parallelism.

V1

For the V1 architecture, we

Models
  • New Model: MiniCPM-o (text outputs only) (#​12069)
Hardware
  • Neuron: NKI-based flash-attention kernel with paged KV cache (#​11277)
  • AMD: llama 3.2 support upstreaming (#​12421)
Others
  • Support override generation config in engine arguments (#​12409)
  • Support reasoning content in API for deepseek R1 (#​12473)
What's Changed
New Contributors

Full Changelog: vllm-project/vllm@v0.7.0...v0.7.1

v0.7.0

Compare Source

Highlights

  • vLLM's V1 engine is ready for testing! This is a rewritten engine designed for performance and architectural simplicity. You can turn it on by setting the environment variable VLLM_USE_V1=1 (see the sketch after this list). See our blog for more details. (44 commits).
  • New methods (LLM.sleep, LLM.wake_up, LLM.collective_rpc, LLM.reset_prefix_cache) in vLLM for post-training frameworks! (#12361, #12084, #12284).
  • torch.compile is now fully integrated in vLLM and enabled by default in V1. You can turn it on via the -O3 engine parameter. (#11614, #12243, #12043, #12191, #11677, #12182, #12246).

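A minimal sketch combining the first highlight with the new reset_prefix_cache method (illustrative only; the model name is a placeholder):

```python
import os

# Opt into the rewritten V1 engine before vLLM is imported.
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM

llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)
outputs = llm.generate("Hello, my name is")
print(outputs[0].outputs[0].text)

# New in this release: clear cached prefix blocks between workload phases.
llm.reset_prefix_cache()
```
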
This release features

  • 400 commits from 132 contributors, including 57 new contributors.
    • 28 CI and build enhancements, including testing for nightly torch (#​12270) and inclusion of genai-perf for benchmark (#​10704).
    • 58 documentation enhancements, including reorganized documentation structure (#​11645, #​11755, #​11766, #​11843, #​11896).
    • more than 161 bug fixes and miscellaneous enhancements
Features

Models

Hardware

Features

  • Distributed:
  • API Server: Jina- and Cohere-compatible Rerank API (#​12376)
  • Kernels:
    • Flash Attention 3 Support (#​12093)
    • Punica prefill kernels fusion (#​11234)
    • For Deepseek V3: optimize moe_align_block_size for cuda graph and large num_experts (#​12222)
Others
  • Benchmark: new script for CPU offloading (#​11533)
  • Security: Set weights_only=True when using torch.load() (#​12366)

What's Changed


Configuration

📅 Schedule: Branch creation - "" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Enabled.

Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot force-pushed the renovate/pypi-vllm-vulnerability branch from c5accc9 to 72fc532 on February 6, 2025 16:40
@renovate renovate bot force-pushed the renovate/pypi-vllm-vulnerability branch from 72fc532 to bf5236d on February 8, 2025 16:10
@renovate renovate bot changed the title from "Update dependency vllm to v0.7.0 [SECURITY]" to "Update dependency vllm to v0.7.2 [SECURITY]" on Feb 8, 2025