Description
vLLM V1 has been the default engine since version v0.8.0, released approximately three months ago. With substantial user adoption and overwhelmingly positive feedback on V1, we propose formally deprecating vLLM V0 and removing its implementation from the vLLM codebase.
TL;DR
- Effective immediately, the vLLM V0 codebase is frozen, with only minor bug fixes permitted.
- Deprecation of V0 will occur at the end of June, followed by the removal of its code.
- By that time, migration of the remaining features from V0 to V1 will be completed. Certain features may be temporarily or permanently discontinued (details provided below).
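During the freeze window, users who need to stay on the V0 engine can select it explicitly: vLLM reads the `VLLM_USE_V1` environment variable to decide which engine to run. A minimal stdlib-only sketch of that selection logic (the real check lives inside vLLM and may differ in detail):

```python
import os

def use_v1_engine(env=os.environ) -> bool:
    """Return True when the V1 engine should be used.

    Approximates how vLLM interprets the VLLM_USE_V1 environment
    variable; illustrative only, not vLLM's actual implementation.
    """
    # V1 has been the default since v0.8.0, so an unset variable
    # means V1. Setting VLLM_USE_V1=0 opts back into V0 while it
    # remains available.
    return env.get("VLLM_USE_V1", "1") == "1"
```

For example, launching a server with `VLLM_USE_V1=0 vllm serve ...` would keep V0 until it is removed.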
Motivation
1. Reduce Code Complexity and Technical Debt
Currently, V0 and V1 share significant portions of code—such as models, configs, and utilities—which has introduced considerable complexity and technical debt. Contributors unfamiliar with V0 face difficulties assessing the impact of changes, especially when shared components inadvertently break V0 functionality. Eliminating V0 will simplify the codebase, enhance maintainability, and accelerate the development and improvement of V1.
2. Avoid User and Contributor Confusion
The coexistence of V0 and V1 frequently leads to confusion, especially among new users and contributors who inadvertently reference V0 code (e.g., `core/scheduler.py`) instead of the corresponding V1 modules (e.g., `v1/core/sched/scheduler.py`). This confusion also extends to AI assistants. Removing V0 entirely will eliminate this ambiguity and streamline onboarding.
3. Simplify CI and Reduce Cost
Currently, a substantial portion of our CI resources is devoted to testing V0 code paths. Removing V0 is projected to cut CI times and associated costs by at least half, reallocating these resources toward accelerating V1 development and improving test coverage.
Given the robust support and ongoing maintenance of V1, continued investment in V0 provides diminishing returns. Deprecating V0 is thus both strategic and necessary.
Transition Plan
We propose the following phased plan for deprecating V0:
Immediate Action (v0.9.0)
- Freeze feature development on V0. Only minor bug fixes will be accepted until deprecation.
By June 30, 2025
Complete migration of critical features from V0 to V1:
- Model support:
  - Embedding models
  - Mamba-style models
- Feature support:
  - Logits processors (via new APIs)
  - OpenTelemetry APIs
- Hardware backends:
  - Intel CPU and XPU
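To illustrate the kind of hook being migrated: a logits processor is, in general shape, a callable that rewrites per-token logits before sampling. The sketch below shows that general shape only; the names and the function are hypothetical, and vLLM V1's new logits-processor API has its own, different interface.

```python
def ban_token_processor(banned_ids):
    """Build a toy logits processor that masks out banned token ids.

    Illustrative only: this is NOT vLLM's V1 API, just the generic
    pattern of a pre-sampling logits hook.
    """
    banned = set(banned_ids)

    def process(past_token_ids, logits):
        # Assign -inf to banned tokens so they can never be sampled.
        return [
            float("-inf") if i in banned else score
            for i, score in enumerate(logits)
        ]

    return process
```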
Features Temporarily Discontinued
- Encoder-decoder models (e.g., Whisper)
- Draft model-based speculative decoding
- Neuron backend (planned reintroduction in V1)
- HPU backend (to be reintroduced as a plugin)
Features Permanently Dropped
- Prompt adapter
- V100 support
July 1, 2025 (v0.10.0)
- Officially announce the deprecation of V0.
- Begin the removal of V0 code from the repository.
Early August 2025 (v0.11.0)
- Fully remove all V0-related code and CI coverage.
- Update and clarify documentation accordingly.
Versioning Scheme Explained
- v0.9: Last supported version of the V0 engine, with frozen features and continued testing.
- v0.10: Version marking the start of V0 code removal.
- v0.11: First version without any V0 components.
Feedback and Questions
We encourage and appreciate your feedback, concerns, or questions. Our goal is to ensure transparency and community alignment throughout this transition.