This issue documents the roadmap for 2025 Q2. We will keep updating it with the related issues, pull requests, and discussions from the #production-stack channel in the vLLM Slack. Feel free to leave comments or questions in this issue!
Router
Router “frontend”
- (P1) Gateway API extension integration (PR: [Feat] Simple Gateway inference extension integration #436)
- (P2) Router performance enhancements
  - Nuitka compilation for the current router
  - Rust/Go/Nginx-based frontend for the router
- (P2) Integration with application interfaces such as MCP or other agentic workflows ([Feat] Tool calling support for MCP client integration #352)
- (P2) Envoy ext_proc integration (Draft PR: [FEAT] Integrate the router with Envoy through extproc #240)
Router Core Logic
- (P0) Queuing support in the router
- (P0) Prefix-aware routing ([Feat] Prefix-aware routing and load balancing #239)
- (P0) KV-cache-aware routing ([Feat] KV cache aware routing #398)
- (P2) Priority routing
- (P2) Routing to external providers such as OpenAI or Anthropic
- (P2) Request migration when a vLLM instance fails
- (P2) Router extension modules
  - Semantic caching ([FEAT] enable experimental semantic cache in router #210)
  - PII detection (feat: support PII detection in http request #235)
Integration with other ecosystem projects
- (P0) Compatibility with vLLM v1
- (P0) Integration with LMCache KV cache controller
Multi-node support
- (P0) Multi-node setup for a single vLLM instance (e.g., pipeline parallelism across two nodes) (Feat/basic pipeline parallelism #422)
- (P1) Reference implementation for disaggregated prefill ([Feat] Support basic disaggregated prefill (vLLM v0) #340)
K8s-native control plane
- (P0) Model CRD for vLLM deployment operations ([Feat] Add initial CRD support for production stack #415)
- (P1) Router CRD for router deployment operations ([Feat] Add initial CRD support for production stack #415)
- (P1) LoRA CRD and controller (Proposal: lora-k8s-support.md)
- (P1) Autoscaling CRD and controller (Proposal: [Doc] Add CRD autoscaler proposal #238)
CI/CD and misc.
- (P1) Comprehensive unit tests for the router
- (P1) Use CPU-based vLLM for functionality tests ([CI] Run functionality tests with CPU Docker #342)
- (P1) GitHub Actions for router performance benchmarking
- (P2) GitHub Actions for building router Docker images for multiple architectures
- (P2) Release bot to automatically release new versions (Helm chart, K8s controller packages, and Docker images) ([CI/Build] Add release bot workflow #450)
- (P1) Documentation ([CI/Build] Github action for building docs pipeline #291)
- (P2) Tutorials for more cloud platforms, models, and features
- (P2) End-to-end performance benchmarking (more workloads and more setups)