[Roadmap] vLLM Roadmap Q1 2025 #11862

Closed
@simon-mo

This page is accessible via roadmap.vllm.ai

This is a living document! For each item here, we intend to link the RFC as well as the discussion channel in the vLLM Slack.

vLLM Core

These projects will deliver performance enhancements to the majority of workloads running on vLLM, and the core team has assigned priorities to signal what must get done. Help is also wanted here, especially from people who want to get more involved in the core of vLLM.

Ship a performant and modular V1 architecture (#8779, #sig-v1)

Support large and long context models

  • (P0) Expert Parallelism for MoE
  • (P1) Productionize Prefill Disaggregation
  • (P1) Productionize KV Cache offloading to CPU and disk
  • (P1) Explore Data Parallel for Attention
  • (Help Wanted) Investigate context parallelism

Improved performance in batch mode

  • (P0) Optimized vLLM in post-training workflows (#sig-post-training)
  • (P2) Efficiency in batch inference and long generations

Others

  • (P0) Blackwell Support
  • (P1) Track vLLM Performance
  • (Help Wanted) Extensible sampler

Model Support

Hardware Support

  • PagedAttention and Chunked Prefill on Trainium and Inferentia
  • Productionize and support large scale deployment of vLLM on TPU
  • Progress in Gaudi Support
  • Out of tree support for IBM Spyre and Ascend ([RFC]: Hardware pluggable #11162)

Optimizations

CI and Developer Productivity

  • Wheel server
  • Multi-platform wheels and docker
  • Better performance tracker
  • Easier installation (optional dependencies, separate kernel packages)

Ecosystem Projects

These are independent projects that we would love to collaborate and integrate with natively!


If an item you wanted is not on the roadmap, your suggestions and contributions are very welcome! Please feel free to comment in this thread, open a feature request, or create an RFC.

Historical Roadmap: #9006, #5805, #3861, #2681, #244
