[Roadmap]: Dynamo roadmap until 0.4.0 (8/6)

**UPDATE**: On 8/12/25, we launched Dynamo v0.4.0, our first stable release. In H2, we plan to take Dynamo to general availability (GA) and focus on five key areas:

- Performance
- Fault tolerance
- K8 deployment
- Memory management and transfer
- Scheduling with router and planner

We will create a more detailed H2 roadmap and post it as a pinned issue by the week of 8/18. As we make progress towards a GA product, we would like to prioritize features based on feedback and share our progress more accurately. Please be on the lookout for the H2 roadmap, and thank you so much for your ongoing support.

-----
We're sharing our roadmap leading up to the v0.4.0 release to foster open-source development of Dynamo. By then, our goal is to make Dynamo production-ready for GenAI inference.

This roadmap is subject to change, and community contributions are highly encouraged. If you're interested in contributing to specific features, please comment on this issue.

To accelerate progress, Dynamo releases will follow a biweekly cadence. Expect a major release monthly (incrementing by 0.1) with minor releases in between.

**UPDATE**: We are pushing back v0.4.0 by two weeks to prioritize performance, fault tolerance, observability, and metrics. We are descoping application features such as agents and multi-LoRA. Descoped features are ~~crossed out~~ in the following objectives, and newly added features are shown in **bold**.

-----
**Key objectives from v0.2.0 until v0.4.0**

- [ ] LLM support
    - [x] Performant disaggregated serving including DeepSeek R1 -> wide EP performance work is in progress with inference engines. SGLang and TRT-LLM wide EP scripts are available on GitHub.
    - [ ] ~~Multi-LoRA~~  -> targeted for 0.5 release
    - [x] Speculative decoding
    - [x] Full feature support with TRT-LLM, vLLM, and SGLang 
- [ ] Multimodal
    - [x] Text - image - video model support
    - [x] E/P/D disaggregation 
    - [ ] ~~Multimodal cache~~ 
- [ ] KV cache manager -> targeting 0.4.1 to support HBM, host memory, and local disk
    - [x] KV offloading to multiple levels of memory hierarchy
    - [ ] Local and network storage support for most major storage vendors
    - [ ] Performant multi-turn conversations
- [x] Planner
    - [x] Dynamic allocation of prefill and decode
    - [x] SLA-based real-time performance tuning
- [ ] **Fault tolerance**
    - [x] **Request cancellation and migration** -> Request cancellation targeted for 0.4.1 release
    - [ ] **Model recovery with faster weights**
    - [x] **Monitoring for HW and instance load/recovery time**
- [ ] Observability and metrics
    - [ ] OpenTelemetry -> supported through structured logging aligned with OTel
    - [x] Metrics for DCGM, frontend, and worker
- [ ] ~~Agents~~
    - [ ] ~~Constrained decoding and function calling~~
    - [ ] ~~Performant KV offloading and pre-fetching based on predicted time of agent execution~~
    - [ ] ~~MCP support~~
- [x] Performance benchmarking
    - [x] GPU level metrics
    - [x] Energy metrics for TCO calculation
- [ ] Validated K8 workflow for deployment
    - [x] Scale up to 64 GPUs
    - [x] Helm charts and custom operators 
    - [x] AWS/Azure/GKE support and tutorials -> GKE will be added to 0.4.1 release

    
--- 
**Expected timeline**    

Here are the major features you can expect in our next immediate release. We will provide more details for subsequent releases iteratively to ensure transparency. Please stay tuned for further updates.

* **v 0.2.0 (End of April)**
    - [ ] KV Manager
        - [x] Offloading enabled for GPU, host memory, SSD, and network storage
    - [ ] Planner 
        - [x]  Dynamically allocate prefill and decode
    - [x] Validated K8 workflow
        - [x] Helm charts and custom operators
    - [x] NIXL AWS EFA support and NIXL microbench

* **v 0.2.1 (Mid May)**
    - [x] vLLM v1 support and reduced performance overhead
    - [x] SGLang integration
    - [x] KV Manager
        - [x] Offloading enabled for GPU, host memory, SSD, and network storage
    - [x] Planner with K8 support
    - [x] Multimodal model support 
       - [x] Functional E/P/D disaggregation with text + image model (LLaVA 1.5)
    - [x] NIXL Mooncake plugin integration

* **v 0.3.0 (Targeted for 6/4)**
    - [ ] Performant DeepSeek R1 disaggregated serving with SGLang, TRT-LLM, and vLLM
       - [ ] SGLang focused on Hopper performance
       - [ ] TRT-LLM focused on Blackwell performance
    - [ ] Fault tolerance for Dynamo components
    - [ ] KV Manager integrated with Dynamo runtime and vLLM
    - [x] K8 support
       - [x] Model caching across pipelines
       - [x] Initial GitOps implementation for rolling updates and zero-downtime deployment
    - [ ] Multimodal model support
       - [x] Performant E/P/D disaggregation with text + image model
       - [ ] Functional E/P/D disaggregation with text + video model
    - [x] Planner 
       - [x] Provide a guide and sweep script that help users pick a good starting configuration based on their SLA
    - [x] NIXL 
       - [x] Generic object storage support
       - [x] GPU initiated communication
       - [x] UCX Resiliency

* **v 0.3.1 (7/1)**
   - [x] Functional DeepSeek R1 disaggregated serving with wide EP using SGLang
   - [x] Functional E/P/D disaggregation with video model (LLaVA-Video 7B)
   - [x] Proof of concept inference gateway support
   - [x] Prebuilt Dynamo + vLLM container
   - [x] Amazon Linux support (x86)

* **v 0.3.2 (7/18)**
   - [x] ~~Performant~~ Functional R1 disaggregated serving with wide EP using SGLang
   - [x] Functional R1 disaggregated serving with wide EP using TRT-LLM
   - [x] Improved UX for simple and direct K8 deployment without using SDK with CLIs
   - [x] Native DCGM and Prometheus integration enables hardware metrics collection and export
   - [ ] E2E functional KV Block Manager with vLLM, capable of offloading to HBM, system memory, and local disk -> core code is done; vLLM integration is still in progress
   - [x] Pre-built containers for vLLM, SGLang, and TRT-LLM
   - [x] SLA (TTFT & ITL) based planner

* **v 0.4.0 (8/12)**
   - [x] Native vLLM v1 support
   - [x] Improved performance of R1 disaggregated serving with wide EP for SGLang and TRT-LLM
   - [x] Enhanced helm chart flexibility and K8 examples for SGLang, TRT-LLM, vLLM 
   - [x] Request migration 
   - [x] Structured logging, hierarchical Prometheus metrics registry, and custom metric support
   - [x] Multimodal example with Llama Maverick




[Roadmap]: Dynamo roadmap until 0.4.0 (8/6) #762
