UCM aims to accelerate long-sequence inference by replacing KV computation with KVCache table lookup in the Prefill phase, applying sparsification in the Decode phase, and providing a KVCache-centric PD (Prefill-Decode) disaggregated architecture for large-scale scenarios.
The first version of UCM achieved the basic goal of sparsification-based acceleration for long sequences and shipped a working heterogeneous PD Disaggregation example. In Q4 we will progressively release further long-sequence inference acceleration features to improve inference performance, reduce inference cost, and address the problems of long sequences that cannot be served at all or are served too slowly.
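The prefill-phase idea described above, looking up previously computed KV blocks instead of recomputing them, can be sketched as follows. This is a minimal illustration only; `KVCacheTable`, `prefill`, and `model.compute_kv` are assumed names for the sketch, not UCM's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

import torch


@dataclass
class KVCacheTable:
    """Hypothetical prefix-keyed table mapping a tuple of token ids to cached K/V tensors."""
    entries: Dict[Tuple[int, ...], Tuple[torch.Tensor, torch.Tensor]] = field(default_factory=dict)

    def lookup(self, tokens: Tuple[int, ...]) -> Optional[Tuple[torch.Tensor, torch.Tensor]]:
        # Return the KV of the longest cached prefix of `tokens`, if any.
        for end in range(len(tokens), 0, -1):
            hit = self.entries.get(tokens[:end])
            if hit is not None:
                return hit
        return None

    def insert(self, tokens: Tuple[int, ...], kv: Tuple[torch.Tensor, torch.Tensor]) -> None:
        self.entries[tokens] = kv


def prefill(model, tokens: Tuple[int, ...], table: KVCacheTable):
    """Compute KV only for the suffix that missed the cache, reusing the cached prefix."""
    hit = table.lookup(tokens)
    if hit is not None:
        k_cached, v_cached = hit
        start = k_cached.shape[0]          # prefix tokens already covered by the cache
        if start == len(tokens):           # full hit: skip KV computation entirely
            return k_cached, v_cached
    else:
        k_cached = v_cached = None
        start = 0
    # `model.compute_kv` stands in for the real prefill kernel over the uncached suffix.
    k_new, v_new = model.compute_kv(tokens[start:], past_kv=(k_cached, v_cached))
    k = torch.cat([k_cached, k_new]) if k_cached is not None else k_new
    v = torch.cat([v_cached, v_new]) if v_cached is not None else v_new
    table.insert(tuple(tokens), (k, v))
    return k, v
```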
Core
- CacheBlend
- Prefill KVCache Offload
- Model Window Extrapolation
- Sparse
  - Sparse Attention Framework Optimization
  - GSA Optimization
  - KVComp Optimization
  - KVStar Optimization
- PD Disaggregation
  - Heterogeneous Optimization
  - PD Scheduler
- Store
  - UCM Store V1 framework
  - CacheStore
  - PosixStore
  - PipelineStore
  - Scatter Gather IO
  - GPU Direct Storage
  - NPU Direct Storage
Others
- Docs Optimization
- Tools
  - Observability: metrics monitoring via Prometheus (see the sketch after this list)
  - KVStore tooling: bandwidth measurement tool
- Benchmark & Test
  - Mooncake Trace and more datasets for PD testing
  - Benchmarks for sparsification performance and accuracy
  - Support performance benchmarks (LLMPerf)
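For the observability item above, a minimal sketch of exporting inference metrics with the Python `prometheus_client` library is shown below. The metric names and the `timed_request` wrapper are assumptions for illustration, not UCM's actual instrumentation.

```python
import time

from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names; UCM's actual metric set is not defined here.
REQUESTS = Counter("ucm_requests_total", "Total inference requests served")
TTFT = Histogram("ucm_time_to_first_token_seconds", "Time to first token in seconds")


def timed_request(run_prefill, run_decode):
    """Record request count and time-to-first-token around an inference call."""
    REQUESTS.inc()
    start = time.monotonic()
    first_token = run_prefill()          # placeholder for the engine's prefill step
    TTFT.observe(time.monotonic() - start)
    return run_decode(first_token)       # placeholder for the decode loop


# Expose /metrics on port 8000 for Prometheus to scrape; the server runs in a
# background thread, so call this once at engine startup.
start_http_server(8000)
```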