
[Roadmap] UCM Roadmap Q4 2025 #78

@ygwpz


UCM aims to accelerate inference on long sequences in three ways. In the Prefill phase, it looks up stored KVCache instead of recomputing KV. In the Decode phase, it applies sparsification. For large-scale scenarios, it provides a KVCache-centric Prefill-Decode (PD) disaggregated architecture.
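The Prefill-side idea of looking up stored KV instead of recomputing it is commonly implemented as prefix caching. The sketch below shows that general technique and is not UCM's actual API. It assumes KV is stored in fixed-size token blocks keyed by a chained hash, so each key identifies the entire prefix up to that block. Every name here (`BLOCK_SIZE`, `block_hashes`, `KVBlockStore`, `prefill`, `compute_kv`) is hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Sequence, Tuple

BLOCK_SIZE = 4  # tokens per KV block (hypothetical; kept small for illustration)


def block_hashes(tokens: Sequence[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Chained hash for each *full* block; each hash covers the whole prefix so far."""
    hashes: List[str] = []
    parent = ""
    full = len(tokens) - len(tokens) % block_size
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        payload = parent + "|" + ",".join(map(str, block))
        parent = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(parent)
    return hashes


class KVBlockStore:
    """Toy in-memory stand-in for an external KV store keyed by block hash (hypothetical)."""

    def __init__(self) -> None:
        self._blocks: Dict[str, object] = {}

    def put(self, key: str, kv: object) -> None:
        self._blocks[key] = kv

    def get(self, key: str) -> Optional[object]:
        return self._blocks.get(key)


def prefill(tokens: Sequence[int], store: KVBlockStore,
            compute_kv: Callable[[Sequence[int]], object]) -> Tuple[int, int]:
    """Reuse KV for the longest cached prefix and compute KV only for the rest.

    Returns (reused_tokens, computed_tokens).
    """
    hashes = block_hashes(tokens)
    # Look up blocks in order; stop at the first miss, because every later
    # chained hash depends on that missing block.
    hits = 0
    for h in hashes:
        if store.get(h) is None:
            break
        hits += 1
    reused = hits * BLOCK_SIZE
    # Compute and save the missing full blocks so later requests can reuse them.
    for i in range(hits, len(hashes)):
        start = i * BLOCK_SIZE
        store.put(hashes[i], compute_kv(tokens[start:start + BLOCK_SIZE]))
    # A trailing partial block is computed but not cached.
    tail = tokens[len(hashes) * BLOCK_SIZE:]
    if tail:
        compute_kv(tail)
    return reused, len(tokens) - reused
```

For example, after one request with tokens `0..9` is prefilled, a second request that extends the same prompt with two more tokens reuses the first 8 tokens' KV and computes only the last 4. The chained hash makes a change to the first token invalidate every later block, which matches how attention KV depends on the full preceding context.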

The first version of UCM met the basic goal of accelerating long-sequence inference through sparsification and delivered a working example of heterogeneous PD disaggregation. In Q4, we will progressively release more long-sequence acceleration features. These will improve inference performance, reduce inference cost, and address long sequences that either cannot be run at all or run too slowly.

Core

  • CacheBlend
  • Prefill KVCache Offload
  • Model Window Extrapolation
  • Sparse
    • Sparse Attention Framework Optimization
    • GSA Optimization
    • KVComp Optimization
    • KVStar Optimization
  • PD Disaggregation
    • Heterogeneous Optimization
    • PD Scheduler
  • Store
    • UCM Store V1 framework
    • CacheStore
    • PosixStore
    • PipelineStore
    • Scatter Gather IO
    • GPU Direct Storage
    • NPU Direct Storage

Others

  • Docs Optimization
  • Tools
    • Observability: metrics monitoring via Prometheus
    • KVStore tools: bandwidth measurement tool
  • Benchmark & Test
    • Mooncake Trace and additional datasets for PD testing
    • Benchmarks for sparse attention performance and accuracy
    • Support for LLMPerf performance benchmarks
