Milestones

  • 1. Significant end-to-end (e2e) speedup: more efficient kernels, support for larger trees; 2. compare with concurrent work; 3. implement real SD and MR.

    Overdue by 1 year(s)
    Due by August 2, 2024
    0/1 issues closed
  • 1. KV manager selection and implementation: ours vs. radix tree vs. hashed sequence group (vLLM); 2. choose between paged and unpaged memory; 3. profile to confirm that attention is the bottleneck.

    Overdue by 1 year(s)
    Due by July 10, 2024
    0/1 issues closed
  • 1. Implement and profile the CUDA kernel; 2. benchmark different attention-kernel baselines using the simplest memory management.

    Overdue by 1 year(s)
    Due by July 10, 2024
    2/6 issues closed