1. Significant end-to-end (e2e) speedup: more efficient kernels, larger trees; 2. compare with concurrent works; 3. implement real SD and MR.
Overdue by 1 year(s) • Due by August 2, 2024 • 0/1 issues closed

1. KV manager selection and implementation (candidates: ours, radix tree, hashed sequence groups as in vLLM); 2. choose between paged and unpaged; 3. profile to confirm that attention is the bottleneck.
Overdue by 1 year(s) • Due by July 10, 2024 • 0/1 issues closed

1. Implement and profile the CUDA kernel; 2. benchmark the different attention-kernel baselines using the simplest memory management.
Overdue by 1 year(s) • Due by July 10, 2024 • 2/6 issues closed