Recently there has been some great work on KV cache compression, with reported speedups of around 1.7x~2.3x over FlashInfer. Could you please consider supporting features like these?
Same-layer KV:
- Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference (a rough sketch of its page-selection idea is included below)
- MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

Cross-layer KV:
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
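For context, here is a minimal, self-contained sketch (plain PyTorch, single head) of the Quest-style query-aware page selection these methods rely on. All names here (`select_pages`, `sparse_attention`, `page_size`, `top_k`) are hypothetical illustrations, not FlashInfer APIs; a real integration would feed the selected page indices into a paged sparse-attention kernel instead.

```python
# Hypothetical sketch of Quest-style query-aware page selection.
# Not FlashInfer code; for illustration only.
import torch

def select_pages(q, k, page_size=16, top_k=4):
    """Estimate an upper bound on q.k per page from per-page key min/max
    (the Quest criticality estimate) and return the top-k page indices."""
    seq_len, head_dim = k.shape
    num_pages = seq_len // page_size
    k_pages = k[: num_pages * page_size].view(num_pages, page_size, head_dim)
    k_min = k_pages.min(dim=1).values            # [num_pages, head_dim]
    k_max = k_pages.max(dim=1).values            # [num_pages, head_dim]
    # Channel-wise upper bound on the dot product within each page.
    bound = torch.maximum(q * k_min, q * k_max).sum(dim=-1)  # [num_pages]
    return bound.topk(min(top_k, num_pages)).indices

def sparse_attention(q, k, v, page_size=16, top_k=4):
    """Attend only over the pages selected for this query."""
    pages = select_pages(q, k, page_size, top_k)
    idx = (pages[:, None] * page_size + torch.arange(page_size)).reshape(-1)
    k_sel, v_sel = k[idx], v[idx]
    scores = (k_sel @ q) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v_sel

# Toy usage: one decode-step query against a 256-token KV cache.
torch.manual_seed(0)
q = torch.randn(64)           # [head_dim]
k = torch.randn(256, 64)      # [seq_len, head_dim]
v = torch.randn(256, 64)
out = sparse_attention(q, k, v)
print(out.shape)              # torch.Size([64])
```

The point is that only `top_k * page_size` KV entries are ever read per query, which is where the reported speedups come from; the heavy lifting (paged KV layout, batched sparse attention) is exactly what a FlashInfer kernel would be expected to provide.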