Official implementation of "SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching" (COLM 2025). SentenceKV is a KV cache compression method that organizes the cache at the sentence level and selects entries by semantic similarity.
natural-language-processing transformers memory-efficiency efficient-inference inference-optimization kv-cache llm semantic-caching colm2025
Updated Sep 29, 2025 · Python
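
The sketch below is only an illustration of the general idea described above (grouping cached key/value tensors per sentence and retrieving the sentences most similar to the current query), not the official SentenceKV implementation. All names here (`SentenceLevelKVCache`, `add_sentence`, `retrieve`, `top_k`) are hypothetical.

```python
# Illustrative sketch, not the paper's method: per-sentence KV blocks,
# retrieved by cosine similarity between sentence embeddings and a query embedding.
import torch

class SentenceLevelKVCache:
    def __init__(self, top_k: int = 4):
        self.top_k = top_k      # number of most-similar sentences to load
        self.embeddings = []    # one normalized embedding per cached sentence
        self.kv_blocks = []     # (keys, values) tensors per sentence

    def add_sentence(self, embedding: torch.Tensor, keys: torch.Tensor, values: torch.Tensor):
        """Store the KV tensors of one finished sentence together with its embedding."""
        self.embeddings.append(embedding / embedding.norm())
        self.kv_blocks.append((keys, values))

    def retrieve(self, query_embedding: torch.Tensor):
        """Return the concatenated KV blocks of the sentences most similar to the query."""
        if not self.embeddings:
            return None, None
        sims = torch.stack(self.embeddings) @ (query_embedding / query_embedding.norm())
        k = min(self.top_k, sims.numel())
        idx = sims.topk(k).indices.tolist()
        # Concatenate along the sequence dimension, assuming shape (..., seq_len, head_dim).
        keys = torch.cat([self.kv_blocks[i][0] for i in idx], dim=-2)
        values = torch.cat([self.kv_blocks[i][1] for i in idx], dim=-2)
        return keys, values
```

In this toy version, the sentence embedding could come from any encoder; the point is only that attention at decode time sees the KV blocks of a few semantically relevant sentences rather than the full cache. See the repository and paper for the actual method.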