📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Implementation of PagedAttention from the vLLM paper - an attention algorithm that manages the KV cache like virtual memory: the cache is partitioned into fixed-size blocks that can be stored non-contiguously. This eliminates most memory fragmentation, allows larger batch sizes, and substantially improves LLM serving throughput.
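To make the virtual-memory analogy concrete, here is a minimal sketch of the block-table idea: physical KV blocks come from one shared pool, and each sequence keeps a logical-to-physical mapping. This is not vLLM's actual implementation; the `PagedKVCache` class and its methods are illustrative names, and only the block size default (16 tokens) follows vLLM's convention.

```python
# Minimal sketch of the block-table idea behind PagedAttention.
# NOT vLLM's implementation; class and method names here are illustrative.
# Assumption: each block holds BLOCK_SIZE tokens' keys/values, physical blocks
# live in one preallocated pool, and each sequence maps logical -> physical blocks.

from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV block (vLLM's default block size)


@dataclass
class PagedKVCache:
    num_blocks: int
    free_blocks: list = field(default_factory=list)
    block_tables: dict = field(default_factory=dict)  # seq_id -> [physical block ids]

    def __post_init__(self):
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id: int, token_index: int) -> tuple[int, int]:
        """Return (physical_block, offset) where this token's K/V should be written.

        A new physical block is allocated only when the sequence crosses a
        block boundary, so memory is never reserved for the full max length.
        """
        table = self.block_tables.setdefault(seq_id, [])
        if token_index % BLOCK_SIZE == 0:  # sequence crossed a block boundary
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; preempt or swap a sequence")
            table.append(self.free_blocks.pop())
        return table[-1], token_index % BLOCK_SIZE

    def free_sequence(self, seq_id: int) -> None:
        """Return all of a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))


if __name__ == "__main__":
    cache = PagedKVCache(num_blocks=4)
    # Two sequences share the same physical pool; their blocks need not be contiguous.
    for t in range(20):
        cache.append_token(seq_id=0, token_index=t)
    for t in range(10):
        cache.append_token(seq_id=1, token_index=t)
    print(cache.block_tables)  # e.g. {0: [3, 2], 1: [1]}
    cache.free_sequence(0)
    print(cache.free_blocks)   # seq 0's blocks are back in the pool for reuse
```

Because blocks are freed and reused at this fine granularity, memory waste is bounded by at most one partially filled block per sequence, which is what lets a server pack many more concurrent requests into the same GPU memory.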