Clarify KV cache per-layer calculation in continuous batching by bandham-manikanta · Pull Request #3417 · huggingface/blog

bandham-manikanta · 2026-06-11T17:59:25Z

This pull request clarifies the per-layer KV cache calculation in the continuous batching blog post. Llama-2-7B has both L = 32 layers and H = 32 attention heads, so the 32 in the per-layer multiplication is easy to misread as L instead of H. Writing the term explicitly as H * A makes it clear that the per-layer size depends on the number of heads, not the number of layers. This PR also adds the total cache size across all layers for completeness.
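The distinction described above can be checked with a small sketch. It does not reproduce the blog post's exact notation. It assumes standard multi-head attention, that both K and V are cached (the factor of 2), fp16/bf16 storage at 2 bytes per value, and that A is the per-head dimension (128 for Llama-2-7B). The helper names are hypothetical and sizes are per cached token.

```python
# Hypothetical helpers: KV cache size for ONE token, assuming standard multi-head
# attention, K and V both cached, and 2 bytes per value (fp16/bf16).

def kv_cache_bytes_per_layer(num_heads: int, head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes for K and V of one token in a single layer: 2 * H * A * bytes."""
    return 2 * num_heads * head_dim * bytes_per_value


def kv_cache_bytes_total(num_layers: int, num_heads: int, head_dim: int,
                         bytes_per_value: int = 2) -> int:
    """Bytes for K and V of one token across all layers: L * (per-layer size)."""
    return num_layers * kv_cache_bytes_per_layer(num_heads, head_dim, bytes_per_value)


# Llama-2-7B: L = 32 layers, H = 32 heads, A = 128 (head dimension).
# Only H enters the per-layer term; L appears only in the total.
L, H, A = 32, 32, 128
per_layer = kv_cache_bytes_per_layer(H, A)
total = kv_cache_bytes_total(L, H, A)
print(f"per layer:  {per_layer} bytes ({per_layer // 1024} KiB)")   # 16384 bytes (16 KiB)
print(f"all layers: {total} bytes ({total // 1024} KiB)")           # 524288 bytes (512 KiB)
```

Keeping H and L as separate parameters makes the point of this PR explicit. Changing the head count changes the per-layer figure, while changing the layer count only scales the total.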

Clarify KV cache per-layer calculation in continuous batching

dc2820a

bandham-manikanta wants to merge 1 commit into huggingface:main from bandham-manikanta:fix-kv-calculation
