
Clarify KV cache per-layer calculation in continuous batching #3417

Open
bandham-manikanta wants to merge 1 commit into huggingface:main from bandham-manikanta:fix-kv-calculation

@bandham-manikanta

This pull request clarifies the KV cache per-layer calculation in the continuous batching blog post. Because Llama-2-7B has 32 layers (L) and 32 attention heads (H), the 32 in the per-layer multiplication is easily mistaken for L rather than H. Writing the factor explicitly as H * A makes this clearer. The PR also adds the total cache size across all layers for completeness.
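To make the distinction between L and H concrete, here is a minimal sketch of the arithmetic. It is not the blog post's exact formula. It assumes that A is the per-head dimension (128 for Llama-2-7B, i.e. a 4096 hidden size split over 32 heads), that both keys and values are cached (the factor of 2), and that values are stored in fp16 (2 bytes each). The function name `kv_cache_bytes` is hypothetical.

```python
# Hypothetical helper, not taken from the PR or the blog post.
# Assumptions: A = per-head dimension, K and V both cached, fp16 storage (2 bytes).
def kv_cache_bytes(num_layers: int, num_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> tuple[int, int]:
    # Per layer, each token stores one key and one value vector per head:
    # 2 * H * A values, so H (not L) appears in the per-layer term.
    per_layer = 2 * num_heads * head_dim * bytes_per_value
    # Multiplying by L only happens when summing over all layers.
    total = num_layers * per_layer
    return per_layer, total


# Llama-2-7B: L = 32 layers, H = 32 heads, A = 128.
per_layer, total = kv_cache_bytes(num_layers=32, num_heads=32, head_dim=128)
print(per_layer, total)  # 16384 524288, i.e. 16 KiB per layer and 512 KiB per token
```

For Llama-2-7B, swapping the first two arguments yields the same numbers, which is exactly why the two 32s are easy to confuse; for a model where L and H differ, only the per-layer term built from H is correct.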

