kv-cache : prepare K/V buffers for separation #14517
                
from #14363
Currently, the K and V buffers in the unified KV cache are shared among all the participating sequences (hence the name "unified"). With the upcoming change #14363, the buffers can become separate from each other in order to increase the throughput for parallel decoding use cases. This PR is a preparation step to support that.
There should be no functional changes.
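To illustrate the direction this prepares for, here is a conceptual sketch only; the struct names and the "stream" terminology below are illustrative and not the actual `llama-kv-cache-unified` types. The unified cache keeps a single K and a single V buffer per layer shared by all sequences, while the planned separation would keep one K/V pair per sequence (stream) so parallel sequences no longer share a buffer:

```cpp
// Illustrative sketch only - not the actual llama.cpp structures.
#include <vector>

struct ggml_tensor; // opaque ggml tensor handle

// today: one K and one V buffer per layer, shared ("unified") across all sequences
struct kv_layer_unified {
    ggml_tensor * k; // [n_embd_k_gqa, kv_size]
    ggml_tensor * v; // [n_embd_v_gqa, kv_size]
};

// after #14363: one K/V pair per sequence/stream, so sequences can occupy separate buffers
struct kv_layer_separated {
    std::vector<ggml_tensor *> k_stream; // k_stream[s] holds K for stream s
    std::vector<ggml_tensor *> v_stream; // v_stream[s] holds V for stream s
};
```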
Handling of variable V heads is also done when `ggml_set_rows()` is used:

```sh
LLAMA_SET_ROWS=1 ./bin/llama-cli -hf mradermacher/OpenELM-3B-Instruct-GGUF:Q8_0 \
    -p "I believe the meaning of life is" -no-cnv -n 32 -t 1 -s 2 --top-k 1
```
The only new restriction is that we require the number of KV heads for all layers to be equal:
llama.cpp/src/llama-kv-cache-unified.cpp, lines 70 to 77 @ 40f8c48
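The referenced lines are not reproduced here. As a rough, self-contained sketch of what such a check amounts to (in llama.cpp the per-layer count would presumably come from something like `hparams.n_head_kv(il)`; the helper below is hypothetical):

```cpp
// Hypothetical sketch: require that every layer reports the same KV head count.
#include <cassert>
#include <cstdint>
#include <vector>

static void check_uniform_kv_heads(const std::vector<uint32_t> & n_head_kv_per_layer) {
    for (size_t il = 1; il < n_head_kv_per_layer.size(); ++il) {
        // the prepared K/V buffers assume a single KV head count across all layers
        assert(n_head_kv_per_layer[il] == n_head_kv_per_layer[0]);
    }
}
```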
Support for a varying number of KV heads should be simple - just need to make the correct view of `v_idxs` when FA is disabled. But leaving this for when we actually need it.