How can I swap two KV cache slots? #3507
Unanswered
KerfuffleV2 asked this question in Q&A
Before implementing defragmentation, we should better understand #3479. KV cache defragmentation would improve things, but not by a large amount. The performance concerns in #3479 might require reworking the KV cache altogether if it turns out that the current way of storing it does not allow the batch size to scale (see #3479 (comment)).
Let's make this simple and say I want to swap the slot at position 10 with the one at position 25. We're running on CPU only and I only care about LLaMA, so this would happen in `llm_build_llama`, around the same place the KV cache shifting stuff happens.

Now something has to happen with `kv_self.k` and `kv_self.v`, but what?

Once I know this, I'll probably be able to implement KV cache defragmenting, which can make a big difference for parallel generation.
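To make the question concrete, here is a minimal sketch of what a slot swap could look like on plain CPU buffers. It is not llama.cpp code: `toy_kv_cache` and `toy_kv_swap_slots` are hypothetical names, and the layout is an assumption (per layer, K holds each slot as a contiguous row of `n_embd` values, while V is stored transposed so a slot is a strided column). If the real layout differs, the indexing would have to change accordingly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the KV cache, NOT the real llama.cpp struct.
// Assumed layout, per layer:
//   k: [n_ctx][n_embd]  one slot is a contiguous row of n_embd values
//   v: [n_embd][n_ctx]  transposed, one slot is a column with stride n_ctx
struct toy_kv_cache {
    size_t n_layer;
    size_t n_ctx;
    size_t n_embd;
    std::vector<uint16_t> k; // raw f16 bits, n_layer * n_ctx * n_embd values
    std::vector<uint16_t> v; // same size as k

    toy_kv_cache(size_t layers, size_t ctx, size_t embd)
        : n_layer(layers), n_ctx(ctx), n_embd(embd),
          k(layers * ctx * embd, 0), v(layers * ctx * embd, 0) {}
};

// Swap the cached K and V data of slots a and b in every layer.
void toy_kv_swap_slots(toy_kv_cache & kv, size_t a, size_t b) {
    assert(a < kv.n_ctx && b < kv.n_ctx);
    if (a == b) {
        return;
    }
    const size_t layer_size = kv.n_ctx * kv.n_embd;
    for (size_t il = 0; il < kv.n_layer; ++il) {
        // K: each slot is one contiguous row, so swap the two rows.
        uint16_t * k_layer = kv.k.data() + il * layer_size;
        std::swap_ranges(k_layer + a * kv.n_embd,
                         k_layer + (a + 1) * kv.n_embd,
                         k_layer + b * kv.n_embd);

        // V: transposed, so swap one element per embedding row.
        uint16_t * v_layer = kv.v.data() + il * layer_size;
        for (size_t e = 0; e < kv.n_embd; ++e) {
            std::swap(v_layer[e * kv.n_ctx + a], v_layer[e * kv.n_ctx + b]);
        }
    }
}
```

With this toy type, `toy_kv_swap_slots(kv, 10, 25)` would exchange the two slots from the example. In the real cache, any per-slot bookkeeping (for instance the position and sequence ids attached to each cell) would presumably have to be swapped as well, and the real `llm_build_llama` builds a ggml graph rather than touching buffers directly, so treat this only as an illustration of the data movement.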