kv-cache : refactor the update/defrag mechanism #13988
Merged
cont #13746 (comment)
Overview
virtual llama_kv_cache::update()
virtual llama_kv_cache::defrag_sched()
virtual llama_kv_cache::init_update()
llama_kv_cache_unified::defrag_prepare() is now const
The logic for shifting and defragmenting the KV cache is now implemented using a memory state (i.e. llama_memory_state) for consistency with the decoding states that were introduced in #13746. The idea is that calling init_update() checks whether any updates have to be performed, without mutating the KV cache (a.k.a. the memory). We can then apply the resulting memory update state to perform the necessary updates:
llama.cpp/src/llama-context.cpp
Lines 451 to 461 in 503dda2
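For readers without the referenced snippet at hand, the two-phase pattern (plan without mutating, then apply) can be sketched roughly as follows. This is a standalone toy, not the llama.cpp code: every toy_* name, the status enum, and the compaction logic are assumptions made for illustration, and the real llama_memory_state interface may look different. Only the general shape (init_update(bool optimize) returns a state object that is applied afterwards) follows the description in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical status values; the real enum in llama.cpp may differ.
enum class toy_status { success, no_update };

// Toy stand-in for a memory update state (assumed interface).
struct toy_memory_state {
    virtual ~toy_memory_state() = default;
    virtual toy_status get_status() const = 0;
    virtual bool apply() = 0; // performs the planned update, if any
};

// Toy cache: each cell holds a position, or -1 when the cell is empty.
class toy_kv_cache {
public:
    explicit toy_kv_cache(std::vector<int32_t> cells) : cells_(std::move(cells)) {}

    // Planning step is const: reports whether a used cell sits after a hole.
    bool compact_prepare() const {
        bool seen_hole = false;
        for (int32_t c : cells_) {
            if (c < 0) {
                seen_hole = true;
            } else if (seen_hole) {
                return true;
            }
        }
        return false;
    }

    // Mutating step: moves all used cells to the front, preserving order.
    void compact() {
        std::size_t dst = 0;
        for (std::size_t src = 0; src < cells_.size(); ++src) {
            if (cells_[src] >= 0) {
                const int32_t v = cells_[src];
                cells_[src]   = -1;
                cells_[dst++] = v;
            }
        }
    }

    // Decides what has to be done; does not modify the cache.
    std::unique_ptr<toy_memory_state> init_update(bool optimize);

    const std::vector<int32_t> & cells() const { return cells_; }

private:
    std::vector<int32_t> cells_;
};

// Carries the decision made by init_update() until apply() is called.
class toy_update_state : public toy_memory_state {
public:
    toy_update_state(toy_kv_cache * kv, bool do_compact) : kv_(kv), do_compact_(do_compact) {}

    toy_status get_status() const override {
        return do_compact_ ? toy_status::success : toy_status::no_update;
    }

    bool apply() override {
        if (!do_compact_) {
            return false; // nothing was planned
        }
        kv_->compact();
        return true;
    }

private:
    toy_kv_cache * kv_;
    bool do_compact_;
};

inline std::unique_ptr<toy_memory_state> toy_kv_cache::init_update(bool optimize) {
    // In this toy, "optimize" means "compact the cells"; another memory
    // implementation could interpret the flag differently.
    const bool do_compact = optimize && compact_prepare();
    return std::make_unique<toy_update_state>(this, do_compact);
}
```

A caller would obtain the state with init_update(optimize), skip the update when get_status() reports that there is nothing to do, and otherwise call apply(). Keeping the planning step const makes it clear that only apply() touches the cache.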
This change generalizes the concept of updating the memory module. So far we have been doing KV cache shifts and defrags, but in the future we can do additional operations through this mechanism.
We also start to avoid the explicit "defrag" term, as it is too specific to the unified KV cache. Instead, the init_update() method takes a bool optimize flag that can mean different things depending on the underlying memory implementation.
Next PRs
llama_kv_cache::init_* interface to llama_memory_i (see kv-cache : refactor + add llama_memory_state_i #13746 (review)) PR: memory : migrate from llama_kv_cache to more generic llama_memory #14006