kv-cache : rework kv_cell #13706
base: master
ggml-ci
In the next PR I will try to rework these 3 methods with something like llama.cpp/src/llama-kv-cache.h (lines 45 to 56 in 9023ae3).
The main goal is to be able to run SWA caches with just ... When this rework is ready, I will use the new llama.cpp/src/llama-kv-cache.h (lines 37 to 41 in 9023ae3).
Simulating a full cache will now be achieved by initializing the appropriate batches and just not processing them. Any suggestions about the plan are welcome.
0a8cdc3 to eda2e13
src/llama-kv-cells.h (Outdated)

    }

    // note: call only if the cell is not empty
    llama_pos get_pos(uint32_t i) const {
Should this be pos_get for consistency with pos_set etc.?
cont #13194
The KV cells editing logic is now implemented via the new struct llama_kv_cells_unified in the new src/llama-kv-cells.h source. The goal is to simplify the implementation in llama-kv-cache.cpp and make it easier to understand and update in the future.

One of the primary simplifications is that llama_kv_cache_unified no longer tracks the number of used cells manually. This is now tracked automatically by llama_kv_cells_unified based on the edits that we apply, such as adding and removing sequences from the cells. The same applies to the has_shift flag.

- The cell data (pos, delta, seq) is now a structure of arrays for better cache locality
- The sequence set of each cell is now a std::bitset instead of a std::set
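
To make the points above concrete, here is a minimal sketch of a cells container that stores pos, delta, and seq as separate arrays, keeps the sequence set in a std::bitset, and updates the used count and the has_shift flag as a side effect of each edit. It is an illustration only, not the code from this PR: the class name kv_cells_sketch, the aliases pos_t and seq_id_t, the MAX_SEQ limit, and the exact method set are assumptions made for the example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the llama.cpp types and limits.
using pos_t    = int32_t;
using seq_id_t = int32_t;
constexpr std::size_t MAX_SEQ = 64;

class kv_cells_sketch {
public:
    explicit kv_cells_sketch(uint32_t n) : pos(n, -1), delta(n, 0), seq(n) {}

    uint32_t get_used()      const { return used; }
    bool     get_has_shift() const { return has_shift; }
    bool     is_empty(uint32_t i) const { return pos[i] == -1; }

    // note: call only if the cell is empty
    void pos_set(uint32_t i, pos_t p) {
        assert(is_empty(i) && p >= 0);
        pos[i] = p;
        ++used; // the container counts used cells, not the caller
    }

    // note: call only if the cell is not empty
    pos_t pos_get(uint32_t i) const {
        assert(!is_empty(i));
        return pos[i];
    }

    void seq_add(uint32_t i, seq_id_t s) {
        assert(!is_empty(i) && s >= 0 && static_cast<std::size_t>(s) < MAX_SEQ);
        seq[i].set(static_cast<std::size_t>(s));
    }

    // remove a sequence; the cell is freed once no sequence references it
    void seq_rm(uint32_t i, seq_id_t s) {
        assert(!is_empty(i) && s >= 0 && static_cast<std::size_t>(s) < MAX_SEQ);
        seq[i].reset(static_cast<std::size_t>(s));
        if (seq[i].none()) {
            pos[i]   = -1;
            delta[i] = 0;
            --used;
        }
    }

    // shift the position of a cell and remember that a shift is pending
    void pos_add(uint32_t i, pos_t d) {
        assert(!is_empty(i));
        pos[i]   += d;
        delta[i] += d;
        has_shift = true;
        if (pos[i] < 0) { // shifted out of range: free the cell
            seq[i].reset();
            pos[i]   = -1;
            delta[i] = 0;
            --used;
        }
    }

private:
    uint32_t used      = 0;
    bool     has_shift = false;

    // structure of arrays: one contiguous array per field
    std::vector<pos_t>                pos;
    std::vector<pos_t>                delta;
    std::vector<std::bitset<MAX_SEQ>> seq;
};
```

With a layout like this, the cache code only calls the edit methods and reads get_used() or get_has_shift() when needed; there is no separate counter to keep in sync by hand.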
Here is an example of the position shift logic before and after the change:
Next

- Track n = cell_max() instead of searching for it on every batch