llama : use n_swa + n_ubatch cells for SWA cache #13833
1bce7e8
to
6468631
Compare
I'll try testing.
6468631
to
ef5bb61
Compare
tools/server/server.cpp
Outdated
const auto pos_min = llama_kv_self_seq_pos_min(ctx, slot.id);
if (pos_min > 0) {
SLT_WRN(slot, "n_past = %d, cache_tokens.size() = %d, seq_id = %d, pos_min = %d\n", slot.n_past, (int) slot.cache_tokens.size(), slot.id, pos_min);
if (pos_min == -1 || pos_min > slot.n_past - n_swa) {
`pos_min == -1` means the sequence is empty. In this case, I think setting `n_past = 0` is the expected behavior, so we don't necessarily need to log the warning.
If the sequence is not present in the KV cache (i.e. `pos_min == -1`), but we somehow decided that `slot.n_past > 0` (see the condition above), then this is still unexpected. I think we might even want to abort in such cases, because it means there is a bug somewhere.
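
To make the two cases in this thread concrete, here is a minimal standalone sketch of the decision being discussed. It does not call into llama.cpp: the enum `swa_check` and the function `check_swa_cache` are hypothetical names invented for illustration, and the plain integer parameters stand in for `pos_min`, `slot.n_past` and `n_swa` from the snippet above. It assumes, as in the review comments, that `pos_min == -1` means the sequence is absent from the KV cache.

```cpp
#include <cstdint>

// Hypothetical result type, not part of llama.cpp.
enum class swa_check {
    ok,           // cached prefix can be reused as-is
    reprocess,    // SWA window no longer covers n_past: reset n_past to 0
    inconsistent  // n_past > 0 but the sequence is missing from the cache
};

// Hypothetical helper mirroring the condition under review:
//   pos_min == -1 || pos_min > n_past - n_swa
// but reporting the "empty sequence" case separately so the caller
// can treat it as a bug rather than as a normal cache miss.
constexpr swa_check check_swa_cache(int32_t pos_min, int32_t n_past, int32_t n_swa) {
    if (n_past <= 0) {
        return swa_check::ok; // nothing is being reused, nothing to validate
    }
    if (pos_min == -1) {
        return swa_check::inconsistent; // sequence not present in the KV cache
    }
    if (pos_min > n_past - n_swa) {
        return swa_check::reprocess; // needed positions fell out of the window
    }
    return swa_check::ok;
}
```

In the server, the `inconsistent` result would correspond to the "abort" suggestion above, while `reprocess` would correspond to logging the warning and setting `n_past = 0`. Whether the real code should abort or only warn is exactly what the thread leaves open.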
ef5bb61
to
4a9253a
Compare
8342295
to
855b397
Compare
target #13845
Overview