Fix prompt cache saving and chat-persistent rollover #1678
Merged
Fixes #1670 by reworking the original fix for #1585 from #1609.

The original fix examined `embd` to determine whether the prompt had been evaluated, but `embd` is limited to the batch size. In addition, that fix left `session_tokens` in its original state (i.e., the longer, cached prompt), while normal session evaluation truncates it at the first eval. This combination meant that any prompt with a cache hit on just the first batch (512 tokens by default) would begin eval-ing roughly from the second batch, and all of that eval would get appended to the end of the full, original cached prompt. This had the downstream effect of diverging the cache from the prompt and overrunning the context size in the cache, as seen in #1670.

For the fix, I opted to move the re-eval logic to main's initialization rather than the eval stage. There, it truncates `session_tokens` such that it will only match (prompt - 1) tokens, forcing at least the last prompt token to be re-evaluated.

Testing:
- With `--prompt-cache`, re-ran the case where the cached prompt is longer than the new one (#1585), applied the Z/joke test, and got a joke that did not start with "Z".