server: fix prompt caching for repeated prompts #5420

ristew · 2024-02-08T19:01:27Z

Fix non-deterministic prompt caching for repeated prompts (#4902).

This PR only moves the conditional that checks whether there are no new prompt tokens: it now runs before the KV cache is trimmed instead of after. In my testing this fixes the issue, which I reproduced by repeating the following request:

curl localhost:3731/completion -s -X POST -H 'Content-type: application/json' --data '{"prompt":"What a beautiful","n_probs":1,"n_predict":1,"cache_prompt":true}'
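To illustrate why the order of the two steps can matter, here is a minimal toy model. It is not the server code. `ToySlot`, `trim_cache`, `eval_new_tokens`, and `process_prompt` are hypothetical names made up for this sketch, and the cache is modeled as a plain vector of token ids. It assumes that when a prompt is fully cached, the server steps back one token so that at least one token is evaluated. Under that assumption, stepping back after the trim leaves the last cached entry in place and then evaluates it again.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a server slot. Hypothetical, not the real llama.cpp types.
struct ToySlot {
    std::vector<int> kv_cache;  // one entry per cached token position
    std::size_t n_past = 0;     // number of cached tokens reused for this request
};

// Length of the shared prefix between the cache and the incoming prompt.
std::size_t common_prefix(const std::vector<int>& cache, const std::vector<int>& prompt) {
    std::size_t i = 0;
    while (i < cache.size() && i < prompt.size() && cache[i] == prompt[i]) {
        ++i;
    }
    return i;
}

// Drop every cache entry at position >= n_past.
void trim_cache(ToySlot& slot) {
    slot.kv_cache.resize(slot.n_past);
}

// "Evaluate" prompt[n_past..end): each evaluated token adds one cache entry.
void eval_new_tokens(ToySlot& slot, const std::vector<int>& prompt) {
    for (std::size_t i = slot.n_past; i < prompt.size(); ++i) {
        slot.kv_cache.push_back(prompt[i]);
    }
    slot.n_past = prompt.size();
}

// Process one request. check_before_trim selects the order of the two steps:
// true  = run the "no new prompt tokens" check first, then trim;
// false = trim first, then run the check.
std::vector<int> process_prompt(ToySlot& slot, const std::vector<int>& prompt,
                                bool check_before_trim) {
    slot.n_past = common_prefix(slot.kv_cache, prompt);
    const bool fully_cached = slot.n_past == prompt.size() && slot.n_past > 0;

    if (check_before_trim) {
        if (fully_cached) slot.n_past--;  // re-evaluate the last prompt token
        trim_cache(slot);                 // trim removes that token's old entry
    } else {
        trim_cache(slot);                 // trim keeps the last token's entry
        if (fully_cached) slot.n_past--;  // the token is then evaluated again
    }

    eval_new_tokens(slot, prompt);
    return slot.kv_cache;
}
```

In this toy model, sending the prompt `{1, 2, 3}` twice with `check_before_trim = true` leaves the cache at `{1, 2, 3}`, while the trim-first order ends with `{1, 2, 3, 3}`, i.e. a stale duplicate of the last prompt token. The sketch ignores generated tokens and everything else the real slot tracks.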

ggerganov

Ah good catch

server: fix prompt caching for same prompts (ggml-org#4902)

9535a7a

ggerganov approved these changes Feb 9, 2024

ggerganov merged commit 7c777fc into ggml-org:master Feb 9, 2024

Green-Sky mentioned this pull request Feb 10, 2024

server : fix context shift #5195

Merged

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024

server : fix prompt caching for repeated prompts (ggml-org#5420)

bb38a79

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024

server : fix prompt caching for repeated prompts (ggml-org#5420)

00c4762
