How to use KV cache? #10223
Unanswered
manimathma asked this question in Q&A
Replies: 0 comments
Hi Team,
I couldn't find any documentation on how to use the KV cache. Any pointers would help.
I assumed --prompt-cache /tmp/llama3cache --prompt-cache-all would work, but it didn't.
Iteration 1:
Iteration 2:
Also, is there a way to KV-cache just the query and not the responses? One idea is to cache the static information (the system prompt) and then reuse it for all LLM interactions.
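To make the system-prompt idea concrete, here is a rough sketch of what I have in mind, not a confirmed recipe. It only builds the command lines for two kinds of runs: a one-time warm-up that evaluates the static system prompt and writes it to the cache file, and later queries that start with the same system text and open the cache with --prompt-cache-ro so responses are not written back. The binary name (llama-cli), the model path, and the helper names are my assumptions; the cache path is the one from above. As far as I understand, the cached state is only reused for the leading part of the prompt that matches what was cached, which is why the system text goes first and must stay identical.

```python
import shlex
from typing import List

# Assumptions for illustration: binary name and model path are hypothetical.
LLAMA_BIN = "llama-cli"
MODEL = "models/llama3.gguf"
CACHE = "/tmp/llama3cache"  # cache file path from the question


def warmup_cmd(system_prompt: str) -> List[str]:
    """Build a one-time run that evaluates only the static system prompt,
    so that its state ends up in the prompt cache file."""
    return [
        LLAMA_BIN, "-m", MODEL,
        "--prompt-cache", CACHE,
        "-p", system_prompt,
        "-n", "1",  # generate almost nothing; the goal is to fill the cache
    ]


def query_cmd(system_prompt: str, user_query: str, n_predict: int = 128) -> List[str]:
    """Build a run that reuses the cached system prompt without updating it.

    The prompt must begin with the exact same system text, otherwise the
    cached prefix will not match.
    """
    return [
        LLAMA_BIN, "-m", MODEL,
        "--prompt-cache", CACHE,
        "--prompt-cache-ro",  # use the cache read-only, do not save responses
        "-p", system_prompt + user_query,
        "-n", str(n_predict),
    ]


if __name__ == "__main__":
    system = "You are a helpful assistant.\n"
    # Print shell-ready commands instead of running them.
    print(shlex.join(warmup_cmd(system)))
    print(shlex.join(query_cmd(system, "What is a KV cache?")))
```

The printed commands can be pasted into a shell or passed to subprocess.run. Whether this actually skips re-evaluating the system prompt is exactly what I would like confirmed.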