Add support for new KV Cache Offloading API #995
I forced on the ctx param to offload the KV cache. The logs show that it's using an F16 cache with 81 layers offloaded, and the memory sizes for K and V are what I'd assume the KV cache "should" be. Speeds on 70B are now half, and prompt processing is less than 1/4 with no context. Womp womp.
Ok, I got it working once I moved the KV field to its proper place in the struct.
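Why the position of the field matters: a ctypes binding has to declare struct fields in exactly the order the C library was compiled with, because C reads each field at a fixed byte offset. The sketch below uses two hypothetical, heavily simplified structs (not the real `llama_context_params` layout) to show what happens when an `offload_kqv` flag is declared in the wrong position: the struct sizes still match, so nothing errors, but the flag lands on a neighbouring field.

```python
import ctypes

# Hypothetical, simplified stand-ins for a C params struct.
# These are NOT the real llama_context_params fields or field order.

class ParamsAsCompiled(ctypes.Structure):
    # Field order the (hypothetical) C library was compiled with
    _fields_ = [
        ("n_ctx", ctypes.c_uint32),
        ("mul_mat_q", ctypes.c_bool),
        ("logits_all", ctypes.c_bool),
        ("embedding", ctypes.c_bool),
        ("offload_kqv", ctypes.c_bool),
    ]

class ParamsMisordered(ctypes.Structure):
    # Same fields, but the binding declares offload_kqv too early
    _fields_ = [
        ("n_ctx", ctypes.c_uint32),
        ("offload_kqv", ctypes.c_bool),
        ("mul_mat_q", ctypes.c_bool),
        ("logits_all", ctypes.c_bool),
        ("embedding", ctypes.c_bool),
    ]

# Both structs are the same size, so nothing complains at call time
assert ctypes.sizeof(ParamsAsCompiled) == ctypes.sizeof(ParamsMisordered)

# Python "turns on" KV offloading through the misordered binding
# (unset fields are zero-initialized, i.e. False)...
sent = ParamsMisordered(n_ctx=4096, offload_kqv=True)

# ...but the C side interprets the same bytes using its own layout
seen_by_c = ParamsAsCompiled.from_buffer_copy(bytes(sent))

for name, _ in ParamsAsCompiled._fields_:
    offset = getattr(ParamsAsCompiled, name).offset
    print(f"{name:12} offset={offset} value={getattr(seen_by_c, name)}")
```

With this toy layout the C side sees `mul_mat_q=True` and `offload_kqv=False`: the byte Python meant for `offload_kqv` was read as a different flag, which is why matching the header's field order exactly is what fixes this class of bug.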
brandonrobertz added a commit to brandonrobertz/llama-cpp-python that referenced this issue Dec 17, 2023

This addresses two issues:
- abetlen#995 which just requests to add the KV cache offloading param
- abetlen#1006 a NULL ptr exception when using the embeddings (introduced by leaving f16_kv in the fields struct)
brandonrobertz added a commit to brandonrobertz/llama-cpp-python that referenced this issue Dec 17, 2023

F16_KV appears to have been removed here: ggerganov/llama.cpp@af99c6f

This addresses two issues:
- abetlen#995 which just requests to add the KV cache offloading param
- abetlen#1006 a NULL ptr exception when using the embeddings (introduced by leaving f16_kv in the fields struct)
abetlen pushed a commit that referenced this issue Dec 18, 2023

F16_KV appears to have been removed here: ggerganov/llama.cpp@af99c6f

This addresses two issues:
- #995 which just requests to add the KV cache offloading param
- #1006 a NULL ptr exception when using the embeddings (introduced by leaving f16_kv in the fields struct)
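One plausible way a leftover `f16_kv` field breaks things: if the C header drops a field but the ctypes binding still declares it, every later field in the binding sits one byte too far along. The sketch below uses hypothetical, simplified structs (not the real `llama_context_params`; the field placement is an assumption for illustration) to show an `embedding=True` request being shifted onto a different flag, while the C side sees `embedding` as off.

```python
import ctypes

# Hypothetical, simplified layouts; NOT the real llama_context_params.

class ParamsWithoutF16KV(ctypes.Structure):
    # What the updated C header expects after f16_kv was removed
    _fields_ = [
        ("n_ctx", ctypes.c_uint32),
        ("mul_mat_q", ctypes.c_bool),
        ("logits_all", ctypes.c_bool),
        ("embedding", ctypes.c_bool),
        ("offload_kqv", ctypes.c_bool),
    ]

class ParamsStaleBinding(ctypes.Structure):
    # A binding that still declares the removed f16_kv field
    _fields_ = [
        ("n_ctx", ctypes.c_uint32),
        ("mul_mat_q", ctypes.c_bool),
        ("f16_kv", ctypes.c_bool),
        ("logits_all", ctypes.c_bool),
        ("embedding", ctypes.c_bool),
        ("offload_kqv", ctypes.c_bool),
    ]

# The stale struct is larger: one extra bool, then padding to 4-byte alignment
print(ctypes.sizeof(ParamsStaleBinding), ctypes.sizeof(ParamsWithoutF16KV))

# The caller asks for embeddings through the stale binding
sent = ParamsStaleBinding(n_ctx=512, f16_kv=True, embedding=True)

# The C side reads the leading bytes with its own, shorter layout
seen = ParamsWithoutF16KV.from_buffer_copy(bytes(sent))
print("embedding:", seen.embedding)      # False: the request was shifted away
print("offload_kqv:", seen.offload_kqv)  # True: the embedding byte landed here
print("logits_all:", seen.logits_all)    # True: the stale f16_kv byte landed here
```

In this toy case embeddings are silently never enabled on the C side, so any code that later expects an embeddings buffer could end up with a NULL pointer; removing the stale field from the binding realigns every field after it.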
Source ggerganov/llama.cpp#4309