ggml-alloc : fix backend assignments of views #3982
Appears to fix the issue.
I applied this patch, but save-load-state still fails as in #2422 or #3820. Edit: I also checked out this branch instead of just applying the patch to mine, so with
Sorry, it seems there was a misunderstanding. What this fixes is offloading of the V cache but not the K cache with CUDA. The issue with load/save state not working when the KV cache is offloaded is not solved by this.
Fixes offloading of the V cache only