llama : per-layer KV cache #4309
Merged
Commits on Oct 3, 2023
- e9bcf66
- 55f2f2f
Commits on Oct 6, 2023
- f4f9367
Commits on Dec 3, 2023
- c294c78
- 986b3da
- f3dbfb9
- 3d3e6bd
- 1fa91a4
- c44bc1e
- c80b8a2
- e262947
- 66aaac9
Commits on Dec 6, 2023
- 1a1a1c3 llama : support quantum K cache (#4312)
  * llama : support quantum K cache (wip)
  * metal : add F32 -> Q8_0 copy kernel
  * cuda : add F32 -> Q8_0 copy kernel ggml-ci
  * cuda : use mmv kernel for quantum cache ops
  * llama : pass KV cache type through API
  * llama : fix build ggml-ci
  * metal : add F32 -> Q4_0 copy kernel
  * metal : add F32 -> Q4_1 copy kernel
  * cuda : wip
  * cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels
  * llama-bench : support type_k/type_v
  * metal : use mm kernel only for quantum KV cache
  * cuda : add comment
  * llama : remove memory_f16 and kv_f16 flags

  Co-authored-by: slaren <slarengh@gmail.com>
Commits on Dec 7, 2023
- 680a99e
- fc5f334