-
[edit: 2024/12/09]: OK, after some corrections and more tuning, on a Ryzen 5950X (Zen 3) I get (from the llama.cpp code):
This is not the best we can get from this CPU; we may need a true BLIS kernel for the best result (I think ~80 t/s is possible).
On AMD Ryzen™ 9 7940HS (Zen 4):
-
@jart do you want me to try it on llamafile?
-
ikawrakow/ik_llama.cpp#71 has a good idea.
I figured out how to add it in tinyblas and it works great. (I also convert B to FP16/BF16 in all cases to reduce memory bandwidth; this works nicely for the AVX512/AVX2 kernels.)
https://github.com/Djip007/llama.cpp/blob/perfo/tinyblas/ggml/src/ggml-cpu/llamafile/sgemm.cpp#L297
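For readers who want to see the idea of storing B in BF16, here is a minimal scalar sketch. It is not the code from the linked sgemm.cpp; the helper names (`fp32_to_bf16`, `bf16_to_fp32`, `convert_b_to_bf16`) are made up for illustration, and a real kernel would do the conversion with SIMD and in the packed layout the kernel expects. The point is only that each element of B shrinks from 4 bytes to 2, so the kernel reads half as much memory for B.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not from llama.cpp): convert one float to bf16
// using round-to-nearest-even on the dropped 16 mantissa bits.
inline uint16_t fp32_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        // NaN: truncate and force a quiet-NaN bit so it stays a NaN.
        return static_cast<uint16_t>((bits >> 16) | 0x0040u);
    }
    bits += 0x7fffu + ((bits >> 16) & 1u);  // round to nearest, ties to even
    return static_cast<uint16_t>(bits >> 16);
}

// Hypothetical helper: widen bf16 back to float (exact, no rounding needed).
inline float bf16_to_fp32(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Assumption for this sketch: B is a flat float buffer. The result holds
// the same number of elements in half the bytes.
inline std::vector<uint16_t> convert_b_to_bf16(const std::vector<float>& b) {
    std::vector<uint16_t> out(b.size());
    for (std::size_t i = 0; i < b.size(); ++i) {
        out[i] = fp32_to_bf16(b[i]);
    }
    return out;
}
```

The conversion costs one pass over B, so it pays off when B is reused across many rows of A, or when the kernel is limited by memory bandwidth rather than compute.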