
Mirror "Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization" from llama.cpp #2605

Open
smpurkis opened this issue Nov 8, 2024 · 0 comments

Comments


smpurkis commented Nov 8, 2024

I'm looking to mirror the change in ggerganov/llama.cpp#5780 for Candle.

I have some experience with Rust and have a repo that uses the same assembly instructions (see here) as the above PR, but I need some help/guidance integrating it with Candle.

  1. The license seems to be separate for the GEMV- and GEMM-specific code in llama.cpp, and I'm not sure what the best option here is. In llama.cpp they kept it in a separate file with its own license, see here.
  2. The GgmlType trait exposes a vec_dot function, and the matmul is computed by looping over it. The kernels for these interleaved types run the whole matmul directly in assembly (in llama.cpp), so there is no associated vec_dot function to put on the GgmlType trait. Should I modify this trait or create another one for these interleaved types? What is the best way to handle this?
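To make point 2 concrete, one option would be a separate whole-matmul trait that the interleaved formats implement instead of vec_dot. This is only a hypothetical sketch: the trait and type names (MatMul, QuantStorage, Q4_0x8) are illustrative placeholders, not Candle's actual API, and the body is a stub where the AArch64 GEMM/GEMV kernels would be dispatched.

```rust
/// Stand-in for a quantized block type (illustrative, not Candle's trait).
trait QuantStorage {
    fn name() -> &'static str;
}

/// Hypothetical whole-matmul trait for interleaved formats that have no
/// per-row vec_dot: the kernel computes C = A * B in one call.
trait MatMul: QuantStorage {
    /// C (m x n, f32) = A (m x k, quantized bytes) * B (k x n, quantized bytes).
    fn matmul(m: usize, n: usize, k: usize, a: &[u8], b: &[u8], c: &mut [f32]);
}

/// Placeholder for an interleaved q4_0 layout.
struct Q4_0x8;

impl QuantStorage for Q4_0x8 {
    fn name() -> &'static str {
        "q4_0_x8"
    }
}

impl MatMul for Q4_0x8 {
    fn matmul(m: usize, n: usize, _k: usize, _a: &[u8], _b: &[u8], c: &mut [f32]) {
        // A real implementation would dispatch to the AArch64 GEMM/GEMV
        // assembly here; this stub just zeroes the output tile.
        for v in c.iter_mut().take(m * n) {
            *v = 0.0;
        }
    }
}

fn main() {
    let mut c = vec![1.0f32; 4];
    // 2x2 output from a 2x32 by 32x2 quantized matmul (stubbed).
    Q4_0x8::matmul(2, 2, 32, &[], &[], &mut c);
    println!("{} -> {:?}", Q4_0x8::name(), c);
}
```

The candle matmul path would then branch on whether a dtype implements this trait, falling back to the vec_dot loop otherwise.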