Xeon Phi (Knights Corner) Support. #6440

Open: wants to merge 213 commits into base: master


@julialongtin julialongtin commented Apr 2, 2024

Most of the gains come from a hand-written assembly implementation of the Q5K × Q8K dot product, using IMCI (Initial Many Core Instructions), the Knights Corner vector instruction set.
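To show what that kernel computes, here is a minimal scalar C sketch of a 5-bit × 8-bit block-quantized dot product. It assumes a hypothetical, simplified layout (`block_q5_simple`, `block_q8_simple`, `vec_dot_q5_q8_simple` are illustrative names, not ggml's API): ggml's real `block_q5_K` packs the 6-bit scales and mins into 12 bytes and stores the fifth bit of each quant separately, whereas here everything is unpacked to one value per byte. It is not the PR's IMCI code; it only illustrates the arithmetic such a kernel vectorizes.

```c
#include <stdint.h>

/* Hypothetical simplified layout: NOT ggml's packed block_q5_K / block_q8_K. */
#define QK   256          /* values per super-block (same count as ggml's QK_K) */
#define SUB  32           /* values per sub-block */
#define NSUB (QK / SUB)   /* 8 sub-blocks per super-block */

typedef struct {
    float   d;             /* super-block multiplier for the sub-block scales */
    float   dmin;          /* super-block multiplier for the sub-block mins */
    uint8_t scales[NSUB];  /* 6-bit sub-block scales, one per byte (unpacked) */
    uint8_t mins[NSUB];    /* 6-bit sub-block mins, one per byte (unpacked) */
    uint8_t q[QK];         /* 5-bit quants in 0..31, one per byte (unpacked) */
} block_q5_simple;

typedef struct {
    float  d;              /* activation scale */
    int8_t q[QK];          /* 8-bit quantized activations */
} block_q8_simple;

/* Weight k in sub-block j dequantizes to d*scales[j]*q[k] - dmin*mins[j],
 * activation k to d8*q8[k]. Expanding the product keeps both inner sums
 * in integers; the float math happens once per super-block. */
float vec_dot_q5_q8_simple(int nblocks, const block_q5_simple *x,
                           const block_q8_simple *y) {
    float sum = 0.0f;
    for (int b = 0; b < nblocks; ++b) {
        int32_t scaled_dot = 0;   /* sum_j scales[j] * sum_i q5*q8 */
        int32_t scaled_min = 0;   /* sum_j mins[j]   * sum_i q8    */
        for (int j = 0; j < NSUB; ++j) {
            int32_t dot = 0, qsum = 0;
            for (int i = 0; i < SUB; ++i) {
                const int k = j * SUB + i;
                dot  += (int32_t)x[b].q[k] * y[b].q[k];
                qsum += y[b].q[k];
            }
            scaled_dot += (int32_t)x[b].scales[j] * dot;
            scaled_min += (int32_t)x[b].mins[j] * qsum;
        }
        sum += y[b].d * (x[b].d * (float)scaled_dot - x[b].dmin * (float)scaled_min);
    }
    return sum;
}
```

The integer inner loops are the part worth vectorizing: with IMCI's 512-bit registers, 16 lanes of 32-bit products can be accumulated per instruction. How the PR's assembly actually maps the loop onto registers is not shown here.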

Throughput on Mistral 7B Instruct (Q5K) goes from 0.18 tokens per second to 1.2 tokens per second.
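Put in other terms, the reported numbers work out as follows (plain arithmetic on the two figures above, rounded):

```latex
\text{speedup} = \frac{1.2\ \text{tok/s}}{0.18\ \text{tok/s}} \approx 6.7\times,
\qquad
\frac{1}{0.18\ \text{tok/s}} \approx 5.6\ \text{s/token}
\;\longrightarrow\;
\frac{1}{1.2\ \text{tok/s}} \approx 0.83\ \text{s/token}.
```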

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Jun 12, 2024
Labels
- enhancement: New feature or request
- ggml: changes relating to the ggml tensor library for machine learning
- performance: Speed related topics
- Review Complexity : High: Generally requires in-depth knowledge of LLMs or GPUs

8 participants