Speed up prediction on CPUs with many cores
This change adds an if statement to the GGML synchronization code that causes significantly fewer memory barriers to be used. The syncthreads function has also been introduced so that GGML_OP_MUL_MAT can add its barrier for initialization on its own. That's important, since if tiny BLAS doesn't need matrix B quantized, then the barrier can be skipped.

This change clamps the thread count to a maximum of 20 after the prefill is completed. Charting thread count for numerous models on a Threadripper reveals that twenty threads is consistently optimal for prediction.

Compared to the blog post https://justine.lol/matmul/#professional the token generation speed for TinyLLaMA 1.1B has increased from 52 to 98 tokens per second. Prompt evaluation is up to 2000 tokens per second. With Mistral 7b the gains are more modest, going from 17 to 21 tokens per second.
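The thread clamp described in this commit message can be illustrated with a minimal sketch. This is not the code from the commit: the names `pick_thread_count` and `MAX_PREDICT_THREADS` are hypothetical, and the only detail taken from the message is that the thread count is capped at 20 once the prefill has finished.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constant: the cap of 20 comes from the commit message,
   the name does not. */
#define MAX_PREDICT_THREADS 20

/* Hypothetical helper: returns how many threads to use for the current
   phase. During prefill the requested count is used unchanged; once the
   prefill is done, the count is clamped to MAX_PREDICT_THREADS. */
static int pick_thread_count(int requested, bool prefill_done) {
    assert(requested >= 1); /* a thread count below 1 is a caller bug */
    if (prefill_done && requested > MAX_PREDICT_THREADS) {
        return MAX_PREDICT_THREADS;
    }
    return requested;
}
```

Under these assumptions, a 96-thread machine would still use all 96 threads while processing the prompt and drop to 20 for token generation, while a machine with 8 threads would be unaffected.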
Showing 8 changed files with 295 additions and 259 deletions.