Parallelize dequantization or FP format conversion when using BLAS #4988
Thanks for looking into this! The global atomic flag is indeed ugly, so we need to figure out something better. The limitation that I think
The corresponding enhancements, with multi-threaded initialization, are implemented in this branch.
* Master build: 57e2a7a (1917)
* Modified build: 8643989 (1919)
Besides, there may be more to do here, since a higher number of threads is now useful. The number of threads needs to be configured carefully.
@ReinForce-II I think you are on the right track. Would you like to open a pull request so we can review the changes in detail and suggest improvements if necessary? @slaren I wonder if we can now utilize
Yes, this is something that I would like to do. It should only require minor changes to
In ggml_compute_forward_mul_mat (ggml.c), GEMM parallelism is managed by the BLAS library. However, this disables multithreading for dequantizing the weights, which may become a bottleneck.
I made some rough, admittedly ugly, modifications to compare performance:
cb828c8
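A minimal sketch of the idea described above: split the weight dequantization across worker threads, and call the GEMM only after every row has been converted. Several assumptions apply. The quantized format (int8 values with one float scale per row), the struct names and the helper functions are all hypothetical and are not ggml's actual block types or API. The naive `naive_gemm_abt` loop stands in for the BLAS call so the sketch builds without a BLAS library. `pthread_join` serves as the "all rows done" synchronization point. ggml reuses its own worker threads across operations, so a real change would synchronize differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 64

/* Hypothetical toy format: int8 values with one float scale per row.
   It only stands in for "quantized weights"; ggml's blocks differ. */
typedef struct {
    int rows, cols;
    const int8_t *q;     /* rows * cols quantized values, row-major */
    const float *scale;  /* one scale per row */
} qmatrix;

typedef struct {
    const qmatrix *w;
    float *out;          /* rows * cols dequantized values, row-major */
    int row_begin, row_end;
} dequant_task;

/* Worker: converts rows [row_begin, row_end). The ranges never overlap,
   so threads write disjoint parts of out and need no locking. */
static void *dequant_rows(void *arg) {
    dequant_task *t = (dequant_task *)arg;
    const qmatrix *w = t->w;
    for (int r = t->row_begin; r < t->row_end; r++) {
        for (int c = 0; c < w->cols; c++) {
            size_t i = (size_t)r * (size_t)w->cols + (size_t)c;
            t->out[i] = w->scale[r] * (float)w->q[i];
        }
    }
    return NULL;
}

/* Dequantize all rows of w into out using up to nth threads. Returns only
   after every row is converted, so out can then be handed to the GEMM. */
static void dequantize_parallel(const qmatrix *w, float *out, int nth) {
    assert(w != NULL && out != NULL);
    if (nth < 1) nth = 1;
    if (nth > MAX_THREADS) nth = MAX_THREADS;

    pthread_t tid[MAX_THREADS];
    int started[MAX_THREADS];
    dequant_task task[MAX_THREADS];
    int per = (w->rows + nth - 1) / nth;  /* rows per thread, rounded up */

    for (int i = 0; i < nth; i++) {
        int b = i * per < w->rows ? i * per : w->rows;
        int e = b + per < w->rows ? b + per : w->rows;
        task[i] = (dequant_task){ w, out, b, e };
        started[i] = 0;
        if (i > 0) {
            /* Threads 1..nth-1 get a pthread; if creation fails,
               that share is done inline on the calling thread. */
            started[i] = pthread_create(&tid[i], NULL, dequant_rows, &task[i]) == 0;
            if (!started[i]) dequant_rows(&task[i]);
        }
    }
    dequant_rows(&task[0]);  /* the calling thread handles share 0 */
    for (int i = 1; i < nth; i++) {
        if (started[i]) pthread_join(tid[i], NULL);
    }
}

/* Stand-in for the BLAS call: C[m x n] = A[m x k] * B^T, with B stored as
   n rows of length k. This matches what cblas_sgemm(CblasRowMajor,
   CblasNoTrans, CblasTrans, m, n, k, 1.0f, A, k, B, k, 0.0f, C, n) computes. */
static void naive_gemm_abt(int m, int n, int k, const float *A,
                           const float *B, float *C) {
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++) {
                acc += A[(size_t)i * k + p] * B[(size_t)j * k + p];
            }
            C[(size_t)i * n + j] = acc;
        }
    }
}
```

The BLAS call itself stays single-threaded from ggml's point of view, because the BLAS library manages its own threads. Only the conversion step, which runs before the call, is spread across threads. Because each thread owns a contiguous range of rows, no atomics or locks are needed during the conversion itself.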