OpenCL seems to almost work #48
Try this patch: ggerganov/llama.cpp@6460f75
@ggerganov That worked, thank you! Is it proper protocol to submit a pull request for a one-liner? Edit: FYI: it allows the entire process to complete, but does not actually make use of the GPU.
FYI: it does work, but GPU utilization is very low. Got any more simple speedups in your pocket? @ggerganov
I'm sorry to disappoint you, but OpenBLAS doesn't use the GPU to accelerate the processing; it uses the CPU itself. If anything, you should try `-DGGML_CLBLAST=ON` in order to use OpenCL, but it still wouldn't work, since the developer hasn't integrated any GPU acceleration into the program yet.
@daniandtheweb Thanks for pointing that out… it was a typo, and the CLBLAST flag is what I was referring to. How difficult/time-consuming a task would it be to incorporate OpenCL? With that flag the GPU does get some kind of signal, because utilization increases. Just wondering if it's a very involved process, or if we just need to copy/paste something from llama.cpp and/or ggml?
I'm no expert in OpenCL, but it will require some time; it's not just a copy/paste. The good news is that, given the current RAM usage, the GPU acceleration will probably be one of the more memory-efficient implementations.
@daniandtheweb Can you tell me, broadly speaking, what tasks need to be completed, like I'm five? Maybe CodeLlama can help me contribute a pull request to get it done, but I need a thread to grab onto.
As I told you, I don't know a lot about how the OpenCL implementation works, but you probably have to reimplement each computing kernel of the stock CPU code in OpenCL. You can take a look at llama.cpp's implementation, but you will need to make lots of tweaks to the code to make it work with this project.
No problem, hold my beer. <<only really knows python>> |
I can confirm that it really works!
I applied the patch and then added some `#ifdef SD_USE_CLBLAST` / `#include "ggml-opencl.h"` guards etc., edited the CMakeLists.txt file with the CLBlast bits from llama.cpp ported over and renamed/re-pointed, then configured with `cmake .. -DGGML_OPENBLAS=ON -DGGML_CLBLAST=ON`. Now the compiled ./sd recognizes my AMD RX 580 GPU and I get about a 30% speedup. Not a huge increase, since that's the same number of CPU threads plus the GPU, but my GPU is pretty old too. And it does seem to take some load off the CPU, which is nice. Thanks!
@leejet @Green-Sky @ggerganov
I do not know C++ and do not have a solid grasp on how ggml works, but building the repo with `cmake -DGGML_CLBLAST=ON` seems to work: GPU utilization goes up and it's very fast (10s vs 80s per step on a higher-end CPU). It completes all the steps and finishes sampling too, but then crashes at line 1505 of ggml-opencl.
If it is a matter of spending time to make this work, is it simple enough for one of you to explain what needs to be done? If so, I would be happy to give it a shot, but I don't know where to start.
My limited understanding is that sampling is what takes all the effort, so is there maybe a way to switch from GPU to CPU just to save the file? Or am I missing some context/knowledge?
Edit: Fixed typo. Flag used is clblast, not openblas.