OpenCL seems to almost work #48

Open
Happenedtostumblein opened this issue Sep 5, 2023 · 12 comments

Comments

@Happenedtostumblein

Happenedtostumblein commented Sep 5, 2023

@leejet @Green-Sky @ggerganov

I do not know C++ and do not have a solid grasp of how ggml works, but building the repo with cmake -DGGML_CLBLAST=ON seems to work: GPU utilization goes up and it's very fast (10 s vs. 80 s per step on a higher-end CPU). It completes all the steps, and sampling too, but then crashes at line 1505 of ggml-opencl.

If it is just a matter of spending time to make this work, is it simple enough for one of you to explain what needs to be done? If so, I'd be happy to give it a shot, but I don't know where to start.

My limited understanding is that sampling is what takes all the effort, so is there maybe a way to switch from the GPU to the CPU just to save the file? Or am I missing some context/knowledge?

Edit: Fixed typo. Flag used is clblast, not openblas.

@ggerganov
Contributor

Try this patch: ggerganov/llama.cpp@6460f75

@Happenedtostumblein
Author

Happenedtostumblein commented Sep 5, 2023

@ggerganov That worked, thank you!

Is it proper protocol to submit a pull request for a one-liner?

Edit: FYI: It allows the entire process to complete, but does not actually make use of the GPU.

@Happenedtostumblein
Author

FYI: It does work, but GPU utilization is very low. Got any more simple speedups in your pocket? @ggerganov

@daniandtheweb
Contributor

daniandtheweb commented Sep 5, 2023

I'm sorry to disappoint you, but OpenBLAS doesn't use the GPU to accelerate the processing; it runs on the CPU itself. If anything, you should try -DGGML_CLBLAST=ON to use OpenCL, but it still wouldn't work, as the developer hasn't integrated any GPU acceleration into the program yet.

@Happenedtostumblein
Author

@daniandtheweb Thanks for pointing that out…it was a typo, and the CLBLAST flag is what I was referring to.

How difficult/time-consuming a task is it going to be to incorporate OpenCL? With that flag, the GPU does get some kind of signal, because utilization increases.

Just wondering if it’s a very involved process, or if we just need to copy/paste something from llama and/or ggml?

@daniandtheweb
Contributor

I'm no expert in OpenCL, but it will require some time; it's not just a copy/paste. The good news is that, given the current RAM usage, the GPU acceleration will probably be one of the more memory-efficient ones.

@Happenedtostumblein
Author

@daniandtheweb Can you tell me, broadly speaking, what tasks need to be completed, like I'm five?

Maybe CodeLlama can help me contribute a pull request to get it done, but I need a thread to grab onto.
(Not sure if tagging is necessary; I'm new to GitHub.)

@daniandtheweb
Contributor

As I said, I don't know a lot about how the OpenCL implementation works, but you would probably have to implement each compute kernel of the stock CPU code in OpenCL. You can take a look at llama.cpp's implementation, but you will need to make lots of tweaks to the code to make it work with this project.

@Happenedtostumblein
Author

Happenedtostumblein commented Sep 6, 2023

No problem, hold my beer.

<<only really knows python>>

@FNsi

FNsi commented Sep 24, 2023

> Try this patch: ggerganov/llama.cpp@6460f75

I can confirm that it really works!

@rayrayraykk

rayrayraykk commented Nov 10, 2023

> [quotes the original post in full]

Using OpenCL on Android, it gets slower. What device are you using?
[image attachment]

@superkuh

superkuh commented Dec 26, 2023

I applied the patch and then added some #ifdef SD_USE_CLBLAST / #include "ggml-opencl.h" guards, edited the CMakeLists file with the CLBlast bits from llama.cpp ported over and renamed/re-pointed, then configured with cmake .. -DGGML_OPENBLAS=ON -DGGML_CLBLAST=ON. The compiled ./sd now recognizes my AMD RX 580 GPU and I get about a 30% speed-up. Not a huge increase, since that's the same number of CPU threads plus the GPU, but my GPU is pretty old too. And it does seem to take some load off the CPU, which is nice. Thanks!
