Tidy Android Instructions README.md #7016
Conversation
Remove CLBlast instructions (outdated), add OpenBLAS.
Added `apt install git`, so that `git clone` works
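The simplified CPU-only flow these commits describe can be sketched as follows. This is my reconstruction, not the PR diff itself: the package names and CMake invocation are assumptions about a standard Termux setup and the usual llama.cpp Linux build.

```shell
# Inside Termux: install build prerequisites first, since a fresh
# Termux environment does not ship with git.
apt update && apt install -y git cmake clang

# Clone and build llama.cpp with the normal Linux/CMake flow.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

From there the usual binaries end up under `build/bin/`, same as a desktop Linux build.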
Is OpenBLAS actually worth using on Android? For quantized models, it may be faster without it. Ultimately though, without the OpenCL instructions, this basically looks like "install termux and follow the normal build instructions for linux". So maybe it would be simpler that way.
I like leaving the decision to the user.
Agreed.
Linked to Linux build instructions
I build with OpenBLAS on Android, not that it matters. My chiming in is, unfortunately, anecdotal. Is it really negligible? It's more difficult to tell on the phone, if I'm being honest.
The easiest way to tell if OpenBLAS helps would be to run |
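The comment above is truncated, but a straightforward way to settle the question is to build llama.cpp twice, once with and once without OpenBLAS, and benchmark the same model with the `llama-bench` tool that ships in the repository. The CMake flag names below match llama.cpp's options around the time of this PR; treat them as an assumption, and `model.gguf` is a placeholder.

```shell
# Plain CPU build.
cmake -B build-cpu
cmake --build build-cpu --config Release

# Build with OpenBLAS enabled.
cmake -B build-blas -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS
cmake --build build-blas --config Release

# Run the same benchmark against both builds and compare tokens/sec.
./build-cpu/bin/llama-bench -m model.gguf
./build-blas/bin/llama-bench -m model.gguf
```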
CPU is definitely faster with quants on my device.
I had to update, fix the convert script by adding the hash, upload the model I use, rebuild, and then download the quant. Plus, I have a bunch of other scripts running, so I'll post once it's all set.
CPU is much faster! Why is that?
I think |
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
Fdroid is not required
Co-authored-by: slaren <slarengh@gmail.com>
Thank you. I'll try various options and post results later.
Co-authored-by: slaren <slarengh@gmail.com>
Tested. Here are some quick numbers, loading from shared storage:
load from shared &
load from
load from
Based on these figures, |
* Tidy Android Instructions README.md: Remove CLBlast instructions (outdated), added OpenBlas.
* don't assume git is installed: Added apt install git, so that git clone works
* removed OpenBlas: Linked to Linux build instructions
* fix typo: Remove word "run"
* correct style (Co-authored-by: slaren <slarengh@gmail.com>)
* correct grammar (Co-authored-by: slaren <slarengh@gmail.com>)
* delete reference to Android API
* remove Fdroid reference, link directly to Termux: Fdroid is not required (Co-authored-by: slaren <slarengh@gmail.com>)
* Update README.md (Co-authored-by: slaren <slarengh@gmail.com>)

Co-authored-by: slaren <slarengh@gmail.com>
Tested with TinyLlama-1.1B-Chat-v1.0-Q8_0.gguf using Load from shared,
Load from
The results are near identical. Tiny Llama (1.09 GiB) is probably too small to show a difference in this test; even mmap made no difference. I'll leave benchmarking larger models to someone with a better device than mine.
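For anyone repeating the mmap comparison above: llama.cpp exposes a `--no-mmap` flag that disables memory-mapping, so the model file is read fully into RAM instead. A sketch, with the binary path and prompt as illustrative placeholders (at the time of this PR the main example binary was named `main`):

```shell
# Default behavior: the model file is mmap'd from storage.
./build/bin/main -m TinyLlama-1.1B-Chat-v1.0-Q8_0.gguf -p "Hello" -n 32

# Force a full read into RAM instead of mmap, then compare load times.
./build/bin/main -m TinyLlama-1.1B-Chat-v1.0-Q8_0.gguf -p "Hello" -n 32 --no-mmap
```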
Hey everyone,

As the original author of these README instructions, I have to admit that I now see how they might cause more confusion than clarity. Just to clarify for future users: I've personally found CLBlast to be quite effective with llama.cpp, especially for certain model families like StableLM and OpenLlama (provided you're not offloading layers). In my experience, it has boosted prompt processing speed by roughly 40%.

However, while CLBlast does offer significant speed improvements, it's plagued by bugs. For many model families, or even within the aforementioned subsets when offloading layers, it tends to produce nonsensical output. This is disappointing, considering the untapped potential of the GPUs nestled within our smartphones. If there's any way I can assist, I'd like to offer a few insights based on my experimentation:
Here's hoping that Vulkan proves to be a more robust solution than OpenCL.
Did your CLBlast experience involve running the corresponding tuners to achieve that speed on your device?
No, I have not tried the tuners yet. It's a nice experiment to do. Thanks for the idea!
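For future readers curious about the tuners: my understanding, from CLBlast's own build documentation, is that they are enabled via a CMake option and produce one tuner binary per kernel, which you then run on the target device. Treat every flag and binary name below as an assumption rather than a verified recipe.

```shell
# Build CLBlast with its tuners enabled (option name per CLBlast's docs).
git clone https://github.com/CNugteren/CLBlast
cd CLBlast
cmake -B build -DTUNERS=ON
cmake --build build

# Each routine gets its own tuner; run them on the actual device, e.g.:
./build/clblast_tuner_xgemm
```

The tuned parameters are then fed back into the library so subsequent builds use device-specific kernel configurations.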
Is this specific to Android builds, or can it be reproduced on PC too?
As far as I know, it only happens in Android builds. All my tests were conducted with Adreno GPUs on Snapdragon.
It's better to tidy the README's CLBlast instructions for Android. Removed CLBlast instructions (outdated). Simplified the Android CPU build instructions.