Hello, glad to see you here. The project you are looking for is probably ggml-qnn. We are also looking for domain experts (familiar with hardcore AI tech or the Qualcomm QNN SDK) to join this open source project and work together to build a public good for the great llama.cpp community. By the way,
all of this tech is very complicated (in my personal view, because there is too much encapsulation and it is not clear how it works internally), which is the reason ggml-qnn came about. One more important thing: we already know that "Qualcomm engineers have been participating in llama.cpp development for some time now" from this post #8273 (comment) by a Senior Technical Director at Qualcomm. Accordingly, an official QNN backend for llama.cpp might appear in the future, and that might be all you need on a Snapdragon mobile SoC based Android smartphone.
-
I was able to run llama.cpp on my new Samsung S25 phone using the Termux app, following the instructions at https://github.com/ggml-org/llama.cpp/blob/master/docs/android.md
The S25 uses the Qualcomm Snapdragon 8 Elite CPU, and I built llama.cpp for CPU mode.
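For reference, a minimal sketch of the CPU-only build inside Termux, roughly following docs/android.md (package names may differ on your setup):
pkg update && pkg upgrade
pkg install git cmake
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release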
After building, I downloaded this model:
pwd
/data/data/com.termux/files/home
curl -L https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF/resolve/main/Phi-3.5-mini-instruct-Q4_K_M.gguf -o ~/Phi-3.5-mini-instruct-Q4_K_M.gguf
and ran it with:
./llama.cpp/build/bin/llama-server -m ~/Phi-3.5-mini-instruct-Q4_K_M.gguf -c 16384 --n-gpu-layers 99 --host 10.0.0.172
Note that -c 32768 crashed Termux. I am using the model over my local WiFi network from a PC and tablets.
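As a usage sketch, another machine on the LAN can query llama-server's OpenAI-compatible API (assuming the default port 8080; the IP matches the --host value above):
curl http://10.0.0.172:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Hello"}]}'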
I am happy with the model's performance/speed on the S25 and the quality of its output so far.
Is anyone working on adding support for the (S25) Qualcomm NPU when building llama.cpp for Android?
Thank you.