Hello, glad to see you here. The project you are looking for is probably ggml-qnn. We are also looking for domain experts (familiar with hardcore AI tech or the Qualcomm QNN SDK) to join this open source project and work together to build a public good for the great llama.cpp community. By the way,
all of this tech is very complicated (in my personal view, because there is too much encapsulation and it is not clear how it works internally), which is the reason ggml-qnn came about. One more important thing: we already know that "Qualcomm engineers have been participating in llama.cpp development for some time now" from this post #8273 (comment) by a Senior Technical Director at Qualcomm. Accordingly, an official QNN backend for llama.cpp might appear in the future, and that might be all you need on a Snapdragon mobile SoC based Android smartphone.
-
I was able to run llama.cpp on my new Samsung S25 phone using the Termux app, following the instructions at https://github.com/ggml-org/llama.cpp/blob/master/docs/android.md
The S25 uses the Qualcomm Snapdragon 8 Elite CPU, and I built llama.cpp for CPU mode.
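For reference, a minimal sketch of the CPU-only build inside Termux, roughly following docs/android.md (package names may differ on your setup):
pkg update && pkg upgrade
pkg install git cmake
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release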
After building, I downloaded this model:
pwd
/data/data/com.termux/files/home
curl -L https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF/resolve/main/Phi-3.5-mini-instruct-Q4_K_M.gguf -o ~/Phi-3.5-mini-instruct-Q4_K_M.gguf
and ran it with:
./llama.cpp/build/bin/llama-server -m ~/Phi-3.5-mini-instruct-Q4_K_M.gguf -c 16384 --n-gpu-layers 99 --host 10.0.0.172
Note that -c 32768 crashed Termux. I am using the model over my local WiFi network from a PC and tablets.
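As a usage sketch, another machine on the LAN can query llama-server's OpenAI-compatible API (assuming the default port 8080; the IP matches the --host value above):
curl http://10.0.0.172:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Hello"}]}'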
I am happy with the model's performance/speed on the S25 and the quality of its output so far.
Is anyone working on adding support for the (S25) Qualcomm NPU when building llama.cpp for Android?
Thank you.