Accelerated computations on Android Adreno 740 #17456

Elettrotecnica · 2025-11-23T22:49:41Z

Elettrotecnica
Nov 23, 2025

I am trying to run llama.cpp on a Pico 4 Ultra device, which comes with a Snapdragon XR2 Gen 2. Because I am on Android, I am using Termux as linux enironment, which packages llama-cpp and both the opencl and the vulkan backend.

I have tried using the vulkan backend, but I believe I have encountered the problem described in #16881, e.g. the output is (very fast) gibberish.

OpenCL on the other hand does not even activate, because apparently the device does not support some extension llama.cpp needs.

This is the output of llama-cli --list-devices --gpus:

`ˋˋ
ggml_opencl: selected platform: 'clvk'

ggml_opencl: device: 'Turnip Adreno (TM) 740v3 (OpenCL 3.0 CLVK on Vulkan v1.4.328 driver 104869888)'
ggml_opencl: OpenCL driver: 3.0 CLVK on Vulkan v1.4.328 driver 104869888
ggml_opencl: vector subgroup broadcast support: false
ggml_opencl: device FP16 support: true
ggml_opencl: device does not support subgroups (cl_khr_subgroups or cl_intel_subgroups) (note that subgroups is an optional feature in OpenCL 3.0)
ggml_opencl: drop unsupported device.

ggml_opencl: device: 'llvmpipe (LLVM 21.1.5, 128 bits) (OpenCL 3.0 CLVK on Vulkan v1.4.328 driver 104869888)'
Unsupported GPU: llvmpipe (LLVM 21.1.5, 128 bits)
ggml_opencl: drop unsupported device.
load_backend: loaded OpenCL backend from /data/data/com.termux/files/usr/bin/../lib/libggml-opencl.so
load_backend: loaded CPU backend from /data/data/com.termux/files/usr/bin/../lib/libggml-cpu.so
Available devices:
`ˋˋ

While this is the result of clinfo:
clinfo.txt

It seems at least some subgroup operations are supported.

I have tried disabling the check for subgroup features in the code and recompiling, but it seems the check is right and something is actually missing, as the model will fail to load as something won't be computed correctly (not really an OpenCL expert...). I have also tried to use the compilation flag GGML_OPENCL_USE_ADRENO_KERNELS=OFF to see if this would avoid certain kernel operations, but the result is the same.

My question now is: am I facing the limits of current gpu support in termux, e.g. drivers are lacking, or is the chip actually incapable of performing these operations? What else could I try?

chrockjyy · 2026-02-21T08:06:00Z

chrockjyy
Feb 21, 2026

我觉得740理论上适配，大概率是clvk转的时候有问题。可以直接使用ndk试试。我的设备是Lenovo y700 8gen3(sm8650p，GPU是adreno 750)，Android16，kernel为Android14，physical memory是12gb。

我在wsl debian上使用了android-ndk-n27d进行编译llama.cpp，开启了opencl后端，参数如下。

cmake -B build
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_TOOLCHAIN_FILE="/home/nut/android-ndk-r27d/build/cmake/android.toolchain.cmake"
-DANDROID_ABI=arm64-v8a
-DANDROID_PLATFORM=android-35
-DANDROID_STL=c++_static
-DBUILD_SHARED_LIBS=OFF
-DGGML_OPENCL=ON
-DGGML_OPENCL_USE_ADRENO_KERNELS=ON
-DGGML_OPENCL_EMBED_KERNELS=ON
-DGGML_OPENMP=OFF

在push到手机上后可以在设备的shell里正常运行，但是效率感觉不高，可能我的操作不那么正确。
qwen3-7b-q4_0的模型只有6.3tokens（全部层加载到GPU），GPU占用>95%，使用纯CPU运行时经过优化也可以达到这个水平。

或者也可以直接使用vulkan试试，我在termux中使用turnip vulkan驱动跑通过qwen3-8b-q4_k_m，但是也是只有7左右的tokens。

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Accelerated computations on Android Adreno 740 #17456

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Accelerated computations on Android Adreno 740 #17456

Uh oh!

Uh oh!

Elettrotecnica Nov 23, 2025

Replies: 1 comment

Uh oh!

chrockjyy Feb 21, 2026

Elettrotecnica
Nov 23, 2025

chrockjyy
Feb 21, 2026