
Vulkan: Don't default to CPU device (like llvmpipe) #14099


Merged: 1 commit merged into master on Jun 10, 2025

Conversation

0cc4m (Collaborator) commented Jun 10, 2025

This should fix containers/ramalama#1479

llvmpipe can still be used by setting GGML_VK_VISIBLE_DEVICES to override the automatic device selection. This may now be required to allow the GitHub CI test-backend-ops job to run for Vulkan.
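
For a rough idea of what such an override involves, here is a minimal sketch (hypothetical, not the actual ggml-vulkan code; the helper name parse_visible_devices is made up) of reading the variable as a comma-separated list of device indices:

```cpp
// Hypothetical sketch, not the real ggml-vulkan implementation:
// read GGML_VK_VISIBLE_DEVICES as a comma-separated list of device
// indices. An unset variable means "use automatic device selection".
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

static std::vector<size_t> parse_visible_devices() {
    std::vector<size_t> indices;
    const char * env = std::getenv("GGML_VK_VISIBLE_DEVICES");
    if (env == nullptr) {
        return indices; // unset: caller falls back to automatic selection
    }
    std::stringstream ss(env);
    std::string token;
    while (std::getline(ss, token, ',')) {
        indices.push_back(std::stoul(token)); // e.g. "0" or "0,1"
    }
    return indices;
}
```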

Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend
@github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Jun 10, 2025
ericcurtin (Collaborator)

This is exactly what we need! Defaulting to llvmpipe was silly (we would even warn in the logs that this is probably not what you want to do).

As an aside, will this work with Vulkan? Auto-setting ngl for Vulkan could be kinda neat:

https://github.com/ggml-org/llama.cpp/pull/14067/files

```cpp
// If only CPU devices are available, return without devices.
if (vk_instance.device_indices.empty()) {
    for (size_t i = 0; i < devices.size(); i++) {
        if (devices[i].getProperties().deviceType != vk::PhysicalDeviceType::eCpu) {
```
Collaborator

It's possible we want to consider other device types here too, like:

```cpp
if (devices[i].getProperties().deviceType != vk::PhysicalDeviceType::eCpu &&
    devices[i].getProperties().deviceType != vk::PhysicalDeviceType::eIntegratedGpu)
```

Most integrated GPUs are slower than CPU inference, and especially when an integrated GPU has < 1 GB of VRAM it gets very questionable.

But that could be a discussion for another PR...

Collaborator

Actually, looking at the various types it can be, I'd flip it:

```cpp
if (devices[i].getProperties().deviceType == vk::PhysicalDeviceType::eDiscreteGpu)
```

Collaborator Author

That is true, but there are also a lot of iGPUs that run better with Vulkan than the CPU. It is not straightforward to decide here; we might need a black- or whitelist.
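
To make the black-/whitelist idea concrete, here is a hedged sketch (illustrative only; this is not code from the PR, and the device names in the allowlist are placeholders) of admitting discrete GPUs by default while letting known-good iGPUs through:

```cpp
// Hypothetical sketch of the allowlist idea discussed above: accept
// discrete GPUs by default, let hand-picked iGPUs through, and skip
// everything else (including eCpu devices such as llvmpipe).
#include <string>
#include <unordered_set>
#include <vulkan/vulkan.hpp>

static bool vk_device_usable(const vk::PhysicalDevice & device) {
    // Placeholder entries; a real list would come from benchmarking.
    static const std::unordered_set<std::string> igpu_allowlist = {
        "AMD Radeon 780M",
        "Intel(R) Arc(tm) Graphics",
    };
    const vk::PhysicalDeviceProperties props = device.getProperties();
    switch (props.deviceType) {
        case vk::PhysicalDeviceType::eDiscreteGpu:
            return true;
        case vk::PhysicalDeviceType::eIntegratedGpu:
            return igpu_allowlist.count(props.deviceName.data()) > 0;
        default:
            return false;
    }
}
```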

Collaborator

Considering we can always override with GGML_VK_VISIBLE_DEVICES, only eDiscreteGpu would get my vote

Collaborator

Curious what's your take, @a-ghorbani, from the Android perspective.

Collaborator

> Most integrated GPUs are slower than CPU inference, and especially when an integrated GPU has < 1 GB of VRAM it gets very questionable.

> Considering we can always override with GGML_VK_VISIBLE_DEVICES, only eDiscreteGpu would get my vote

In the vast majority of cases an integrated GPU with Vulkan and -ngl 0 is going to perform better than the CPU in prompt processing. Also, pretty much all computer iGPUs that support Vulkan (so anything newer than Intel Skylake or the AMD GCN2 APUs) should be able to access several GB of memory without problems. I'll admit I'm not sure about phones, though.

If you look at the chart, a lot of the newer integrated chips run very well even with the model fully offloaded. Also, Intel, AMD, and Nvidia are beginning to follow Apple in making fast iGPUs with more memory bandwidth.

There is one case where the CPU might win at prompt processing, though: when you have one of those new 16-core AMD Zen 5 CPUs with the little 2 CU iGPU.

ericcurtin (Collaborator)

Looks like we hit a flake

ericcurtin merged commit 97340b4 into master on Jun 10, 2025 (43 of 47 checks passed).
ericcurtin deleted the 0cc4m/vulkan-disable-cpu-device branch on June 10, 2025 at 12:01.
jeffbolznv (Collaborator)

Looks like this did indeed disable our CI coverage?

```
2025-06-10T11:49:02.1603371Z 28: Test command: /home/runner/work/llama.cpp/llama.cpp/build/bin/test-backend-ops
2025-06-10T11:49:02.1603888Z 28: Working Directory: .
2025-06-10T11:49:02.1604101Z 28: Test timeout computed to be: 3600
2025-06-10T11:49:02.1899893Z 28: ggml_vulkan: No devices found.
2025-06-10T11:49:02.1919320Z 28: Testing 1 devices
2025-06-10T11:49:02.1919657Z 28:
2025-06-10T11:49:02.1919954Z 28: Backend 1/1: CPU
2025-06-10T11:49:02.1920324Z 28:   Skipping CPU backend
2025-06-10T11:49:02.1920643Z 28: 1/1 backends passed
2025-06-10T11:49:02.1921080Z 28: OK
2025-06-10T11:49:02.1951995Z 28/33 Test #28: test-backend-ops ..................   Passed    0.03 sec
```

IMO this needs to be fixed or reverted ASAP.

0cc4m (Collaborator, Author) commented Jun 10, 2025

I didn't expect it to be merged this quickly; maybe I should have set it to draft. But you basically only need to set GGML_VK_VISIBLE_DEVICES=0 to override this for the CI, and I assume that's not hard to do.
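
For illustration only (the actual fix in #14106 forces device 0 via the CI workflow rather than in code): the override just has to be in the environment before the Vulkan backend enumerates devices, e.g.:

```cpp
// Hypothetical illustration: force device 0 before any ggml Vulkan
// initialization runs. setenv is POSIX; on the GitHub CI runners the
// variable is set in the workflow environment instead.
#include <cstdlib>

int main() {
    setenv("GGML_VK_VISIBLE_DEVICES", "0", /*overwrite=*/1);
    // ... initialize the backend / run test-backend-ops-style tests here ...
    return 0;
}
```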

jeffbolznv (Collaborator)

OK, I've made an attempt at #14106 (though I'm not an expert on GitHub workflows).

xcvbnmp commented Jun 10, 2025

#14099 What is this?

gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jun 10, 2025
* origin/master:
llama : support GEGLU for jina-bert-v2 (ggml-org#14090)
vulkan: force device 0 in CI (ggml-org#14106)
Fixed spec timings to: accepted/tested instead of accepted/drafted (ggml-org#14104)
sync : ggml
ggml : fix weak alias win32 (whisper/0)
Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (ggml-org#14099)
rpc : nicer error messages for RPC server crash (ggml-org#14076)
sync : ggml
Add in-build ggml::ggml ALIAS library (ggml/1260)
metal : use less stack memory in FA kernel (ggml-org#14088)
kv-cache : fix shift and defrag logic (ggml-org#14081)
llama : allow building all tests on windows when not using shared libs (ggml-org#13980)
Labels: Vulkan (Issues specific to the Vulkan backend), ggml (changes relating to the ggml tensor library for machine learning)
Successfully merging this pull request may close these issues.

Performance regression between 0.7.4 and 0.9.0
5 participants