Misc. bug: Drastic drop in Vulkan performance somewhere between builds b4916 (was fast) and b4932 (roasting CPU and seems to barely use GPU) #12490

Closed
@sidran

Description

Name and Version

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6600 (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | matrix cores: none
version: 4932 (9ffcc9e)
built with MSVC 19.43.34808.0 for x64

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-server

Command line

I just use this .bat file to start the server and then use llama-server's built-in web UI:

echo Running Mistral small 3.1 2503 24B 12288 context
llama-server.exe ^
--model "D:\LLMs\mistralai_Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf" ^
--gpu-layers 14 ^
--ctx-size 12288 ^
--temp 0.2

Problem description & steps to reproduce

Going from release b4916 (llama-b4916-bin-win-vulkan-x64) to b4932 (llama-b4932-bin-win-vulkan-x64), I noticed that models (not just this Mistral) load partially into VRAM just as before, but inference is significantly slower and the CPU is being roasted. This never happened with multiple previous releases.
Checking VRAM usage in Task Manager shows the model appears to be properly loaded (exactly as before), but performance is almost as if the GPU is not being used at all. It was solid and is now terrible.
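Since the "First Bad Commit" is unknown, the b4916..b4932 range of prebuilt releases could be bisected in a handful of benchmark runs. A minimal sketch of the binary search, where `is_bad` is a placeholder (in practice it would mean downloading the corresponding llama-bXXXX-bin-win-vulkan-x64 release and timing llama-server on the same prompt; here it is mocked to treat builds >= 4925 as bad purely for illustration):

```shell
# Placeholder predicate: replace with an actual benchmark of the
# downloaded release build. Mocked here so the script is runnable.
is_bad() {
  [ "$1" -ge 4925 ]  # hypothetical cutoff, for illustration only
}

lo=4916   # last known good build (b4916)
hi=4932   # first known bad build (b4932)
while [ $((hi - lo)) -gt 1 ]; do
  mid=$(( (lo + hi) / 2 ))
  if is_bad "$mid"; then
    hi=$mid   # regression is at or before mid
  else
    lo=$mid   # mid is still good
  fi
done
echo "first bad build: b$hi"
```

With 16 builds in the range, this narrows the regression to a single release in at most 4 benchmark runs; the release tag then identifies the commit range to report.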

First Bad Commit

No response

Relevant log output
