Closed
Description
Name and Version
llama.cpp version: 4490 (adc5dd9)
Windows 11 Pro
Dual AMD Radeon PRO W7800
Vulkan SDK version: 1.3.283
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-cli
Command line
git checkout f11cfdfd
cmake -B build-f11cfdfd -DGGML_VULKAN=ON
cmake --build .\build-f11cfdfd\ --config Release
.\build-f11cfdfd\bin\Release\llama-cli.exe -no-cnv -m "C:\Users\User\.cache\lm-studio\models\lmstudio-community\Qwen2.5-14B-Instruct-GGUF\Qwen2.5-14B-Instruct-Q4_K_M.gguf" -ngl 99 --seed 0 --temp 0 -p "<|im_start|>user
Tell me a 100 word story<|im_end|>
<|im_start|>assistant
"
git checkout adc5dd92
cmake -B build-adc5dd92 -DGGML_VULKAN=ON
cmake --build .\build-adc5dd92\ --config Release
.\build-adc5dd92\bin\Release\llama-cli.exe -no-cnv -m "C:\Users\User\.cache\lm-studio\models\lmstudio-community\Qwen2.5-14B-Instruct-GGUF\Qwen2.5-14B-Instruct-Q4_K_M.gguf" -ngl 99 --seed 0 --temp 0 -p "<|im_start|>user
Tell me a 100 word story<|im_end|>
<|im_start|>assistant
"
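The two runs above differ only in the build directory. As a rough sketch (not part of the original report), the argument list can be built once in Python so that both commits are guaranteed to receive identical flags and the same multi-line prompt without any shell quoting. The helper name `llama_cli_command` and the `PROMPT` constant are hypothetical; the flags and the build directory layout are copied from the commands above.

```python
from pathlib import PureWindowsPath

# The prompt from the commands above, including its trailing newline.
PROMPT = (
    "<|im_start|>user\n"
    "Tell me a 100 word story<|im_end|>\n"
    "<|im_start|>assistant\n"
)


def llama_cli_command(commit: str, model: str, prompt: str = PROMPT) -> list[str]:
    """Return the llama-cli argument list for the build of a given commit.

    Hypothetical helper: it assumes the build-<commit>\\bin\\Release layout
    produced by the cmake commands in this report.
    """
    exe = PureWindowsPath(f"build-{commit}", "bin", "Release", "llama-cli.exe")
    return [
        str(exe),
        "-no-cnv",
        "-m", model,
        "-ngl", "99",
        "--seed", "0",
        "--temp", "0",
        "-p", prompt,
    ]


model_path = (
    r"C:\Users\User\.cache\lm-studio\models\lmstudio-community"
    r"\Qwen2.5-14B-Instruct-GGUF\Qwen2.5-14B-Instruct-Q4_K_M.gguf"
)
good_cmd = llama_cli_command("f11cfdfd", model_path)
bad_cmd = llama_cli_command("adc5dd92", model_path)
```

Each list can be passed directly to `subprocess.run`, which hands the prompt to the executable as a single argument.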
Problem description & steps to reproduce
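Build both commits with the commands above and run the same prompt on each. Token generation (eval) is slower on adc5dd92 than on f11cfdfd: 36.38 tokens per second versus 42.12, a drop of about 14%. Prompt eval speed is essentially unchanged (103.04 versus 103.62 tokens per second).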
First Bad Commit
Relevant log output
f11cfdfd:
llama_perf_sampler_print: sampling time = 8.70 ms / 108 runs ( 0.08 ms per token, 12410.94 tokens per second)
llama_perf_context_print: load time = 5022.91 ms
llama_perf_context_print: prompt eval time = 164.06 ms / 17 tokens ( 9.65 ms per token, 103.62 tokens per second)
llama_perf_context_print: eval time = 2136.73 ms / 90 runs ( 23.74 ms per token, 42.12 tokens per second)
llama_perf_context_print: total time = 2320.98 ms / 107 tokens
adc5dd92:
llama_perf_sampler_print: sampling time = 8.74 ms / 108 runs ( 0.08 ms per token, 12356.98 tokens per second)
llama_perf_context_print: load time = 5052.59 ms
llama_perf_context_print: prompt eval time = 164.99 ms / 17 tokens ( 9.71 ms per token, 103.04 tokens per second)
llama_perf_context_print: eval time = 2473.68 ms / 90 runs ( 27.49 ms per token, 36.38 tokens per second)
llama_perf_context_print: total time = 2659.04 ms / 107 tokens
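To quantify the difference between the two logs, the eval line can be parsed and compared directly. The following is a minimal sketch, assuming the `llama_perf_context_print` line format shown above; the function name `eval_tokens_per_second` and the regular expression are illustrative and not part of llama.cpp.

```python
import re

# Matches the "eval time" line but not "prompt eval time", because the
# pattern requires "eval time" to follow the log prefix directly.
EVAL_RE = re.compile(
    r"llama_perf_context_print:\s+eval time\s*=\s*([\d.]+)\s*ms\s*/\s*(\d+)\s*runs"
)


def eval_tokens_per_second(log: str) -> float:
    """Recompute generation speed from the total eval time and run count."""
    match = EVAL_RE.search(log)
    if match is None:
        raise ValueError("no eval time line found")
    total_ms = float(match.group(1))
    runs = int(match.group(2))
    return runs / (total_ms / 1000.0)


# Eval lines copied from the two logs above.
good_log = (
    "llama_perf_context_print: eval time = 2136.73 ms / 90 runs "
    "( 23.74 ms per token, 42.12 tokens per second)"
)
bad_log = (
    "llama_perf_context_print: eval time = 2473.68 ms / 90 runs "
    "( 27.49 ms per token, 36.38 tokens per second)"
)

good = eval_tokens_per_second(good_log)
bad = eval_tokens_per_second(bad_log)
slowdown_pct = (1.0 - bad / good) * 100.0
print(f"{good:.2f} -> {bad:.2f} tokens per second ({slowdown_pct:.1f}% slower)")
```

Recomputing the rate from the raw time and run count, rather than reading the printed figure, also cross-checks the numbers in the log.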
Additional information:
I tested a few other models and observed the degradation across several architectures at Q4_K_M. Affected models include Qwen2.5 7B, Qwen2.5 14B, and Command R v01. I also have an unconfirmed report of degradation with Phi 4. Smaller models such as Qwen2.5 0.5B did not show any degradation.