Name and Version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 32768 | matrix cores: none
version: 4820 (1a24c46)
built with MSVC 19.42.34435.0 for x64
Operating systems
Windows
GGML backends
Vulkan
Hardware
Ryzen 5900X + RX 5700 XT
Models
Any model that has Q8_0 tensors in it.
Problem description & steps to reproduce
Complete gibberish/noise output.
I noticed this issue with stable-diffusion.cpp at first, but I can reproduce it here.
To reproduce, simply start inference with any Q8_0 model with -ngl set to anything other than 0.
First Bad Commit
fbeda90
Relevant log output
Example command:
.\build\bin\Release\llama-cli.exe -m .\models\gemma-2b-Q8_0.gguf -no-cnv -ngl 19 -t 6 -tb 12 -p "The meaning of life is"
Output:
The meaning of life is increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa
llama_perf_sampler_print: sampling time = 13.57 ms / 87 runs ( 0.16 ms per token, 6410.26 tokens per second)
llama_perf_context_print: load time = 2080.08 ms
llama_perf_context_print: prompt eval time = 23.16 ms / 6 tokens ( 3.86 ms per token, 259.09 tokens per second)
llama_perf_context_print: eval time = 879.56 ms / 80 runs ( 10.99 ms per token, 90.95 tokens per second)
llama_perf_context_print: total time = 936.59 ms / 86 tokens
Interrupted by user
Reverting fbeda90 fixes it.
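For reference, this is the Q8_0 block layout the backend has to decode, together with the scalar dequantization it should reproduce (a minimal CPU-side sketch mirroring the declarations in ggml-common.h; ggml_fp16_to_fp32 is ggml's public fp16 conversion helper):

#include <stdint.h>
#include "ggml.h"   // ggml_fp16_t, ggml_fp16_to_fp32

#define QK8_0 32

typedef struct {
    ggml_fp16_t d;         // per-block scale (fp16)
    int8_t      qs[QK8_0]; // 32 signed 8-bit quants
} block_q8_0;

// Reference dequantization of one block: x[i] = d * qs[i].
// Output that differs from this (e.g. the noise above) means the
// shader is misreading either the scale or the quants.
static void dequantize_block_q8_0(const block_q8_0 * b, float * y) {
    const float d = ggml_fp16_to_fp32(b->d);
    for (int i = 0; i < QK8_0; ++i) {
        y[i] = d * b->qs[i];
    }
}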
Older, related Q2_K/Q3_K issue (fixed by adc5dd9, #11081):
Name and Version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64
version: 4277 (c5ede38)
built with MSVC 19.41.34120.0 for x64
Operating systems
Windows
GGML backends
Vulkan
Hardware
Ryzen 5900X + RX 5700 XT
Models
Any model that has Q3_K or Q2_K tensors in it.
Problem description & steps to reproduce
Complete gibberish/noise output.
I noticed this issue with stable-diffusion.cpp at first, but I can reproduce it here.
To reproduce, simply start inference with any Q3_K or Q2_K quantized model with -ngl set to anything other than 0.
First Bad Commit
4a57d36
Relevant log output
Example command:
.\build\bin\Release\llama-cli.exe -m .\models\Mistral-7B-v0.2-hf-Q3_K_L.gguf -ngl 24 -t 6 -tb 12 -p "The meaning of life is"
Output:
The meaning of life is to- kur m jel ul tawa Computkow Ydorfico oobeckagles “anga ACenzei Roose Asto__(ingle Phillieraspace TheFAILEDello securózannieloilloemente GabrielóniałrivatemulticolManocaluckangle>@‑inghamulle pagina Steinentoadyodenzes Armindowtexlä v Ronald incre bioExitocyniadelphiaumper globutescison sear lifestyle proto Kotiek po cadutes Eng randCl byaginganziagedrafla cad- extern met externward Kyere collectenteryenta divisionsExternaleryy Aubore2� Yale randomirkFBimanneman hyd BrowFB Maj Majalaky audanning Ex ternal -neylitter Intentanningky amaperlDsek Britats unit andraportyo am… Egyptian portionandraandeentob – indirectibaentoicigeb associate1田 ##icijays Lyiana auditentoawPy import Girapy TheMky X Himery departmentyyyiba1iba indirect n #isterschaftciProrico Industrial #aniric Palm indirectBici patPyy –hetriky ### AtlantaidleBazialaaran Mediterranean matter sl m South experekylie------ofsy Meyainsottoannedento- corporBOestic /******/entopythonats eternainsalian Gir expery # Sar‟eloalfentaahaelfonomPal rigidento bon bon Pdas palanda P Muhammadentoít SubPy ###GAentoeterenta Palm Kabâ Cecenta8entonuoltyBotaueraperendlento Ec pyento externâ accentburgaper Klaly
llama_perf_sampler_print: sampling time = 12.93 ms / 319 runs ( 0.04 ms per token, 24665.58 tokens per second)
llama_perf_context_print: load time = 3158.38 ms
llama_perf_context_print: prompt eval time = 262.98 ms / 6 tokens ( 43.83 ms per token, 22.82 tokens per second)
llama_perf_context_print: eval time = 13525.70 ms / 312 runs ( 43.35 ms per token, 23.07 tokens per second)
llama_perf_context_print: total time = 13823.14 ms / 318 tokens
Interrupted by user
Reverting 4a57d36 fixes it.
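For context, these are the K-quant super-block layouts involved, in a simplified view of what ggml-common.h declares (field comments are mine; the Vulkan shaders must decode these bit-exactly):

#include <stdint.h>
#include "ggml.h"   // ggml_fp16_t

#define QK_K 256    // K-quant super-blocks cover 256 weights

typedef struct {
    uint8_t scales[QK_K/16]; // 4-bit scale/min pairs, one per 16 weights
    uint8_t qs[QK_K/4];      // 2-bit quants
    ggml_fp16_t d;           // super-block scale for the scales
    ggml_fp16_t dmin;        // super-block scale for the mins
} block_q2_K;                // weight = d*sc*q - dmin*m

typedef struct {
    uint8_t hmask[QK_K/8];   // high bit of each 3-bit quant
    uint8_t qs[QK_K/4];      // low 2 bits of each quant
    uint8_t scales[12];      // 6-bit scales, packed
    ggml_fp16_t d;           // super-block scale
} block_q3_K;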