Misc. bug: Vulkan performance depends on thread priority #12976

### Name and Version

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
version: 5143 (b43d89e3)
built with MSVC 19.35.32217.1 for x64

### Operating systems

Windows

### Which llama.cpp modules do you know to be affected?

llama-bench

### Command line

```shell
llama-bench.exe -m C:\models\llama-2-7b.Q4_0.gguf -p 0 -n 128,128,128
llama-bench.exe -m C:\models\llama-2-7b.Q4_0.gguf -p 0 -n 128,128,128 --prio 1
```

### Problem description & steps to reproduce

I've noticed recently that ggml-vulkan performance depends more on thread/process priority than expected. For example, comparing normal priority to above_normal (`--prio 1`):

```
Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench.exe -m C:\models\llama-2-7b.Q4_0.gguf -p 0 -n 128,128,128
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |         tg128 |         95.62 ± 1.50 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |         tg128 |         95.18 ± 1.40 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |         tg128 |         94.45 ± 0.74 |

Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench.exe -m C:\models\llama-2-7b.Q4_0.gguf -p 0 -n 128,128,128 --prio 1
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |         tg128 |         99.95 ± 0.08 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |         tg128 |         99.98 ± 0.25 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |         tg128 |        100.30 ± 0.14 |
```
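To put a number on the gap, the `tg128` rows from the two runs above can be averaged and compared. This is a minimal sketch, not part of llama-bench: the helper name `tg_rates` is hypothetical, and it assumes the seven-column markdown table layout shown above. The rows are copied from the output with whitespace trimmed.

```python
from statistics import mean


def tg_rates(bench_output: str, test: str = "tg128") -> list[float]:
    """Return the mean t/s of every llama-bench table row for the given test."""
    rates = []
    for line in bench_output.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Data rows have 7 cells: the test name is second to last, t/s is last.
        if len(cells) == 7 and cells[-2] == test:
            rates.append(float(cells[-1].split("±")[0]))
    return rates


# Rows copied from the default-priority run above.
normal_run = """\
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | tg128 | 95.62 ± 1.50 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | tg128 | 95.18 ± 1.40 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | tg128 | 94.45 ± 0.74 |
"""

# Rows copied from the --prio 1 run above.
prio_run = """\
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | tg128 | 99.95 ± 0.08 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | tg128 | 99.98 ± 0.25 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | tg128 | 100.30 ± 0.14 |
"""

base = mean(tg_rates(normal_run))
high = mean(tg_rates(prio_run))
gain_pct = (high / base - 1.0) * 100.0
print(f"normal: {base:.2f} t/s, --prio 1: {high:.2f} t/s, gain: {gain_pct:.2f}%")
```

On these numbers the gain from `--prio 1` comes out at roughly 5%.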

Performance is also noticeably more variable at default priority (± 0.74 to 1.50 t/s, versus ± 0.08 to 0.25 t/s with `--prio 1`). I think this is related to CPU latency after waiting on a fence. I thought I had improved this with https://github.com/ggml-org/llama.cpp/pull/12630, but it seems to be back and I don't understand why. I don't think it's related to the driver version. I suspect an OS update, but I'm not sure.

I'd like to crowdsource some data on which systems this affects. If folks could please try these or similar command lines and report CPU, GPU, driver version, and OS version (on Windows, please run `winver` and report the OS build number), I'd appreciate it. These results are on a Core i9-14900K, RTX 4070, driver 576.02, Windows 11 24H2 OS Build 26100.3775.
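For anyone who wants to paste the CPU and OS part of the report quickly, here is a small sketch using only Python's standard `platform` module. The function name `system_report` is hypothetical. It does not cover the GPU or driver version, and on Windows the version string may not include the build revision that `winver` shows, so please still check `winver`.

```python
import platform


def system_report() -> dict[str, str]:
    """Collect the CPU and OS details that the standard library exposes."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
        "machine": platform.machine(),
        # May be an empty string on some platforms.
        "cpu": platform.processor(),
    }


for key, value in system_report().items():
    print(f"{key}: {value}")
```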

### First Bad Commit

_No response_

### Relevant log output

```shell

```
