
Big performance regression in llama-bench with the Vulkan backend when forcing the integer dot product code path (at least on an NVIDIA RTX 4070 with the latest driver), relative to initial support in b5010 #13063

Closed
@oscarbg

Description


Hi,
this is easy to reproduce by forcing the integer dot product path, i.e. disabling the coopmat1 and coopmat2 code paths:

set GGML_VK_DISABLE_COOPMAT2=1
set GGML_VK_DISABLE_COOPMAT=1

pp512 throughput drops from 2899 t/s in build 5010 (the first build with integer dot product support?) to 1935.29 t/s in build 5145.

EDIT: I haven't bisected which build/commit introduced the regression.
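If someone wants to narrow it down, here is a rough git bisect sketch. It assumes a local llama.cpp checkout, that the release tags `b5010`/`b5145` are reachable, and a generic CMake Vulkan build; the exact build command and model path (`model.gguf` is a placeholder) will differ per setup:

```shell
# Hypothetical bisect between the two release tags (untested sketch)
git bisect start b5145 b5010     # bad revision first, then good

# At each bisect step: rebuild the Vulkan backend and re-run the benchmark
cmake -B build -DGGML_VULKAN=ON && cmake --build build -j
GGML_VK_DISABLE_COOPMAT=1 GGML_VK_DISABLE_COOPMAT2=1 \
    ./build/bin/llama-bench -m model.gguf

# Then mark the commit based on the pp512 number:
git bisect good    # or: git bisect bad
```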

Tested with the latest NVIDIA drivers, both the 575.xx branch and the NVIDIA Vulkan developer driver.

llama-b5010-bin-win-vulkan-x64>llama-bench
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan,RPC |  99 |         pp512 |      2899.14 ± 48.84 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan,RPC |  99 |         tg128 |        100.46 ± 0.49 |

build: a8a1f335 (5010)
llama-b5145-bin-win-vulkan-x64>llama-bench
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan,RPC |  99 |         pp512 |       1935.29 ± 4.16 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan,RPC |  99 |         tg128 |        101.61 ± 0.21 |

build: 12b17501 (5145)

Also tested on Linux; the results are equally bad:

export GGML_VK_DISABLE_COOPMAT=1
export GGML_VK_DISABLE_COOPMAT2=1
~/llamavk/lin5010$ ./llama-bench
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |         pp512 |       2953.53 ± 8.11 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |         tg128 |         98.85 ± 0.27 |

build: a8a1f335 (5010)

~/llamavk/lin5145$ ./llama-bench
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |         pp512 |       1926.43 ± 5.12 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | Vulkan     |  99 |         tg128 |         99.65 ± 1.92 |

build: 12b17501 (5145)
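For reference, the pp512 drop works out to roughly 33% on both platforms; a quick sketch of the arithmetic using the Windows numbers from the tables above:

```shell
# pp512 throughput (t/s) from the Windows runs above
awk 'BEGIN {
    old = 2899.14   # build 5010
    new = 1935.29   # build 5145
    printf "pp512 drop: %.1f%%\n", (old - new) / old * 100
}'
```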
