Closed
Hi,
The regression is easy to reproduce by disabling the coopmat1 and coopmat2 code paths, which forces the non-cooperative-matrix path:
set GGML_VK_DISABLE_COOPMAT2=1
set GGML_VK_DISABLE_COOPMAT=1
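Prompt processing (pp512) throughput drops from 2899 t/s in build 5010 (the first with integer dot product usage?) to 1935.29 t/s in build 5145. Token generation (tg128) is unchanged.
EDIT: I have not bisected to find which build/commit introduced the regression.
I am using the latest Nvidia drivers, both the 575.xx branch and the Nvidia Vulkan developer driver.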
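For anyone who wants to repeat the comparison across several builds without retyping the `set`/`export` lines, here is a minimal Python sketch. It only wraps the two environment variables from this report; the `binary` path and the helper names `bench_env` and `run_bench` are hypothetical, and llama-bench is launched with its default arguments as in the runs shown here.

```python
import os
import subprocess

# Environment variables taken from this report; together they disable
# both cooperative-matrix code paths in the Vulkan backend.
DISABLE_COOPMAT = {
    "GGML_VK_DISABLE_COOPMAT": "1",
    "GGML_VK_DISABLE_COOPMAT2": "1",
}


def bench_env(base=None):
    """Return a copy of the environment with both coopmat paths disabled."""
    env = dict(os.environ if base is None else base)
    env.update(DISABLE_COOPMAT)
    return env


def run_bench(binary):
    """Run llama-bench with default arguments and return its stdout.

    `binary` is a hypothetical path to a llama-bench executable,
    e.g. one unpacked from a release archive.
    """
    result = subprocess.run(
        [binary],
        env=bench_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `run_bench` once per unpacked build directory should give outputs comparable to the ones pasted in this report, assuming the same model and driver.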
llama-b5010-bin-win-vulkan-x64>llama-bench
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,RPC | 99 | pp512 | 2899.14 ± 48.84 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,RPC | 99 | tg128 | 100.46 ± 0.49 |
build: a8a1f335 (5010)
llama-b5145-bin-win-vulkan-x64>llama-bench
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,RPC | 99 | pp512 | 1935.29 ± 4.16 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,RPC | 99 | tg128 | 101.61 ± 0.21 |
build: 12b17501 (5145)
I also tested on Linux; the results are equally bad.
export GGML_VK_DISABLE_COOPMAT=1
export GGML_VK_DISABLE_COOPMAT2=1
~/llamavk/lin5010$ ./llama-bench
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | pp512 | 2953.53 ± 8.11 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | tg128 | 98.85 ± 0.27 |
build: a8a1f335 (5010)
~/llamavk/lin5145$ ./llama-bench
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | pp512 | 1926.43 ± 5.12 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | tg128 | 99.65 ± 1.92 |
build: 12b17501 (5145)
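To put a number on the regression, the pp512 figures from the four runs above can be compared directly. This is a small sketch using only the values from the tables; the helper name `percent_drop` is just for illustration.

```python
# pp512 throughput in t/s, copied from the llama-bench tables above:
# (build 5010, build 5145)
results = {
    "Windows": (2899.14, 1935.29),
    "Linux": (2953.53, 1926.43),
}


def percent_drop(before, after):
    """Relative throughput loss, as a percentage of the earlier value."""
    return (before - after) / before * 100.0


for platform, (b5010, b5145) in results.items():
    drop = percent_drop(b5010, b5145)
    print(f"{platform}: pp512 {b5010:.2f} -> {b5145:.2f} t/s ({drop:.1f}% lower)")
```

On both platforms prompt processing loses roughly a third of its throughput between builds 5010 and 5145, while tg128 stays within noise.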