Vulkan frontend breaking and causing garbled output after around 500 tokens, and causing GPU to crash. #5179

gnawzie · 2024-01-29T01:02:22Z

AMD 6800xt GPU, 32GB system RAM.
Arch Linux

Vulkan works fast for the good text it outputs, but results in garbled output past some number of tokens.

./server --n-gpu-layers 46 --model models/LLaMA2-13B-Psyfighter2.Q5_K_M.gguf

Here is a sample of text near the point of failure.

We have faced many trials in our past: wars fought for principles that seemed long forgotten; struggles against oppression at home and abroad; movements for civil rights that shook the very foundations of society. Through it all, we emerged stronger and more unified than ever before, for we knew deep down inside that together, there was nothing we could not achieve.

And so, my friends, as we stand on the precipice of a new era; an age defined by technological advancement, global interconnectedness, and unprecedented social progressadvраб operationitelkeltc ScherobatiotviksochitelockUBtc/(ховgenceariestcwigotaetwork Hopété Nasleratimer(&лет#

It also caused my GPU to crash on a previous boot with these error messages.

amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32773, for process server pid 27706 thread server pid 27706)
kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000000000001000 from client 0x1b (UTCL2)
kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00501E31
kernel: amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: GCR (0xf)
kernel: amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x1
kernel: amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
kernel: amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
kernel: amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0

I hope this helps. Thanks!

LostRuins · 2024-01-29T10:17:55Z

I have a feeling that it is triggered after a context shift takes place (llama_kv_cache_seq_rm + llama_kv_cache_seq_shift). Mine always seems to segfault around that point. It does not happen if 0 GPU layers are offloaded.
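
For readers unfamiliar with the context shift mentioned above, the following is a toy Python model of the bookkeeping it is understood to involve: a range of cached positions is removed, and the positions after it are shifted down to close the gap. This is only a sketch under stated assumptions. The `Cell` type, the helper functions, and the `n_keep` / `n_discard` values are hypothetical stand-ins chosen for illustration. They are not the llama.cpp API and do not reproduce the Vulkan bug itself.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One hypothetical KV-cache entry: a token label and its position."""
    token: str
    pos: int


def seq_rm(cells, p0, p1):
    """Drop every cell whose position lies in the half-open range [p0, p1)."""
    return [c for c in cells if not (p0 <= c.pos < p1)]


def seq_shift(cells, p0, p1, delta):
    """Add delta to the position of every cell in [p0, p1)."""
    return [
        Cell(c.token, c.pos + delta) if p0 <= c.pos < p1 else c
        for c in cells
    ]


def context_shift(cells, n_keep, n_discard):
    """Keep the first n_keep cells, discard the next n_discard, close the gap.

    Assumes positions are contiguous and start at 0, so the number of
    cells equals the next free position (n_past).
    """
    n_past = len(cells)
    cells = seq_rm(cells, n_keep, n_keep + n_discard)
    cells = seq_shift(cells, n_keep + n_discard, n_past, -n_discard)
    return cells


cells = [Cell(f"t{i}", i) for i in range(8)]
shifted = context_shift(cells, n_keep=2, n_discard=3)
print([(c.token, c.pos) for c in shifted])
# [('t0', 0), ('t1', 1), ('t5', 2), ('t6', 3), ('t7', 4)]
```

In this model the surviving cells end up with contiguous positions again. If the hypothesis above is right, the failure would sit in how the GPU backend handles the equivalent step, which this sketch does not attempt to show.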

teleprint-me · 2024-01-29T15:53:59Z

Which precision are you using? E.g. f16, q8_0, etc?

0cc4m · 2024-01-29T21:11:20Z

I can reproduce this. I'll try to debug and fix it.

Edit: I can reproduce the garbled output and a segfault; the GPU crash is your driver's fault.

gnawzie added the bug-unconfirmed label Jan 29, 2024

teleprint-me mentioned this issue Jan 29, 2024

Vulkan Implementation #2059

Merged

0cc4m self-assigned this Jan 29, 2024

0cc4m added the bug label and removed the bug-unconfirmed label Jan 29, 2024

LostRuins mentioned this issue Jan 30, 2024

[VULKAN] Segmentation fault when Context Shifting erase tokens LostRuins/koboldcpp#647

Closed

0cc4m mentioned this issue Jan 30, 2024

Vulkan Fixes #5223

Merged

0cc4m closed this as completed in #5223 Jan 31, 2024

llfw mentioned this issue Jun 6, 2024

vulkan: garbage output followed by GPU crash LostRuins/koboldcpp#897

Open
