Name and Version
(gguf) ➜ llama.cpp git:(master) ./build/bin/llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 4564 (acd38ef)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-cli
Command line
./build/bin/llama-cli -t 6 --color --interactive --conversation --multiline-input --mirostat 2 --ctx-size 16384 --keep -1 --flash-attn --repeat-penalty 1.2 --n-gpu-layers 44 --temp 0.3 --cache-type-v q8_0 --cache-type-k q8_0
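To A/B-test the flags suspected below, the same command can be rerun without the two --cache-type options (a sketch of a next step, not something I have run yet; same model and prompt assumed):
$ ./build/bin/llama-cli -t 6 --color --interactive --conversation --multiline-input --mirostat 2 --ctx-size 16384 --keep -1 --flash-attn --repeat-penalty 1.2 --n-gpu-layers 44 --temp 0.3
If the crash no longer reproduces with this variant, that would narrow it down to the quantized KV cache path.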
Problem description & steps to reproduce
Suspect: the quantized KV cache flags (--cache-type-k q8_0 --cache-type-v q8_0) might be responsible.
Problem: at some point during the chat, the process dies with a segmentation fault. The CLI prints nothing beyond:
[1] 12457 segmentation fault (core dumped) ./build/bin/llama-cli
$ coredumpctl list
Mon 2025-01-27 20:48:06 IST 12457 1000 1000 SIGSEGV present /work/src/llama.cpp/build/bin/llama-cli 1.2G
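The traces below appear to be the journal's own stack dumps (coredumpctl info); for frame arguments and locals, the same dump can also be opened interactively in gdb:
$ coredumpctl gdb 12457
(gdb) thread apply all bt full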
Call stacks of all threads, from the core dump:
Stack trace of thread 12474:
#0 0x00007f128f771aab n/a (libc.so.6 + 0x152aab)
#1 0x00007f128f5c68af ggml_graph_compute_thread.isra.0 (libggml-cpu.so + 0x5a8af)
#2 0x00007f128f540d8e n/a (libgomp.so.1 + 0x1cd8e)
#3 0x00007f128f6a81c4 n/a (libc.so.6 + 0x891c4)
#4 0x00007f128f72885c n/a (libc.so.6 + 0x10985c)
Stack trace of thread 12472:
#0 0x00007f128f771ae9 n/a (libc.so.6 + 0x152ae9)
#1 0x00007f128f5c68af ggml_graph_compute_thread.isra.0 (libggml-cpu.so + 0x5a8af)
#2 0x00007f128f540d8e n/a (libgomp.so.1 + 0x1cd8e)
#3 0x00007f128f6a81c4 n/a (libc.so.6 + 0x891c4)
#4 0x00007f128f72885c n/a (libc.so.6 + 0x10985c)
Stack trace of thread 12470:
#0 0x00007f128f771b00 n/a (libc.so.6 + 0x152b00)
#1 0x00007f128f5c68af ggml_graph_compute_thread.isra.0 (libggml-cpu.so + 0x5a8af)
#2 0x00007f128f540d8e n/a (libgomp.so.1 + 0x1cd8e)
#3 0x00007f128f6a81c4 n/a (libc.so.6 + 0x891c4)
#4 0x00007f128f72885c n/a (libc.so.6 + 0x10985c)
Stack trace of thread 12471:
#0 0x00007f128f771af3 n/a (libc.so.6 + 0x152af3)
#1 0x00007f128f5c68af ggml_graph_compute_thread.isra.0 (libggml-cpu.so + 0x5a8af)
#2 0x00007f128f540d8e n/a (libgomp.so.1 + 0x1cd8e)
#3 0x00007f128f6a81c4 n/a (libc.so.6 + 0x891c4)
#4 0x00007f128f72885c n/a (libc.so.6 + 0x10985c)
Stack trace of thread 12473:
#0 0x00007f128f771ae4 n/a (libc.so.6 + 0x152ae4)
#1 0x00007f128f5c68af ggml_graph_compute_thread.isra.0 (libggml-cpu.so + 0x5a8af)
#2 0x00007f128f540d8e n/a (libgomp.so.1 + 0x1cd8e)
#3 0x00007f128f6a81c4 n/a (libc.so.6 + 0x891c4)
#4 0x00007f128f72885c n/a (libc.so.6 + 0x10985c)
Stack trace of thread 12466:
#0 0x00007f128f6a4f16 n/a (libc.so.6 + 0x85f16)
#1 0x00007f128f6a78bc pthread_cond_timedwait (libc.so.6 + 0x888bc)
#2 0x00007f127ebcac8a n/a (libcuda.so.1 + 0x1cac8a)
#3 0x00007f127ec6dee3 n/a (libcuda.so.1 + 0x26dee3)
#4 0x00007f128f6a81c4 n/a (libc.so.6 + 0x891c4)
#5 0x00007f128f72885c n/a (libc.so.6 + 0x10985c)
Stack trace of thread 12457:
#0 0x00007f128f771ae9 n/a (libc.so.6 + 0x152ae9)
#1 0x00007f128f5c68af ggml_graph_compute_thread.isra.0 (libggml-cpu.so + 0x5a8af)
#2 0x00007f128f5380b6 GOMP_parallel (libgomp.so.1 + 0x140b6)
#3 0x00007f128f597a5c ggml_graph_compute (libggml-cpu.so + 0x2ba5c)
#4 0x00007f128f5a61c2 _ZL30ggml_backend_cpu_graph_computeP12ggml_backendP11ggml_cgraph (libggml-cpu.so + 0x3a1c2)
#5 0x00007f128fb67f83 ggml_backend_sched_graph_compute_async (libggml-base.so + 0x26f83)
#6 0x00007f128fc694b0 _ZL19llama_graph_computeR13llama_contextP11ggml_cgraphiP15ggml_threadpool (libllama.so + 0x4e4b0)
#7 0x00007f128fc6d8f3 llama_kv_cache_update (libllama.so + 0x528f3)
#8 0x00007f128fc6e87e _ZL17llama_decode_implR13llama_context11llama_batch (libllama.so + 0x5387e)
#9 0x00007f128fc6fa87 llama_decode (libllama.so + 0x54a87)
#10 0x000055df853f5596 main (llama-cli + 0x22596)
#11 0x00007f128f64624a n/a (libc.so.6 + 0x2724a)
#12 0x00007f128f646305 __libc_start_main (libc.so.6 + 0x27305)
#13 0x000055df853f95d1 _start (llama-cli + 0x265d1)
Stack trace of thread 12467:
#0 0x00007f128f6a4f16 n/a (libc.so.6 + 0x85f16)
#1 0x00007f128f6a75d8 pthread_cond_wait (libc.so.6 + 0x885d8)
#2 0x000055df854b957b _ZZN10common_log6resumeEvENKUlvE_clEv (llama-cli + 0xe657b)
#3 0x00007f128f8d44a3 n/a (libstdc++.so.6 + 0xd44a3)
#4 0x00007f128f6a81c4 n/a (libc.so.6 + 0x891c4)
#5 0x00007f128f72885c n/a (libc.so.6 + 0x10985c)
Stack trace of thread 12469:
#0 0x00007f128f6a4f16 n/a (libc.so.6 + 0x85f16)
#1 0x00007f128f6a78bc pthread_cond_timedwait (libc.so.6 + 0x888bc)
#2 0x00007f127ebcac8a n/a (libcuda.so.1 + 0x1cac8a)
#3 0x00007f127ec6dee3 n/a (libcuda.so.1 + 0x26dee3)
#4 0x00007f128f6a81c4 n/a (libc.so.6 + 0x891c4)
#5 0x00007f128f72885c n/a (libc.so.6 + 0x10985c)
Stack trace of thread 12465:
#0 0x00007f128f71b1df __poll (libc.so.6 + 0xfc1df)
#1 0x00007f127ec761ef n/a (libcuda.so.1 + 0x2761ef)
#2 0x00007f127ed3a67f n/a (libcuda.so.1 + 0x33a67f)
#3 0x00007f127ec6dee3 n/a (libcuda.so.1 + 0x26dee3)
#4 0x00007f128f6a81c4 n/a (libc.so.6 + 0x891c4)
#5 0x00007f128f72885c n/a (libc.so.6 + 0x10985c)
Stack trace of thread 12458:
#0 0x00007f128f71b1df __poll (libc.so.6 + 0xfc1df)
#1 0x00007f127ec761ef n/a (libcuda.so.1 + 0x2761ef)
#2 0x00007f127ed3a67f n/a (libcuda.so.1 + 0x33a67f)
#3 0x00007f127ec6dee3 n/a (libcuda.so.1 + 0x26dee3)
#4 0x00007f128f6a81c4 n/a (libc.so.6 + 0x891c4)
#5 0x00007f128f72885c n/a (libc.so.6 + 0x10985c)
Stack trace of thread 12468:
#0 0x00007f128f71b1df __poll (libc.so.6 + 0xfc1df)
#1 0x00007f127ec761ef n/a (libcuda.so.1 + 0x2761ef)
#2 0x00007f127ed3a67f n/a (libcuda.so.1 + 0x33a67f)
#3 0x00007f127ec6dee3 n/a (libcuda.so.1 + 0x26dee3)
#4 0x00007f128f6a81c4 n/a (libc.so.6 + 0x891c4)
#5 0x00007f128f72885c n/a (libc.so.6 + 0x10985c)
ELF object binary architecture: AMD x86-64
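The crashing thread (12457) enters ggml_graph_compute via llama_kv_cache_update, which fits the KV-cache suspicion, but the faulting frame inside libc.so.6 is unresolved. If it helps, I can rebuild with debug info to get readable symbols; a sketch, assuming the original build used -DGGML_CUDA=ON (and, going by the GGML_CUDA_FORCE_MMQ line in the startup log, -DGGML_CUDA_FORCE_MMQ=ON):
$ cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FORCE_MMQ=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo
$ cmake --build build -j
then reproduce the crash and re-run coredumpctl gdb as above.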
First Bad Commit
No response