Name and Version
8133
Vulkan pre-build binary distributed for Linux x86.
Operating systems
Linux
GGML backends
Vulkan
Hardware
Strix Halo
Models
https://huggingface.co/ggml-org/gpt-oss-120b-GGUF
Problem description & steps to reproduce
./llama-b8133_vk/llama-server -m gpt-oss-120b-mxfp4-00001-of-00003.gguf -c 1310720 --host 0.0.0.0 --parallel 10 -kvu
3 request with context length 10000 => OK
4 request with context length 10000 => crash
/home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-backend.cpp:306: GGML_ASSERT(tensor->data != NULL && "tensor not allocated") failed
(request sent by llama-benchy)
First Bad Commit
No response
Relevant log output
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
slot get_availabl: id 1 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 1 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 1 | task 428 | processing task, is_child = 0
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 429 | processing task, is_child = 0
slot get_availabl: id 9 | task -1 | selected slot by LRU, t_last = 19337946462
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 95, total state size = 6.683 MiB
/home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-backend.cpp:306: GGML_ASSERT(tensor->data != NULL && "tensor not allocated") failed
/home/user/chat/llama-b8133_vk/libggml-base.so.0(+0x1848b) [0x7f00bca5d48b]
/home/user/chat/llama-b8133_vk/libggml-base.so.0(ggml_print_backtrace+0x21f) [0x7f00bca5d8ef]
/home/user/chat/llama-b8133_vk/libggml-base.so.0(ggml_abort+0x152) [0x7f00bca5dac2]
/home/user/chat/llama-b8133_vk/libggml-base.so.0(ggml_backend_tensor_get+0x109) [0x7f00bca74d59]
/home/user/chat/llama-b8133_vk/libllama.so.0(_ZN21llama_io_write_buffer12write_tensorEPK11ggml_tensormm+0x31) [0x7f00bc0c7dd1]
/home/user/chat/llama-b8133_vk/libllama.so.0(_ZNK14llama_kv_cache16state_write_dataER16llama_io_write_iRKNS_13cell_ranges_tE+0x156) [0x7f00bc0fdae6]
/home/user/chat/llama-b8133_vk/libllama.so.0(_ZNK14llama_kv_cache11state_writeER16llama_io_write_iij+0x295) [0x7f00bc0fdfe5]
/home/user/chat/llama-b8133_vk/libllama.so.0(_ZNK19llama_kv_cache_iswa11state_writeER16llama_io_write_iij+0x2c) [0x7f00bc10fe6c]
/home/user/chat/llama-b8133_vk/libllama.so.0(_ZN13llama_context20state_seq_write_dataER16llama_io_write_iij+0x1a) [0x7f00bc0ba62a]
/home/user/chat/llama-b8133_vk/libllama.so.0(_ZN13llama_context18state_seq_get_dataEiPhmj+0x4d) [0x7f00bc0ba6fd]
./llama-b8133_vk/llama-server(+0x130d70) [0x5606a2f48d70]
./llama-b8133_vk/llama-server(+0x13eb0b) [0x5606a2f56b0b]
./llama-b8133_vk/llama-server(+0x1819ee) [0x5606a2f999ee]
./llama-b8133_vk/llama-server(+0xa177e) [0x5606a2eb977e]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x29f75) [0x7f00bba33f75]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x87) [0x7f00bba34027]
./llama-b8133_vk/llama-server(+0xa52d5) [0x5606a2ebd2d5]
Name and Version
8133
Vulkan pre-build binary distributed for Linux x86.
Operating systems
Linux
GGML backends
Vulkan
Hardware
Strix Halo
Models
https://huggingface.co/ggml-org/gpt-oss-120b-GGUF
Problem description & steps to reproduce
./llama-b8133_vk/llama-server -m gpt-oss-120b-mxfp4-00001-of-00003.gguf -c 1310720 --host 0.0.0.0 --parallel 10 -kvu
3 request with context length 10000 => OK
4 request with context length 10000 => crash
/home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-backend.cpp:306: GGML_ASSERT(tensor->data != NULL && "tensor not allocated") failed
(request sent by llama-benchy)
First Bad Commit
No response
Relevant log output
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
srv params_from_: Chat format: GPT-OSS
slot get_availabl: id 1 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 1 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 1 | task 428 | processing task, is_child = 0
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 429 | processing task, is_child = 0
slot get_availabl: id 9 | task -1 | selected slot by LRU, t_last = 19337946462
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 95, total state size = 6.683 MiB
/home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-backend.cpp:306: GGML_ASSERT(tensor->data != NULL && "tensor not allocated") failed
/home/user/chat/llama-b8133_vk/libggml-base.so.0(+0x1848b) [0x7f00bca5d48b]
/home/user/chat/llama-b8133_vk/libggml-base.so.0(ggml_print_backtrace+0x21f) [0x7f00bca5d8ef]
/home/user/chat/llama-b8133_vk/libggml-base.so.0(ggml_abort+0x152) [0x7f00bca5dac2]
/home/user/chat/llama-b8133_vk/libggml-base.so.0(ggml_backend_tensor_get+0x109) [0x7f00bca74d59]
/home/user/chat/llama-b8133_vk/libllama.so.0(_ZN21llama_io_write_buffer12write_tensorEPK11ggml_tensormm+0x31) [0x7f00bc0c7dd1]
/home/user/chat/llama-b8133_vk/libllama.so.0(_ZNK14llama_kv_cache16state_write_dataER16llama_io_write_iRKNS_13cell_ranges_tE+0x156) [0x7f00bc0fdae6]
/home/user/chat/llama-b8133_vk/libllama.so.0(_ZNK14llama_kv_cache11state_writeER16llama_io_write_iij+0x295) [0x7f00bc0fdfe5]
/home/user/chat/llama-b8133_vk/libllama.so.0(_ZNK19llama_kv_cache_iswa11state_writeER16llama_io_write_iij+0x2c) [0x7f00bc10fe6c]
/home/user/chat/llama-b8133_vk/libllama.so.0(_ZN13llama_context20state_seq_write_dataER16llama_io_write_iij+0x1a) [0x7f00bc0ba62a]
/home/user/chat/llama-b8133_vk/libllama.so.0(_ZN13llama_context18state_seq_get_dataEiPhmj+0x4d) [0x7f00bc0ba6fd]
./llama-b8133_vk/llama-server(+0x130d70) [0x5606a2f48d70]
./llama-b8133_vk/llama-server(+0x13eb0b) [0x5606a2f56b0b]
./llama-b8133_vk/llama-server(+0x1819ee) [0x5606a2f999ee]
./llama-b8133_vk/llama-server(+0xa177e) [0x5606a2eb977e]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x29f75) [0x7f00bba33f75]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x87) [0x7f00bba34027]
./llama-b8133_vk/llama-server(+0xa52d5) [0x5606a2ebd2d5]