Skip to content

SYCL: flash attention tile kernel crash on 2nd prompt with Qwen3.5 #21396

@mina-ai-io

Description

@mina-ai-io

Environment

Reproduction

  1. Start server with Qwen3.5
  2. Send first prompt → OK
  3. Send second prompt → Crash

Crash Log

set_adapters_lora: adapters = (nil)
adapters_lora_are_same: adapters = (nil)
set_embeddings: value = 0
~/llama.cpp/ggml/src/ggml-sycl/template-instances/../fattn-tile.hpp:1255: fatal error
[New LWP 157786]
[New LWP 157785]
[New LWP 157784]
[New LWP 157783]
[New LWP 157782]
[New LWP 157781]
[New LWP 157780]
[New LWP 157779]
[New LWP 157778]
[New LWP 157777]
[New LWP 157776]
[New LWP 157775]
[New LWP 157774]
[New LWP 157773]
[New LWP 157772]
[New LWP 157771]
[New LWP 157770]
[New LWP 157769]
[New LWP 157768]
[New LWP 157767]
[New LWP 157766]
[New LWP 157765]
[New LWP 157764]
[New LWP 157763]
[New LWP 157762]
[New LWP 157760]

This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.ubuntu.com
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
warning: File "/opt/intel/oneapi/compiler/2025.3/lib/libsycl.so.8.0.0-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
add-auto-load-safe-path /opt/intel/oneapi/compiler/2025.3/lib/libsycl.so.8.0.0-gdb.py
line to your configuration file "/.config/gdb/gdbinit".
To completely disable this security protection add
set auto-load safe-path /
line to your configuration file "
/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
info "(gdb)Auto-loading safe path"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x000072fcb7110813 in __GI___wait4 (pid=158145, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#0 0x000072fcb7110813 in __GI___wait4 (pid=158145, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 in ../sysdeps/unix/sysv/linux/wait4.c
#1 0x000072fcba23467a in ggml_print_backtrace () from ~/llama.cpp/build/bin/libggml-base.so.0
#2 0x000072fcba2337f9 in ggml_abort () from ~/llama.cpp/build/bin/libggml-base.so.0
#3 0x000072fcb7b0825e in void launch_fattn_tile_switch_ncols1<256, 256, 8, false>(ggml_backend_sycl_context&, ggml_tensor*) () from ~/llama.cpp/build/bin/libggml-sycl.so.0
#4 0x000072fcb7913ea1 in ggml_backend_sycl_graph_compute_impl(ggml_backend_sycl_context*, ggml_cgraph*) () from ~/llama.cpp/build/bin/libggml-sycl.so.0
#5 0x000072fcb7912bb7 in ggml_backend_sycl_graph_compute(ggml_backend*, ggml_cgraph*) () from ~/llama.cpp/build/bin/libggml-sycl.so.0
#6 0x000072fcba259609 in ggml_backend_sched_graph_compute_async () from ~/llama.cpp/build/bin/libggml-base.so.0
#7 0x000072fcb9e9a541 in llama_context::graph_compute(ggml_cgraph*, bool) () from ~/llama.cpp/build/bin/libllama.so.0
#8 0x000072fcb9e9a03f in llama_context::process_ubatch(llama_ubatch const&, llm_graph_type, llama_memory_context_i*, ggml_status&) () from ~/llama.cpp/build/bin/libllama.so.0
#9 0x000072fcb9e9bcee in llama_context::decode(llama_batch const&) () from ~/llama.cpp/build/bin/libllama.so.0
#10 0x000072fcb9ea058b in llama_decode () from ~/llama.cpp/build/bin/libllama.so.0
#11 0x00000000004db979 in server_context_impl::update_slots() ()
#12 0x0000000000577507 in server_queue::start_loop(long) ()
#13 0x0000000000433efd in main ()
[Inferior 1 (process 157737) detached]

Metadata

Metadata

Assignees

Labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions