Eval bug: Granite Speech on CUDA - ssm-conv.cu:146 assertion #23015

@gabe-l-hart

Name and Version

$ llama-cli --version
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 122502 MiB):
  Device 0: NVIDIA GB10, compute capability 12.1, VMM: yes, VRAM: 122502 MiB
version: 1 (ef93e98)
built with GNU 13.3.0 for Linux aarch64

Operating systems

Linux

GGML backends

CUDA

Hardware

GB10

$ nvidia-smi
Wed May 13 09:41:48 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.142                Driver Version: 580.142        CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GB10                    On  |   0000000F:01:00.0 Off |                  N/A |
| N/A   44C    P0             11W /  N/A  | Not Supported          |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2859      G   /usr/lib/xorg/Xorg                       18MiB |
|    0   N/A  N/A            2993      G   /usr/bin/gnome-shell                      6MiB |
|    0   N/A  N/A            3782      C   llama-server                            170MiB |
+-----------------------------------------------------------------------------------------+

Models

ibm-granite/granite-speech-4.1-2b-GGUF:Q8_0

Problem description & steps to reproduce

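The ssm-conv.cu kernel currently doesn't support the kernel size of the conv tensors needed for the QFormer projector used in the mmproj for these models (the assertion only allows kernel sizes 3, 4, 5, and 9), so offloading the mmproj to CUDA aborts: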

# Run with full mmproj offloading -> Core dump
llama-cli -hf ibm-granite/granite-speech-4.1-2b-GGUF:Q8_0 --audio audio/multilingual_sample.wav -p "can you transcribe the speech into a written format?"

# Run without mmproj offloading -> good output
llama-cli -hf ibm-granite/granite-speech-4.1-2b-GGUF:Q8_0 --audio audio/multilingual_sample.wav -p "can you transcribe the speech into a written format?" --no-mmproj-offload

First Bad Commit

No response

Relevant log output

/tmp/llama.cpp.UInjSQ/ggml/src/ggml-cuda/ssm-conv.cu:147: Only support kernel sizes 3, 4, 5, 9 right now.
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000f4e327461e9c in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0xfffff423b998, op=137, expected=0, futex_word=0xc6fd84217408) at ./nptl/futex-internal.c:57
warning: 57	./nptl/futex-internal.c: No such file or directory
#0  0x0000f4e327461e9c in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0xfffff423b998, op=137, expected=0, futex_word=0xc6fd84217408) at ./nptl/futex-internal.c:57
57	in ./nptl/futex-internal.c
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0xfffff423b998, clockid=-2078182392, expected=0, futex_word=0xc6fd84217408) at ./nptl/futex-internal.c:87
87	in ./nptl/futex-internal.c
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0xc6fd84217408, expected=expected@entry=0, clockid=clockid@entry=1, abstime=abstime@entry=0xfffff423b998, private=private@entry=0) at ./nptl/futex-internal.c:139
139	in ./nptl/futex-internal.c
#3  0x0000f4e327465140 in __pthread_cond_wait_common (abstime=0xfffff423b998, clockid=1, mutex=0xc6fd842173b0, cond=0xc6fd842173e0) at ./nptl/pthread_cond_wait.c:503
warning: 503	./nptl/pthread_cond_wait.c: No such file or directory
#4  ___pthread_cond_clockwait64 (abstime=0xfffff423b998, clockid=1, mutex=0xc6fd842173b0, cond=0xc6fd842173e0) at ./nptl/pthread_cond_wait.c:691
691	in ./nptl/pthread_cond_wait.c
#5  ___pthread_cond_clockwait64 (cond=0xc6fd842173e0, mutex=0xc6fd842173b0, clockid=1, abstime=0xfffff423b998) at ./nptl/pthread_cond_wait.c:679
679	in ./nptl/pthread_cond_wait.c
#6  0x0000c6fd5f538f6c in server_response::recv_with_timeout(std::unordered_set<int, std::hash<int>, std::equal_to<int>, std::allocator<int> > const&, int) ()
#7  0x0000c6fd5f53c7e8 in server_response_reader::next(std::function<bool ()> const&) ()
#8  0x0000c6fd5f4ea680 in cli_context::generate_completion[abi:cxx11](result_timings&) ()
#9  0x0000c6fd5f4d4d50 in main ()
[Inferior 1 (process 6406) detached]
Aborted (core dumped)
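
For anyone trying to confirm they are hitting the same abort, here is a minimal sketch of a log check. The function name `find_ssm_conv_abort` and the log file name `run.log` are hypothetical and are not part of llama.cpp. The pattern is based only on the assertion message shown in the log above, so it may need adjusting if the message wording changes.

```shell
# Hypothetical helper (not part of llama.cpp): prints the first line of a saved
# log that matches the ssm-conv.cu kernel-size abort and returns 0 if found,
# or returns non-zero if the log does not contain it.
find_ssm_conv_abort() {
    # $1: path to a captured log file
    # The line number after "ssm-conv.cu:" is left open, since it can differ between builds.
    grep -m1 -E 'ssm-conv\.cu:[0-9]+: Only support kernel sizes' "$1"
}
```

One way to use it is to redirect the output of the failing command into a file (for example by appending `> run.log 2>&1` to the first command above) and then run `find_ssm_conv_abort run.log`. A match suggests the same unsupported kernel size is being hit, and rerunning with `--no-mmproj-offload` should then avoid the CUDA path as described above.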
