Eval bug: using reasoning budget in combination with backend sampling is not supported despite backend-sampling not being enabled #21208

@Galunid

Name and Version

version: 8559 (59d8402)
built with GNU 15.2.1 for Linux x86_64

Operating systems

Linux

GGML backends

CUDA

Hardware

RTX 2060 (laptop version)

Models

Qwen3.5 4B (same happens with other models in this family)

Problem description & steps to reproduce

Even though backend sampling is not enabled (no --backend-sampling flag is passed), the server hits the second assert (the rbudget one) in the following code:

{
    id = llama_get_sampled_token_ith(ctx, idx);
    if (id != LLAMA_TOKEN_NULL) {
        LOG_DBG("%s: Backend sampler selected token: '%d'. Will not run any CPU samplers\n", __func__, id);
        GGML_ASSERT(!gsmpl->grmr && "using grammar in combination with backend sampling is not supported");
        GGML_ASSERT(!gsmpl->rbudget && "using reasoning budget in combination with backend sampling is not supported");
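Reading the excerpt above, the rbudget assert can only fire when two things are true at once: llama_get_sampled_token_ith returned something other than LLAMA_TOKEN_NULL, and gsmpl->rbudget is set. The following is a minimal standalone sketch of that condition, not the actual llama.cpp code. The names token_t, TOKEN_NULL, sampler_state and backend_sample_conflict are hypothetical stand-ins introduced for illustration, and the function returns the assert message instead of aborting.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for illustration only; these are NOT llama.cpp types.
using token_t = std::int32_t;
constexpr token_t TOKEN_NULL = -1; // plays the role of LLAMA_TOKEN_NULL

struct sampler_state {
    bool has_grammar; // models "gsmpl->grmr is set"
    bool has_rbudget; // models "gsmpl->rbudget is set"
};

// Returns nullptr when no assert would fire, otherwise the message of the
// first assert in the excerpt that would fail.
const char * backend_sample_conflict(const sampler_state & s, token_t backend_token) {
    if (backend_token == TOKEN_NULL) {
        return nullptr; // backend selected no token, so the asserts are never reached
    }
    if (s.has_grammar) {
        return "using grammar in combination with backend sampling is not supported";
    }
    if (s.has_rbudget) {
        return "using reasoning budget in combination with backend sampling is not supported";
    }
    return nullptr;
}

// The combination seen in this report: a backend token is present and a
// reasoning budget sampler exists, with no grammar.
void example_usage() {
    const sampler_state reported = { /*has_grammar=*/false, /*has_rbudget=*/true };
    const char * msg = backend_sample_conflict(reported, 42);
    assert(msg != nullptr && std::strstr(msg, "reasoning budget") != nullptr);
}
```

Under this model, the crash means both conditions held in the failing run. With reasoning = off and no --backend-sampling flag, neither is expected, so either one could be the part that is wrong.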

To reproduce, run ./llama-server --models-preset models.ini --port 9090 --host 0.0.0.0 with the following models.ini file:

models.ini
version = 1

[*]
no-mmap = true
# note the below may not be desirable if subagents are used
np = 1
# ctk = q8_0
# ctv = q8_0

[Qwen3.5-4B:Instruct-General]
model = /mnt/disk/llms/Qwen3.5-4B-UD-Q4_K_XL.gguf
c = 64000
temp = 0.7
top-p = 0.8
top-k = 20
min-p = 0.0
presence-penalty = 1.5
repeat-penalty = 1.0
n-predict = 32768
reasoning = off

First Bad Commit

59d8402

Relevant log output

Logs
[47365] llama.cpp/common/sampling.cpp:542: GGML_ASSERT(!gsmpl->rbudget && "using reasoning budget in combination with backend sampling is not supported") failed
[47365] No symbol table is loaded.  Use the "file" command.
[47365] Make breakpoint pending on future shared library load? (y or [n]) [answered N; input not from terminal]
[47365] [Thread debugging using libthread_db enabled]
[47365] Using host libthread_db library "/usr/lib/libthread_db.so.1".
[47365] 0x00007f8dbe2adf32 in ?? () from /usr/lib/libc.so.6
[47365] #0  0x00007f8dbe2adf32 in ?? () from /usr/lib/libc.so.6
[47365] #1  0x00007f8dbe2a239c in ?? () from /usr/lib/libc.so.6
[47365] #2  0x00007f8dbe2a23e4 in ?? () from /usr/lib/libc.so.6
[47365] #3  0x00007f8dbe31267f in wait4 () from /usr/lib/libc.so.6
[47365] #4  0x00007f8dc69bc3ab in ggml_print_backtrace () from /home/kris/llama.cpp/build/bin/libggml-base.so.0
[47365] #5  0x00007f8dc69bc510 in ggml_abort () from /home/kris/llama.cpp/build/bin/libggml-base.so.0
[47365] #6  0x0000564ee8b9cd86 in common_sampler_sample(common_sampler*, llama_context*, int, bool) ()
[47365] #7  0x0000564ee89ff938 in server_context_impl::update_slots() ()
[47365] #8  0x0000564ee8a8e45f in server_queue::start_loop(long) ()
[47365] #9  0x0000564ee894d3e2 in main ()
[47365] [Inferior 1 (process 98221) detached]
Additional server startup logs (no --backend-sampling)
srv          load: spawning server instance with name=Qwen3.5-4B:Instruct-General on port 47365
srv          load: spawning server instance with args:
srv          load:   llama.cpp/build/bin/llama-server
srv          load:   --host
srv          load:   127.0.0.1
srv          load:   --min-p
srv          load:   0.0
srv          load:   --no-mmap
srv          load:   --port
srv          load:   47365
srv          load:   --presence-penalty
srv          load:   1.5
srv          load:   --repeat-penalty
srv          load:   1.0
srv          load:   --temperature
srv          load:   0.7
srv          load:   --top-k
srv          load:   20
srv          load:   --top-p
srv          load:   0.8
srv          load:   --alias
srv          load:   Qwen3.5-4B:Instruct-General
srv          load:   --ctx-size
srv          load:   64000
srv          load:   --model
srv          load:   /mnt/disk/llms/Qwen3.5-4B-UD-Q4_K_XL.gguf
srv          load:   --n-predict
srv          load:   32768
srv          load:   --parallel
srv          load:   1
srv          load:   --reasoning
srv          load:   off
