Eval bug: using reasoning budget in combination with backend sampling is not supported despite backend-sampling not being enabled #21208

### Name and Version

version: 8559 (59d840209)
built with GNU 15.2.1 for Linux x86_64


### Operating systems

Linux

### GGML backends

CUDA

### Hardware

RTX2060 (laptop version)

### Models

Qwen3.5 4B (the same happens with other models in this family)

### Problem description & steps to reproduce

Although backend sampling is not enabled (no `--backend-sampling` flag is passed), the server aborts on the `rbudget` assert in `common_sampler_sample`: https://github.com/ggml-org/llama.cpp/blob/08f21453aec846867b39878500d725a05bd32683/common/sampling.cpp#L535-L542

Run `./llama-server --models-preset models.ini --port 9090 --host 0.0.0.0` with the following models.ini file:

<details>
<summary>models.ini</summary>

```ini
version = 1

[*]
no-mmap = true
# note the below may not be desirable if subagents are used
np = 1
# ctk = q8_0
# ctv = q8_0

[Qwen3.5-4B:Instruct-General]
model = /mnt/disk/llms/Qwen3.5-4B-UD-Q4_K_XL.gguf
c = 64000
temp = 0.7
top-p = 0.8
top-k = 20
min-p = 0.0
presence-penalty = 1.5
repeat-penalty = 1.0
n-predict = 32768
reasoning = off
```

</details>
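
For readers comparing this preset with the startup log later in the report, here is a rough sketch of how preset keys appear to turn into child-process arguments. It is an illustration only, not the llama.cpp implementation: `kAliases`, `expand_preset` and `has_flag` are hypothetical names, the alias table contains only the pairs visible in this report, and treating `true` as a bare flag is an assumption based on `no-mmap = true` showing up as `--no-mmap`.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias table: only the short-to-long pairs visible in this
// report (preset key on the left, flag name from the startup log on the right).
static const std::map<std::string, std::string> kAliases = {
    {"c",    "ctx-size"},
    {"np",   "parallel"},
    {"temp", "temperature"},
};

// Expand preset key/value pairs into command-line arguments.
// Assumption: a value of "true" means a bare flag, as with `no-mmap = true`.
std::vector<std::string> expand_preset(
        const std::vector<std::pair<std::string, std::string>> & entries) {
    std::vector<std::string> args;
    for (const auto & kv : entries) {
        const auto it = kAliases.find(kv.first);
        const std::string & name = (it != kAliases.end()) ? it->second : kv.first;
        args.push_back("--" + name);
        if (kv.second != "true") {
            args.push_back(kv.second);
        }
    }
    return args;
}

// True if the expanded arguments contain the given flag.
bool has_flag(const std::vector<std::string> & args, const std::string & flag) {
    for (const auto & a : args) {
        if (a == flag) {
            return true;
        }
    }
    return false;
}
```

Feeding the keys from the preset above through `expand_preset` and calling `has_flag(args, "--backend-sampling")` gives `false` in this model, which matches the startup log: no backend-sampling flag is passed to the spawned instance.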

### First Bad Commit

59d840209a5195c2f6e2e81b5f8339a0637b59d9

### Relevant log output

<details>
<summary>Logs</summary>


```console
[47365] llama.cpp/common/sampling.cpp:542: GGML_ASSERT(!gsmpl->rbudget && "using reasoning budget in combination with backend sampling is not supported") failed
[47365] No symbol table is loaded.  Use the "file" command.
[47365] Make breakpoint pending on future shared library load? (y or [n]) [answered N; input not from terminal]
[47365] [Thread debugging using libthread_db enabled]
[47365] Using host libthread_db library "/usr/lib/libthread_db.so.1".
[47365] 0x00007f8dbe2adf32 in ?? () from /usr/lib/libc.so.6
[47365] #0  0x00007f8dbe2adf32 in ?? () from /usr/lib/libc.so.6
[47365] #1  0x00007f8dbe2a239c in ?? () from /usr/lib/libc.so.6
[47365] #2  0x00007f8dbe2a23e4 in ?? () from /usr/lib/libc.so.6
[47365] #3  0x00007f8dbe31267f in wait4 () from /usr/lib/libc.so.6
[47365] #4  0x00007f8dc69bc3ab in ggml_print_backtrace () from /home/kris/llama.cpp/build/bin/libggml-base.so.0
[47365] #5  0x00007f8dc69bc510 in ggml_abort () from /home/kris/llama.cpp/build/bin/libggml-base.so.0
[47365] #6  0x0000564ee8b9cd86 in common_sampler_sample(common_sampler*, llama_context*, int, bool) ()
[47365] #7  0x0000564ee89ff938 in server_context_impl::update_slots() ()
[47365] #8  0x0000564ee8a8e45f in server_queue::start_loop(long) ()
[47365] #9  0x0000564ee894d3e2 in main ()
[47365] [Inferior 1 (process 98221) detached]

```
</details>

<details>
<summary>Additional server startup logs (no <code>--backend-sampling</code>)</summary>

```console
srv          load: spawning server instance with name=Qwen3.5-4B:Instruct-General on port 47365
srv          load: spawning server instance with args:
srv          load:   llama.cpp/build/bin/llama-server
srv          load:   --host
srv          load:   127.0.0.1
srv          load:   --min-p
srv          load:   0.0
srv          load:   --no-mmap
srv          load:   --port
srv          load:   47365
srv          load:   --presence-penalty
srv          load:   1.5
srv          load:   --repeat-penalty
srv          load:   1.0
srv          load:   --temperature
srv          load:   0.7
srv          load:   --top-k
srv          load:   20
srv          load:   --top-p
srv          load:   0.8
srv          load:   --alias
srv          load:   Qwen3.5-4B:Instruct-General
srv          load:   --ctx-size
srv          load:   64000
srv          load:   --model
srv          load:   /mnt/disk/llms/Qwen3.5-4B-UD-Q4_K_XL.gguf
srv          load:   --n-predict
srv          load:   32768
srv          load:   --parallel
srv          load:   1
srv          load:   --reasoning
srv          load:   off
```
</details>

Code at the permalink above (`common/sampling.cpp`, lines 535-542):

```cpp
    {
        id = llama_get_sampled_token_ith(ctx, idx);

        if (id != LLAMA_TOKEN_NULL) {
            LOG_DBG("%s: Backend sampler selected token: '%d'. Will not run any CPU samplers\n", __func__, id);

            GGML_ASSERT(!gsmpl->grmr && "using grammar in combination with backend sampling is not supported");
            GGML_ASSERT(!gsmpl->rbudget && "using reasoning budget in combination with backend sampling is not supported");
```
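
The control flow in this excerpt can be modelled in isolation. The sketch below is a simplified stand-in, not llama.cpp code: `SamplerState`, `kTokenNull` and `guard_failure` are hypothetical names, and the two booleans only mimic whether `gsmpl->grmr` and `gsmpl->rbudget` are set. It is meant to show that the reasoning-budget assert is reachable only when a backend-sampled token is present.

```cpp
#include <string>

// Hypothetical stand-in for LLAMA_TOKEN_NULL ("no backend-sampled token").
static const int kTokenNull = -1;

// Hypothetical stand-in for the sampler state; only the two fields the guard checks.
struct SamplerState {
    bool has_grammar;          // models gsmpl->grmr being set
    bool has_reasoning_budget; // models gsmpl->rbudget being set
};

// Returns the message of the assert that would fire, or nullptr if the guard passes.
// backend_token models the value returned by llama_get_sampled_token_ith().
const char * guard_failure(const SamplerState & state, int backend_token) {
    if (backend_token == kTokenNull) {
        // No backend-sampled token: the asserts are never reached.
        return nullptr;
    }
    if (state.has_grammar) {
        return "using grammar in combination with backend sampling is not supported";
    }
    if (state.has_reasoning_budget) {
        return "using reasoning budget in combination with backend sampling is not supported";
    }
    return nullptr;
}
```

With `has_reasoning_budget` set, `guard_failure(state, kTokenNull)` returns `nullptr`, while any other token value returns the reasoning-budget message. Under this model, the reported abort implies that `llama_get_sampled_token_ith` returned a real token even though `--backend-sampling` was not passed.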
