Name and Version
load_backend: loaded CUDA backend from C:\Users\metal\OneDrive\Desktop\mar20\src\models\llamacpp_gpu\ggml-cuda.dll
load_backend: loaded RPC backend from C:\Users\metal\OneDrive\Desktop\mar20\src\models\llamacpp_gpu\ggml-rpc.dll
load_backend: loaded CPU backend from C:\Users\metal\OneDrive\Desktop\mar20\src\models\llamacpp_gpu\ggml-cpu-alderlake.dll
version: 7222 (746f9ee)
built with clang version 19.1.5 for x86_64-pc-windows-msvc
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server.exe -m qwen3vl2b.gguf --mmproj q41_mmproj.gguf --no-warmup -c 4000
Problem description & steps to reproduce
Trying out manual warmup with #17652, but it fails on the first request (my warmup call) with the error: ggml_new_object: not enough space in the context's memory pool (needed 330192, available 16)
(I can't check with llama-mtmd-cli.exe, as --no-warmup is not a valid argument there.)
I'm not sure if this is an actual issue or if I just misunderstood the feature. If I understand correctly, I need to:
- Start up the server.
- Make a dummy "warmup" chat call with the desired warmup image size (see the sketch below).
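For concreteness, a minimal sketch of the kind of dummy chat call I mean, assuming the server is listening on its default http://localhost:8080 and that "dummy.png" stands in for the warmup image (the file name and payload values are illustrative; the shape follows the OpenAI-compatible chat completions format):

```python
import base64
import json
import urllib.request

# "dummy.png" is a placeholder name; any image of the desired warmup size should do.
with open("dummy.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "warmup"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{img_b64}"},
                },
            ],
        }
    ],
    "max_tokens": 1,  # only prompt/image processing matters here, not generation
}

# Default llama-server host/port assumed.
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))
```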
Reproducing it:
- Start up the server with --no-warmup. I used Qwen3-VL/LFM2-VL (llama-server.exe --model "qwen3vl2b.gguf" --mmproj "q41_mmproj.gguf" -c 4000 --no-warmup).
- Use the webui or make an OpenAI-API-compliant chat call to it with an image (like the sketch above).
Dummy image:
Removing --no-warmup makes it work normally. But it would be nice to be able to use --no-warmup so that I can specify my own "ideal" warmup image size for reduced reserved memory. (Tested on the CPU and CUDA backends.)
First Bad Commit
Relevant log output
slot get_availabl: id 3 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id 3 | task 0 | processing task
slot update_slots: id 3 | task 0 | new prompt, n_ctx_slot = 20224, n_keep = 0, task.n_tokens = 266
slot update_slots: id 3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 3 | task 0 | prompt processing progress, n_tokens = 4, batch.n_tokens = 4, progress = 0.015038
slot update_slots: id 3 | task 0 | n_tokens = 4, memory_seq_rm [4, end)
srv process_chun: processing image...
encoding image slice...
ggml_new_object: not enough space in the context's memory pool (needed 330192, available 16)