Eval bug: gemma-4-26B-A4B crashing (openweb-ui -> litellm -> llama.cpp version: 8661 (b7ad48ebd)

### Name and Version

llama-server --version
version: 8661 (b7ad48ebd)

### Operating systems

Linux

### GGML backends

CUDA

### Hardware

  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24124 MiB
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24124 MiB
  Device 2: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24124 MiB
  Device 3: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24124 MiB
  Device 4: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24124 MiB
  Device 5: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24124 MiB

### Models

ggml-org/gemma-4-26B-A4B-it-GGUF/gemma-4-26B-A4B-it-Q4_K_M.gguf

### Problem description & steps to reproduce

I run:

script to run llama.cpp: gemma-4-26B-A4B-it-q4_k_m.sh
#!/bin/bash
export PATH=/usr/local/cuda-12.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH
export CUDA_VISIBLE_DEVICES=2

LLAMA_SERVER_BIN="/storage/llm/llama.cpp/build/bin/llama-server"
MODEL_PATH="/storage/llm/models/gemma-4-26B-A4B-it-GGUF/gemma-4-26B-A4B-it-Q4_K_M.gguf"
MMPROJ_PATH="/storage/llm/models/gemma-4-26B-A4B-it-GGUF/mmproj-gemma-4-26B-A4B-it-f16.gguf"

exec "$LLAMA_SERVER_BIN" \
    -m "$MODEL_PATH" \
    --mmproj "$MMPROJ_PATH" \
    --alias gemma-4-26b \
    --host 0.0.0.0 \
    --port 8001 \
    -np 1 \
    -ngl 99 \
    -fa on \
    -c 32768 \
    -ctk q8_0 \
    -ctv q8_0 \
    -b 2048 \
    --no-mmap \
    --no-warmup

It is added in litellm - output from gui

{
  "input_cost_per_token": 0,
  "output_cost_per_token": 0,
  "api_base": "http://127.0.0.1:8001/v1",
  "custom_llm_provider": "openai",
  "use_in_pass_through": false,
  "use_litellm_proxy": false,
  "merge_reasoning_content_in_choices": false,
  "tags": [],
  "model": "gemma-4-26b",
  "guardrails": [],
  "vector_store_ids": []
}

model used trough litellm in Open WebUI ‧ v0.8.12

### First Bad Commit

_No response_

### Relevant log output

<details>
<summary>Logs</summary>
</details>

[llama.cpp_gemma-4-26B-A4B-it-q4_k_m.txt](https://github.com/user-attachments/files/26479729/llama.cpp_gemma-4-26B-A4B-it-q4_k_m.txt)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Eval bug: gemma-4-26B-A4B crashing (openweb-ui -> litellm -> llama.cpp version: 8661 (b7ad48ebd) #21420

Name and Version

Operating systems

GGML backends

Hardware

Models

Problem description & steps to reproduce

First Bad Commit

Relevant log output

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Eval bug: gemma-4-26B-A4B crashing (openweb-ui -> litellm -> llama.cpp version: 8661 (b7ad48ebd) #21420

Description

Name and Version

Operating systems

GGML backends

Hardware

Models

Problem description & steps to reproduce

First Bad Commit

Relevant log output

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions