Eval bug: Gemma 4 generates <unused> tokens in infinite loop #21516

@CSCSoftware

Bug Description

Gemma 4 models built with the Vulkan backend generate an endless stream of a single special token instead of text. With GPU offloading the token is <unused8> (token ID 14); CPU-only runs loop on [multimodal] (token ID 5) instead. No valid text is produced, and generation continues until MaxTokens is exhausted.

This happens despite having all known Gemma 4 fixes applied:

Environment

Steps to Reproduce

  1. Build llama.cpp from current master with Vulkan enabled
  2. Load gemma-4-E2B-it-Q4_K_M.gguf with -ngl 99
  3. Send any prompt (e.g., "Hello")
  4. Observe: the model generates more than 18,000 tokens of <unused8> (token id=14) without producing any readable text or emitting an EOG (end-of-generation) token

Diagnostic Data

Token sampling output (first 10 tokens):

Token[0] id=14
Token[1] id=14
Token[2] id=14
Token[3] id=14
Token[4] id=14
Token[5] id=14
Token[6] id=14
Token[7] id=14
Token[8] id=14
Token[9] id=14

Token 14 in Gemma 4 vocab = <unused8>.

Generation stats: 183 tok/s, 18432 tokens generated, 44.7 seconds — no EOG token emitted.
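
A client could stop a run like this early instead of waiting for MaxTokens. The following is a minimal sketch, not part of llama.cpp: the function name `detect_token_loop` and the `max_repeats` threshold are hypothetical, and it only assumes that sampled token IDs are available as a sequence of integers.

```python
from typing import Iterable, Optional


def detect_token_loop(token_ids: Iterable[int], max_repeats: int = 64) -> Optional[int]:
    """Return the index at which one token id has repeated `max_repeats`
    times in a row, or None if no such run occurs.

    Hypothetical helper for illustration; the threshold is arbitrary.
    """
    run_token = None   # token id of the current run
    run_length = 0     # how many times it has repeated consecutively
    for index, token_id in enumerate(token_ids):
        if token_id == run_token:
            run_length += 1
        else:
            run_token = token_id
            run_length = 1
        if run_length >= max_repeats:
            return index
    return None


# The stream reported above: token id 14 repeated without interruption.
stuck_at = detect_token_loop([14] * 18432)
print(stuck_at)
```

With the default threshold, the reported stream would be flagged after the 64th consecutive token, long before 18432 tokens are generated.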

Additional Testing

  • CPU-only (-ngl 0): Also broken. The model loops on [multimodal] tokens (id=5) rather than <unused8> and produces no valid text.
  • PR #21506 ("models : set gemma 4 FFN MoE prec to F32") applied: No improvement on either CPU or Vulkan.
  • Ollama: Same model works correctly in Ollama (which uses its own llama.cpp fork), producing valid responses.
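
Since Ollama produces valid output for the same model, one way to narrow the problem down is to compare token IDs from a working run against a broken run and find where they first differ. The sketch below is an illustration only: `first_divergence` is a hypothetical helper, the token IDs in the usage example are made up, and the comparison is only meaningful if both runs use the same prompt and deterministic (greedy) sampling.

```python
from itertools import zip_longest
from typing import Optional, Sequence


def first_divergence(reference: Sequence[int], candidate: Sequence[int]) -> Optional[int]:
    """Return the first index where two token id sequences differ,
    or None if they are identical (including in length)."""
    # zip_longest pads the shorter sequence with None, which never
    # equals an integer token id, so a length mismatch counts as a divergence.
    for index, (ref_id, cand_id) in enumerate(zip_longest(reference, candidate)):
        if ref_id != cand_id:
            return index
    return None


# Made-up example: the runs agree on two tokens, then the candidate emits id 14.
working_run = [2, 105, 9259, 106]
broken_run = [2, 105, 14, 14]
print(first_divergence(working_run, broken_run))
```

A divergence at the very first generated token would point at the forward pass as a whole, while a later divergence would suggest an error that accumulates over decoding steps.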

Init Logs (successful)

Model handle: OK
Vocab size: 262144
Layers: 35
Context size: 32768
Gemma tokens: start_of_turn=105, end_of_turn=106, bos=2
System tokens: 11
Initialization complete!

The model loads correctly and the context, sampler, and batch all initialize, so the problem appears to lie in inference or sampling rather than in model loading.

Related Issues

The root cause may be Vulkan-specific numerical precision issues beyond what #21506 addresses, or a different code path in the Vulkan compute shaders.
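
One way to test the numerical-precision hypothesis is to inspect the logits of the first generated token for NaN or infinite values. This is a minimal sketch under assumptions: `summarize_logits` is a hypothetical helper, it expects the logits to have been copied out of the runtime into a plain list of floats, and nothing in this report confirms that non-finite values are actually present.

```python
import math
from typing import Dict, Optional, Sequence, Union


def summarize_logits(logits: Sequence[float]) -> Dict[str, Union[int, Optional[float]]]:
    """Count non-finite entries and report the finite value range.

    Hypothetical diagnostic; `logits` is assumed to be one row of
    vocabulary-sized scores copied into a Python sequence.
    """
    nan_count = sum(1 for value in logits if math.isnan(value))
    inf_count = sum(1 for value in logits if math.isinf(value))
    finite = [value for value in logits if math.isfinite(value)]
    return {
        "size": len(logits),
        "nan": nan_count,
        "inf": inf_count,
        "min": min(finite) if finite else None,
        "max": max(finite) if finite else None,
    }


# Made-up values, only to show the shape of the report.
print(summarize_logits([0.5, float("nan"), float("inf"), -2.0]))
```

A nonzero `nan` or `inf` count would support a precision or overflow problem in the compute path; clean but implausible logits would point elsewhere, for example at tokenizer or chat-template handling.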
