Skip to content

WebGPU: Qwen2 models produce garbled output (repeated @ token) #21602

@c0d3rman

Description

@c0d3rman

Description

Qwen2 models produce garbled output (repeated @ / token ID 31) when using the ggml WebGPU backend in the browser. Other architectures (TinyLlama/Llama) work correctly on the same setup.

Environment

  • Browser: Chrome 146, Dia (Chromium-based) — same result on both
  • GPU: Apple Metal-3 (M-series Mac)
  • WebGPU adapter: vendor: "apple", arch: "metal-3", features include shader-f16, subgroups
  • Wllama fork: reeselevine/wllama master branch (PR Add disk space requirements to README.md #201 to ngxson/wllama)
  • JSPI: Available and used

Models tested

Model GGUF Output
TinyLlama-1.1B-Chat Q4_K_M TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF ✅ Coherent
Qwen2.5-1.5B-Instruct Q4_K_M Qwen/Qwen2.5-1.5B-Instruct-GGUF ❌ Repeats @
Qwen2.5-1.5B-Instruct Q8_0 Custom fine-tune ❌ Repeats @
Qwen2.5-1.5B-Instruct Q4_K_M Custom fine-tune ❌ Repeats @

All Qwen2 models produce identical garbage. The same GGUF files work correctly on CPU (WASM-only wllama) and local mlx-lm inference.

Suspected cause

Qwen2-1.5B has dimensions that differ from Llama:

  • num_attention_heads: 12 (not a power of 2)
  • num_key_value_heads: 2 (GQA ratio 6:1)
  • hidden_size: 1536 (not a power of 2)
  • intermediate_size: 8960
  • rope_freq_base: 1000000

TinyLlama has num_attention_heads: 32, num_key_value_heads: 4 (GQA 8:1), hidden_size: 2048 — all power-of-2 dimensions. This suggests the WebGPU matmul or attention shaders may have an issue with non-power-of-2 head counts or hidden dimensions.

Steps to reproduce

  1. Build wllama with ggml WebGPU (reeselevine/wllama master branch)
  2. Load any Qwen2 GGUF model with preferWebGPU: true
  3. Generate text — output will be repeated @ characters

Expected behavior

Coherent text output matching CPU inference.

cc @reeselevine

Edit: my coding agent posted this during a debugging run without asking me 😬 Feel free to ignore if irrelevant

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions