
Gemma 4 outputs empty strings for Unicode characters on Windows #21423

@Offset0x


Name and Version

b8660 (working), b8661 (broken)

Operating systems

Windows

GGML backends

CUDA

Hardware

NVIDIA GeForce RTX 5060 Ti 16GB

Models

https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/blob/main/gemma-4-26B-A4B-it-UD-IQ4_XS.gguf

Problem description & steps to reproduce

After updating from b8660 to b8661, Gemma 4 started returning empty strings ('') for Unicode characters on Windows, specifically chess piece symbols and likely other multi-byte UTF-8 characters. Rolling back to b8660 fixes it immediately, with the same GGUF, the same command, and the same hardware.

The startup logs point to the cause: b8661 loads the model with vocab type = BPE and n_merges = 514906, while b8660 loads the exact same GGUF with vocab type = SPM and n_merges = 0. Something in PR #21406 (or its interaction with #21343) changed the tokenizer routing on Windows from SPM to BPE. On Windows, the BPE path appears to silently drop multi-byte UTF-8 sequences from the output.
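The suspected failure mode, multi-byte UTF-8 characters disappearing on output, can be illustrated with a small JavaScript sketch. It is not llama.cpp code and makes no claim about how its detokenizer actually works: it assumes, hypothetically, that a chess symbol such as ♙ (three bytes in UTF-8) is emitted as separate byte tokens, and contrasts a decoder that discards each token's invalid bytes with one that buffers bytes across tokens. The helper names decodeTokenDropInvalid and decodeStreaming are made up for illustration.

```javascript
// Illustrative sketch only, NOT llama.cpp code. It shows how decoding each
// token's bytes independently, and discarding invalid UTF-8, turns a
// multi-byte character into an empty string.

const encoder = new TextEncoder();

// Hypothetical per-token decoder: returns "" whenever a token's bytes are not
// valid UTF-8 on their own (e.g. one fragment of a multi-byte sequence).
function decodeTokenDropInvalid(bytes) {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return "";
  }
}

// Buffering approach: keep partial sequences across tokens and only emit
// characters once all of their bytes have arrived.
function decodeStreaming(tokenByteChunks) {
  const decoder = new TextDecoder("utf-8");
  let out = "";
  for (const chunk of tokenByteChunks) {
    out += decoder.decode(chunk, { stream: true });
  }
  return out + decoder.decode(); // flush any remaining buffered bytes
}

const piece = "♙"; // U+2659 WHITE CHESS PAWN
const bytes = encoder.encode(piece); // e2 99 99 (3 bytes)

// Assumed split: each byte arrives as its own token.
const chunks = [bytes.slice(0, 1), bytes.slice(1, 2), bytes.slice(2)];

const lossy = chunks.map(decodeTokenDropInvalid).join("");
const buffered = decodeStreaming(chunks);

console.log(bytes.length);          // 3
console.log(JSON.stringify(lossy)); // ""
console.log(buffered);              // ♙
```

Buffering bytes across token boundaries is the usual way to stream UTF-8 text safely. If an output path instead decodes or filters each piece on its own, any character whose bytes are split across pieces could vanish in the way this report describes.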

Steps to reproduce:

Example Prompt:

const PIECES = {
    wP: '', wR: '♖', wN: '♘', wB: '♗', wQ: '♕', wK: '♔',
    bP: '♟', bR: '', bN: '', bB: '♝', bQ: '♛', bK: '♚'
};

fill the empty pieces

b8661 output (broken):

const PIECES = {
    wP: '', wR: '♖', wN: '♘', wB: '♗', wQ: '♕', wK: '♔',
    bP: '♟', bR: '', bN: '', bB: '♝', bQ: '♛', bK: '♚'
};

b8660 output (correct):

const PIECES = {
    wP: '♙', wR: '♖', wN: '♘', wB: '♗', wQ: '♕', wK: '♔',
    bP: '♟', bR: '♜', bN: '♞', bB: '♝', bQ: '♛', bK: '♚'
};
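To compare the two builds in a script rather than by eye, one could scan the model's reply for keys whose value is still an empty string. This is a hypothetical helper (findEmptyPieces is not part of llama.cpp or any tool mentioned in this issue) and it assumes single-quoted values, as in the prompt above.

```javascript
// Hypothetical regression check: list the keys in a PIECES-style object
// literal whose value is an empty single-quoted string.
function findEmptyPieces(text) {
  const empty = [];
  // Matches `key: ''`; a non-empty value such as '♖' does not match.
  for (const m of text.matchAll(/(\w+):\s*''/g)) {
    empty.push(m[1]);
  }
  return empty;
}

// Replies copied from the b8661 and b8660 outputs above.
const broken = `const PIECES = {
    wP: '', wR: '♖', wN: '♘', wB: '♗', wQ: '♕', wK: '♔',
    bP: '♟', bR: '', bN: '', bB: '♝', bQ: '♛', bK: '♚'
};`;
const fixed = `const PIECES = {
    wP: '♙', wR: '♖', wN: '♘', wB: '♗', wQ: '♕', wK: '♔',
    bP: '♟', bR: '♜', bN: '♞', bB: '♝', bQ: '♛', bK: '♚'
};`;

console.log(JSON.stringify(findEmptyPieces(broken))); // ["wP","bR","bN"]
console.log(JSON.stringify(findEmptyPieces(fixed)));  // []
```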

First Bad Commit

PR #21406 (b8661): "llama: add custom newline split for Gemma 4"

Relevant log output

b8661 (broken, Windows):

print_info: vocab type  = BPE
print_info: n_merges    = 514906
print_info: LF token    = 107 '<actual newline>'

b8660 (working, Windows and Linux):

print_info: vocab type  = SPM
print_info: n_merges    = 0
print_info: LF token    = 248 '<0x0A>'
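When bisecting further, the tokenizer mode can be pulled out of each build's startup output automatically. The sketch below is a hypothetical helper, not a llama.cpp tool, and assumes the "print_info: name = value" layout shown in the logs above; the sample strings are copied from those logs.

```javascript
// Hypothetical helper: extract tokenizer fields from llama.cpp startup logs,
// assuming the "print_info: <name> = <value>" layout shown above.
function parseVocabInfo(log) {
  const info = {};
  for (const line of log.split(/\r?\n/)) {
    const m = line.match(/^print_info:\s*(vocab type|n_merges)\s*=\s*(.*?)\s*$/);
    if (m) info[m[1]] = m[2];
  }
  return info;
}

const b8661Log = `print_info: vocab type  = BPE
print_info: n_merges    = 514906`;
const b8660Log = `print_info: vocab type  = SPM
print_info: n_merges    = 0`;

const before = parseVocabInfo(b8660Log);
const after = parseVocabInfo(b8661Log);

// Flag the routing change between builds.
if (before["vocab type"] !== after["vocab type"]) {
  console.log(`vocab type changed: ${before["vocab type"]} -> ${after["vocab type"]}`);
}
// prints "vocab type changed: SPM -> BPE"
```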
