
BUG: GGUF → APR conversion missing embedded tokenizer (PMAT-172) #185

@noahgift

Summary

Converting GGUF to APR format does not embed the tokenizer, causing inference to fail or produce incorrect output.

Expected Behavior

APR files converted from GGUF should include an embedded tokenizer, allowing self-contained inference.

Actual Behavior

  1. Conversion completes without reporting any error
  2. The APR file is created but lacks the embedded tokenizer
  3. Inference reports: [PMAT-172] ERROR: APR file missing embedded tokenizer.
  4. Even when inference proceeds past the error, the output is completely different from the source GGUF

Reproduction Steps

MODEL=qwen2.5-coder-1.5b-instruct-q4_k_m.gguf

# Convert
apr rosetta convert $MODEL test.apr

# Run inference on GGUF - correct output
apr run $MODEL -p "What is 2+2? Answer with just the number:" --max-tokens 8 --no-gpu
# Output: 4

# Run inference on APR - wrong output
apr run test.apr -p "What is 2+2? Answer with just the number:" --max-tokens 8 --no-gpu
# Error: [PMAT-172] ERROR: APR file missing embedded tokenizer.
# Output: 1. What is the difference between a

Impact

  • P0 BLOCKER: Format conversion testing cannot pass
  • All 6 conversion gates failing (F-CONV-G-A, F-CONV-A-G, F-CONV-G-S, F-CONV-S-G, F-CONV-A-S, F-CONV-S-A)
  • Round-trip verification failing (F-CONV-RT-001)
  • Model qualification blocked

Five Whys Root Cause Analysis

  1. Why does APR inference produce wrong output?

    • The tokenizer is missing from the APR file
  2. Why is the tokenizer missing from the APR file?

    • GGUF → APR conversion doesn't extract or embed the tokenizer
  3. Why doesn't the conversion extract the tokenizer?

    • GGUF stores tokenizer data in metadata fields; the conversion copies only tensor data
  4. Why does the conversion copy only tensor data?

    • The original design focused on weight format conversion, not full model packaging
  5. Why wasn't tokenizer embedding required originally?

    • Early APR usage may have relied on external tokenizer.json files
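The failure mode in whys 3 and 4 can be illustrated with a toy converter that copies tensors and silently drops everything else (a sketch only, not the actual converter.rs logic; the dict layout is hypothetical):

```python
def convert_tensors_only(gguf):
    """Toy GGUF -> APR conversion that copies only weight tensors.

    All metadata, including the tokenizer.ggml.* fields, is silently
    dropped, so the resulting model cannot tokenize prompts on its own.
    """
    return {"tensors": dict(gguf["tensors"])}

gguf = {
    "tensors": {"blk.0.attn_q.weight": [0.1, 0.2]},
    "metadata": {"tokenizer.ggml.tokens": ["<s>", "</s>", "4"]},
}
apr = convert_tensors_only(gguf)
# apr has no "metadata" key: the tokenizer vocabulary never reaches the APR file,
# which matches the garbled inference output shown above.
```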

Suggested Fix

In src/format/converter.rs, the GGUF → APR conversion should:

  1. Extract the tokenizer vocabulary from GGUF metadata:

    • tokenizer.ggml.tokens - token strings
    • tokenizer.ggml.token_type - token types
    • tokenizer.ggml.scores - token scores (or merges, depending on the tokenizer model)
    • tokenizer.ggml.bos_token_id / tokenizer.ggml.eos_token_id etc.
  2. Embed the tokenizer into the APR format:

    • Either as an embedded JSON blob
    • Or as a native APR tokenizer section
  3. Validate tokenizer presence in the output APR file before reporting success
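The three steps above can be sketched in Python (the tokenizer.ggml.* key names come from the GGUF metadata convention; the APR structure and function names here are hypothetical, for illustration only):

```python
import json

# Tokenizer-related GGUF metadata keys to carry across during conversion.
GGUF_TOKENIZER_KEYS = [
    "tokenizer.ggml.tokens",
    "tokenizer.ggml.token_type",
    "tokenizer.ggml.scores",
    "tokenizer.ggml.bos_token_id",
    "tokenizer.ggml.eos_token_id",
]

def extract_tokenizer(metadata):
    """Step 1: collect tokenizer fields from GGUF metadata, failing loudly."""
    tok = {k: metadata[k] for k in GGUF_TOKENIZER_KEYS if k in metadata}
    if "tokenizer.ggml.tokens" not in tok:
        raise ValueError("GGUF metadata has no tokenizer vocabulary")
    return tok

def embed_tokenizer(apr, metadata):
    """Step 2: attach the tokenizer as a JSON blob section of the APR model."""
    apr["tokenizer"] = json.dumps(extract_tokenizer(metadata))
    return apr

def validate_apr(apr):
    """Step 3: refuse to emit an APR file without an embedded tokenizer."""
    blob = apr.get("tokenizer")
    if not blob or not json.loads(blob).get("tokenizer.ggml.tokens"):
        raise ValueError("[PMAT-172] APR file missing embedded tokenizer")
```

With this shape, a conversion that drops the tokenizer fails at write time (step 3) instead of surfacing later as garbled inference output.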

Verification Test

# After fix, this should produce identical output:
apr rosetta convert model.gguf model.apr
apr run model.gguf -p "2+2=" --max-tokens 8 > gguf_out.txt
apr run model.apr -p "2+2=" --max-tokens 8 > apr_out.txt
diff gguf_out.txt apr_out.txt  # Should be empty

Environment

  • apr-cli version: 0.2.12
  • OS: Linux 6.8.0-90-generic
  • Model: Qwen2.5-Coder-1.5B-Instruct Q4_K_M
