Summary
Converting GGUF to APR format does not embed the tokenizer, causing inference to fail or produce incorrect output.
Expected Behavior
APR files converted from GGUF should include an embedded tokenizer, allowing self-contained inference.
Actual Behavior
- Conversion completes without error
- The APR file is created but is missing the tokenizer
- Inference fails with:
  [PMAT-172] ERROR: APR file missing embedded tokenizer.
- Even when the error is absent, inference produces completely different output than the source GGUF
Reproduction Steps
MODEL=qwen2.5-coder-1.5b-instruct-q4_k_m.gguf
# Convert
apr rosetta convert $MODEL test.apr
# Run inference on GGUF - correct output
apr run $MODEL -p "What is 2+2? Answer with just the number:" --max-tokens 8 --no-gpu
# Output: 4
# Run inference on APR - wrong output
apr run test.apr -p "What is 2+2? Answer with just the number:" --max-tokens 8 --no-gpu
# Error: [PMAT-172] ERROR: APR file missing embedded tokenizer.
# Output: 1. What is the difference between a
Impact
- P0 BLOCKER: Format conversion testing cannot pass
- All 6 conversion gates failing (F-CONV-G-A, F-CONV-A-G, F-CONV-G-S, F-CONV-S-G, F-CONV-A-S, F-CONV-S-A)
- Round-trip verification failing (F-CONV-RT-001)
- Model qualification blocked
Five Whys Root Cause Analysis
1. Why does APR inference produce wrong output?
   - The tokenizer is missing from the APR file.
2. Why is the tokenizer missing from the APR file?
   - The GGUF → APR conversion doesn't extract or embed the tokenizer.
3. Why doesn't the conversion extract the tokenizer?
   - GGUF stores tokenizer data in metadata fields, and the conversion only copies tensor data (see the sketch after this list).
4. Why does the conversion only copy tensor data?
   - The original design focused on weight format conversion, not full model packaging.
5. Why wasn't tokenizer embedding required originally?
   - Early APR usage may have relied on external tokenizer.json files.
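The crux of why #3 is where GGUF keeps the tokenizer: in metadata key/value pairs, not in the tensor table. Below is a minimal Rust sketch of that layout; `GgufFile`, `MetadataValue`, and `Tensor` are illustrative stand-ins for whatever apr-cli actually parses, not its real types.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of a parsed GGUF file: a flat metadata
/// key/value table plus the weight tensors. Stand-in types, not apr-cli's.
pub struct GgufFile {
    pub metadata: HashMap<String, MetadataValue>,
    pub tensors: Vec<Tensor>,
}

pub enum MetadataValue {
    U32(u32),
    StringArray(Vec<String>),
    I32Array(Vec<i32>),
    F32Array(Vec<f32>),
}

pub struct Tensor {
    pub name: String,
    pub data: Vec<u8>,
}

impl GgufFile {
    // Typed accessors: `None` means the key is absent or has the wrong type.
    pub fn u32_value(&self, key: &str) -> Option<u32> {
        match self.metadata.get(key) {
            Some(MetadataValue::U32(v)) => Some(*v),
            _ => None,
        }
    }
    pub fn string_array(&self, key: &str) -> Option<&[String]> {
        match self.metadata.get(key) {
            Some(MetadataValue::StringArray(v)) => Some(v.as_slice()),
            _ => None,
        }
    }
    pub fn i32_array(&self, key: &str) -> Option<&[i32]> {
        match self.metadata.get(key) {
            Some(MetadataValue::I32Array(v)) => Some(v.as_slice()),
            _ => None,
        }
    }
    pub fn f32_array(&self, key: &str) -> Option<&[f32]> {
        match self.metadata.get(key) {
            Some(MetadataValue::F32Array(v)) => Some(v.as_slice()),
            _ => None,
        }
    }
}

/// The tokenizer lives in these metadata keys, alongside (not inside) the
/// tensors, so a converter that only walks `tensors` never sees it.
pub const TOKENIZER_KEYS: &[&str] = &[
    "tokenizer.ggml.tokens",
    "tokenizer.ggml.token_type",
    "tokenizer.ggml.scores",
    "tokenizer.ggml.bos_token_id",
    "tokenizer.ggml.eos_token_id",
];

pub fn has_tokenizer(gguf: &GgufFile) -> bool {
    TOKENIZER_KEYS.iter().all(|k| gguf.metadata.contains_key(*k))
}
```

`has_tokenizer` on the source GGUF could serve as a pre-flight check: if it returns true but the output APR still lacks a tokenizer, the bug is in conversion, not in the model file.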
Suggested Fix
In src/format/converter.rs, the GGUF → APR conversion should:
1. Extract the tokenizer vocabulary from GGUF metadata:
   - tokenizer.ggml.tokens (token strings)
   - tokenizer.ggml.token_type (token types)
   - tokenizer.ggml.scores (token scores/merges)
   - tokenizer.ggml.bos_token_id / tokenizer.ggml.eos_token_id, etc.
2. Embed the tokenizer into the APR format, either as an embedded JSON blob or as a native APR tokenizer section.
3. Validate tokenizer presence in the output APR file.
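A minimal sketch of these three steps, reusing the hypothetical `GgufFile` from the sketch above. `EmbeddedTokenizer`, `AprWriter`, `ConvertError`, and the `"tokenizer"` section name are assumptions for illustration, not the actual converter internals; `serde`/`serde_json` are assumed available for the JSON-blob option.

```rust
use serde::Serialize;

/// Shape of the embedded tokenizer blob (an assumption; the real APR
/// format may define its own native section layout instead).
#[derive(Serialize)]
pub struct EmbeddedTokenizer {
    pub tokens: Vec<String>,
    pub token_type: Vec<i32>,
    pub scores: Vec<f32>,
    pub bos_token_id: u32,
    pub eos_token_id: u32,
}

#[derive(Debug)]
pub enum ConvertError {
    MissingTokenizerKey(&'static str),
    Json(serde_json::Error),
    TokenizerNotEmbedded,
}

/// Minimal stand-in for the APR output writer (not apr-cli's real type).
pub struct AprWriter {
    sections: std::collections::HashMap<String, Vec<u8>>,
}

impl AprWriter {
    pub fn new() -> Self {
        Self { sections: std::collections::HashMap::new() }
    }
    pub fn write_tensor(&mut self, name: &str, data: &[u8]) {
        self.sections.insert(format!("tensor/{name}"), data.to_vec());
    }
    pub fn write_section(&mut self, name: &str, data: &[u8]) {
        self.sections.insert(name.to_string(), data.to_vec());
    }
    pub fn has_section(&self, name: &str) -> bool {
        self.sections.contains_key(name)
    }
}

pub fn convert_gguf_to_apr(gguf: &GgufFile, out: &mut AprWriter) -> Result<(), ConvertError> {
    // Step 0: copy tensor data, as the converter already does today.
    for t in &gguf.tensors {
        out.write_tensor(&t.name, &t.data);
    }

    // Step 1: extract the tokenizer from GGUF metadata, failing loudly if
    // any key is absent instead of silently emitting a broken APR file.
    let missing = ConvertError::MissingTokenizerKey;
    let tok = EmbeddedTokenizer {
        tokens: gguf
            .string_array("tokenizer.ggml.tokens")
            .ok_or(missing("tokenizer.ggml.tokens"))?
            .to_vec(),
        token_type: gguf
            .i32_array("tokenizer.ggml.token_type")
            .ok_or(missing("tokenizer.ggml.token_type"))?
            .to_vec(),
        scores: gguf
            .f32_array("tokenizer.ggml.scores")
            .ok_or(missing("tokenizer.ggml.scores"))?
            .to_vec(),
        bos_token_id: gguf
            .u32_value("tokenizer.ggml.bos_token_id")
            .ok_or(missing("tokenizer.ggml.bos_token_id"))?,
        eos_token_id: gguf
            .u32_value("tokenizer.ggml.eos_token_id")
            .ok_or(missing("tokenizer.ggml.eos_token_id"))?,
    };

    // Step 2: embed the tokenizer as a JSON blob in a dedicated section.
    let blob = serde_json::to_vec(&tok).map_err(ConvertError::Json)?;
    out.write_section("tokenizer", &blob);

    // Step 3: validate the output before reporting success, so conversion
    // can never again complete "successfully" without a tokenizer.
    if !out.has_section("tokenizer") {
        return Err(ConvertError::TokenizerNotEmbedded);
    }
    Ok(())
}
```

With this shape, a missing tokenizer aborts the conversion with a hard error instead of the current silent success, which is the property the verification test below depends on.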
Verification Test
# After fix, this should produce identical output:
apr rosetta convert model.gguf model.apr
apr run model.gguf -p "2+2=" --max-tokens 8 > gguf_out.txt
apr run model.apr -p "2+2=" --max-tokens 8 > apr_out.txt
diff gguf_out.txt apr_out.txt  # Should be empty
Related Issues
- REGRESSION: Format conversion still produces large diffs after #177 fix #181 (Q4_K_M block alignment) - FIXED
- SafeTensors inference fails: tokenizer.json and config.json not found after GGUF conversion #182 (SafeTensors companion files) - FIXED (different issue - external files vs embedded)
Environment
- apr-cli version: 0.2.12
- OS: Linux 6.8.0-90-generic
- Model: Qwen2.5-Coder-1.5B-Instruct Q4_K_M