Summary
Converting GGUF to APR format does not embed the tokenizer, causing inference to fail or produce incorrect output.
Expected Behavior
APR files converted from GGUF should include an embedded tokenizer, allowing self-contained inference.
Actual Behavior
- Conversion completes without error
- The APR file is created but is missing the tokenizer
- Inference fails with:
  [PMAT-172] ERROR: APR file missing embedded tokenizer.
- Even when the error is absent, inference produces completely different output than the source GGUF
Reproduction Steps
MODEL=qwen2.5-coder-1.5b-instruct-q4_k_m.gguf
# Convert
apr rosetta convert $MODEL test.apr
# Run inference on GGUF - correct output
apr run $MODEL -p "What is 2+2? Answer with just the number:" --max-tokens 8 --no-gpu
# Output: 4
# Run inference on APR - wrong output
apr run test.apr -p "What is 2+2? Answer with just the number:" --max-tokens 8 --no-gpu
# Error: [PMAT-172] ERROR: APR file missing embedded tokenizer.
# Output: 1. What is the difference between a
Impact
- P0 BLOCKER: Format conversion testing cannot pass
- All 6 conversion gates failing (F-CONV-G-A, F-CONV-A-G, F-CONV-G-S, F-CONV-S-G, F-CONV-A-S, F-CONV-S-A)
- Round-trip verification failing (F-CONV-RT-001)
- Model qualification blocked
Five Whys Root Cause Analysis
1. Why does APR inference produce wrong output?
   - The tokenizer is missing from the APR file.
2. Why is the tokenizer missing from the APR file?
   - The GGUF → APR conversion doesn't extract or embed the tokenizer.
3. Why doesn't the conversion extract the tokenizer?
   - GGUF stores tokenizer data in metadata fields, and the conversion only copies tensor data (see the sketch after this list).
4. Why does the conversion only copy tensor data?
   - The original design focused on weight format conversion, not full model packaging.
5. Why wasn't tokenizer embedding required originally?
   - Early APR usage may have relied on external tokenizer.json files.
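The crux of why #3 is where GGUF keeps the tokenizer: in metadata key/value pairs, not in the tensor table. Below is a minimal Rust sketch of that layout; `GgufFile`, `MetadataValue`, and `Tensor` are illustrative stand-ins for whatever apr-cli actually parses, not its real types.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of a parsed GGUF file: a flat metadata
/// key/value table plus the weight tensors. Stand-in types, not apr-cli's.
pub struct GgufFile {
    pub metadata: HashMap<String, MetadataValue>,
    pub tensors: Vec<Tensor>,
}

pub enum MetadataValue {
    U32(u32),
    StringArray(Vec<String>),
    I32Array(Vec<i32>),
    F32Array(Vec<f32>),
}

pub struct Tensor {
    pub name: String,
    pub data: Vec<u8>,
}

impl GgufFile {
    // Typed accessors: `None` means the key is absent or has the wrong type.
    pub fn u32_value(&self, key: &str) -> Option<u32> {
        match self.metadata.get(key) {
            Some(MetadataValue::U32(v)) => Some(*v),
            _ => None,
        }
    }
    pub fn string_array(&self, key: &str) -> Option<&[String]> {
        match self.metadata.get(key) {
            Some(MetadataValue::StringArray(v)) => Some(v.as_slice()),
            _ => None,
        }
    }
    pub fn i32_array(&self, key: &str) -> Option<&[i32]> {
        match self.metadata.get(key) {
            Some(MetadataValue::I32Array(v)) => Some(v.as_slice()),
            _ => None,
        }
    }
    pub fn f32_array(&self, key: &str) -> Option<&[f32]> {
        match self.metadata.get(key) {
            Some(MetadataValue::F32Array(v)) => Some(v.as_slice()),
            _ => None,
        }
    }
}

/// The tokenizer lives in these metadata keys, alongside (not inside) the
/// tensors, so a converter that only walks `tensors` never sees it.
pub const TOKENIZER_KEYS: &[&str] = &[
    "tokenizer.ggml.tokens",
    "tokenizer.ggml.token_type",
    "tokenizer.ggml.scores",
    "tokenizer.ggml.bos_token_id",
    "tokenizer.ggml.eos_token_id",
];

pub fn has_tokenizer(gguf: &GgufFile) -> bool {
    TOKENIZER_KEYS.iter().all(|k| gguf.metadata.contains_key(*k))
}
```

`has_tokenizer` on the source GGUF could serve as a pre-flight check: if it returns true but the output APR still lacks a tokenizer, the bug is in conversion, not in the model file.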
Suggested Fix
In src/format/converter.rs, the GGUF → APR conversion should:
1. Extract the tokenizer vocabulary from GGUF metadata:
   - tokenizer.ggml.tokens (token strings)
   - tokenizer.ggml.token_type (token types)
   - tokenizer.ggml.scores (token scores/merges)
   - tokenizer.ggml.bos_token_id / tokenizer.ggml.eos_token_id, etc.
2. Embed the tokenizer into the APR format, either as an embedded JSON blob or as a native APR tokenizer section.
3. Validate tokenizer presence in the output APR file.
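A minimal sketch of these three steps, reusing the hypothetical `GgufFile` from the sketch above. `EmbeddedTokenizer`, `AprWriter`, `ConvertError`, and the `"tokenizer"` section name are assumptions for illustration, not the actual converter internals; `serde`/`serde_json` are assumed available for the JSON-blob option.

```rust
use serde::Serialize;

/// Shape of the embedded tokenizer blob (an assumption; the real APR
/// format may define its own native section layout instead).
#[derive(Serialize)]
pub struct EmbeddedTokenizer {
    pub tokens: Vec<String>,
    pub token_type: Vec<i32>,
    pub scores: Vec<f32>,
    pub bos_token_id: u32,
    pub eos_token_id: u32,
}

#[derive(Debug)]
pub enum ConvertError {
    MissingTokenizerKey(&'static str),
    Json(serde_json::Error),
    TokenizerNotEmbedded,
}

/// Minimal stand-in for the APR output writer (not apr-cli's real type).
pub struct AprWriter {
    sections: std::collections::HashMap<String, Vec<u8>>,
}

impl AprWriter {
    pub fn new() -> Self {
        Self { sections: std::collections::HashMap::new() }
    }
    pub fn write_tensor(&mut self, name: &str, data: &[u8]) {
        self.sections.insert(format!("tensor/{name}"), data.to_vec());
    }
    pub fn write_section(&mut self, name: &str, data: &[u8]) {
        self.sections.insert(name.to_string(), data.to_vec());
    }
    pub fn has_section(&self, name: &str) -> bool {
        self.sections.contains_key(name)
    }
}

pub fn convert_gguf_to_apr(gguf: &GgufFile, out: &mut AprWriter) -> Result<(), ConvertError> {
    // Step 0: copy tensor data, as the converter already does today.
    for t in &gguf.tensors {
        out.write_tensor(&t.name, &t.data);
    }

    // Step 1: extract the tokenizer from GGUF metadata, failing loudly if
    // any key is absent instead of silently emitting a broken APR file.
    let missing = ConvertError::MissingTokenizerKey;
    let tok = EmbeddedTokenizer {
        tokens: gguf
            .string_array("tokenizer.ggml.tokens")
            .ok_or(missing("tokenizer.ggml.tokens"))?
            .to_vec(),
        token_type: gguf
            .i32_array("tokenizer.ggml.token_type")
            .ok_or(missing("tokenizer.ggml.token_type"))?
            .to_vec(),
        scores: gguf
            .f32_array("tokenizer.ggml.scores")
            .ok_or(missing("tokenizer.ggml.scores"))?
            .to_vec(),
        bos_token_id: gguf
            .u32_value("tokenizer.ggml.bos_token_id")
            .ok_or(missing("tokenizer.ggml.bos_token_id"))?,
        eos_token_id: gguf
            .u32_value("tokenizer.ggml.eos_token_id")
            .ok_or(missing("tokenizer.ggml.eos_token_id"))?,
    };

    // Step 2: embed the tokenizer as a JSON blob in a dedicated section.
    let blob = serde_json::to_vec(&tok).map_err(ConvertError::Json)?;
    out.write_section("tokenizer", &blob);

    // Step 3: validate the output before reporting success, so conversion
    // can never again complete "successfully" without a tokenizer.
    if !out.has_section("tokenizer") {
        return Err(ConvertError::TokenizerNotEmbedded);
    }
    Ok(())
}
```

With this shape, a missing tokenizer aborts the conversion with a hard error instead of the current silent success, which is the property the verification test below depends on.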
Verification Test
# After fix, this should produce identical output:
apr rosetta convert model.gguf model.apr
apr run model.gguf -p "2+2=" --max-tokens 8 > gguf_out.txt
apr run model.apr -p "2+2=" --max-tokens 8 > apr_out.txt
diff gguf_out.txt apr_out.txt  # Should be empty
Related Issues
- REGRESSION: Format conversion still produces large diffs after #177 fix #181 (Q4_K_M block alignment) - FIXED
- SafeTensors inference fails: tokenizer.json and config.json not found after GGUF conversion #182 (SafeTensors companion files) - FIXED (different issue - external files vs embedded)
Environment
- apr-cli version: 0.2.12
- OS: Linux 6.8.0-90-generic
- Model: Qwen2.5-Coder-1.5B-Instruct Q4_K_M