Open
Labels: bug, enhancement
Description
Problem
APR Q4K inference produces garbage (PAD tokens) while GGUF Q4K works. This bug class has occurred ~50 times. Current tracing doesn't catch it quickly.
Five-Whys Root Cause
- Why garbage? Weight matrices have wrong layout (GGML [in,out] vs expected [out,in])
- Why not caught? Tracing logs code paths but not data correctness
- Why no validation? We log dimensions but don't assert layout conventions
- Why no assertions? APR loader treats tensors uniformly without layout metadata
- Why no metadata? ROOT CAUSE: No differential trace comparing APR vs GGUF values
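To make the first "why" concrete, here is a minimal sketch of how a layout mix-up silently corrupts a matrix-vector product. It is not taken from the APR loader: `matvec` and `transpose` are hypothetical helpers, and the weights are assumed to be a dense row-major `f32` buffer (real Q4K tensors are block-quantized, which this ignores).

```rust
/// Computes y = W * x, interpreting `w` as a row-major [rows, cols] buffer.
/// Hypothetical helper for illustration only.
fn matvec(w: &[f32], rows: usize, cols: usize, x: &[f32]) -> Vec<f32> {
    assert_eq!(w.len(), rows * cols);
    assert_eq!(x.len(), cols);
    (0..rows)
        .map(|r| (0..cols).map(|c| w[r * cols + c] * x[c]).sum::<f32>())
        .collect()
}

/// Converts a row-major [rows, cols] buffer into a row-major [cols, rows] buffer.
fn transpose(w: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(w.len(), rows * cols);
    let mut out = vec![0.0; w.len()];
    for r in 0..rows {
        for c in 0..cols {
            out[c * rows + r] = w[r * cols + c];
        }
    }
    out
}
```

If a buffer stored as [in=3, out=2] is passed to `matvec` as if it were [out=2, in=3], the element count still matches, so nothing panics and the result is simply wrong. Transposing first gives the intended product. This is why logging code paths and dimensions alone does not surface the bug.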
Solution: APR Rosetta Mode
Add an apr rosetta command that:
- Loads both formats (APR + GGUF) for same model
- Runs inference in parallel with identical inputs
- Compares intermediate values at each layer:
  - Embedding output
  - Per-layer: attention input, QKV, attention output, FFN input/output
  - Final logits
- Detects divergence with threshold (default 1e-3)
- Diagnoses layout issues by comparing tensor dimensions
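The comparison and threshold steps above could be sketched roughly as follows. `DiffStats`, `diff_stats`, and `diverges` are hypothetical names, and the activations are assumed to be already flattened to `f32` slices. Treating NaN as an infinite difference is also an assumption, chosen so that NaN activations always count as divergence.

```rust
/// Element-wise difference statistics between two activation buffers.
/// Hypothetical type for illustration; not part of any existing APR API.
#[derive(Debug, PartialEq)]
struct DiffStats {
    max_diff: f32,
    mean_diff: f32,
}

/// Returns None when the buffers cannot be compared element-wise
/// (different lengths or empty input).
fn diff_stats(apr: &[f32], gguf: &[f32]) -> Option<DiffStats> {
    if apr.len() != gguf.len() || apr.is_empty() {
        return None;
    }
    let mut max_diff = 0.0f32;
    let mut sum = 0.0f64; // accumulate in f64 to limit rounding error
    for (a, g) in apr.iter().zip(gguf) {
        let d = (a - g).abs();
        if d.is_nan() {
            // Assumption: report NaN as an infinite difference so it always fails.
            return Some(DiffStats { max_diff: f32::INFINITY, mean_diff: f32::INFINITY });
        }
        max_diff = max_diff.max(d);
        sum += d as f64;
    }
    Some(DiffStats { max_diff, mean_diff: (sum / apr.len() as f64) as f32 })
}

/// True when the largest element-wise difference exceeds the threshold.
fn diverges(stats: &DiffStats, threshold: f32) -> bool {
    stats.max_diff > threshold
}
```

Returning `None` for mismatched lengths keeps shape errors separate from numeric divergence, so the caller can report them as a layout problem instead of a large `max_diff`.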
CLI Interface
# Basic comparison
apr rosetta model.apr model.gguf --prompt "2+2="
# With assertion (fails CI if divergence)
apr rosetta model.apr model.gguf --prompt "2+2=" --assert-match --threshold 1e-3
# Verbose mode (shows all intermediate values)
apr rosetta model.apr model.gguf --prompt "2+2=" -v
Expected Output
[ROSETTA] Loading APR: model.apr
[ROSETTA] Loading GGUF: model.gguf
[ROSETTA] Prompt: "2+2="
[ROSETTA] Token IDs: [17, 10, 17, 28]
[ROSETTA] === Embedding ===
[ROSETTA] APR dims: [4, 1536], GGUF dims: [4, 1536] ✓
[ROSETTA] max_diff=0.0001, mean_diff=0.00003 ✓
[ROSETTA] === Layer 0 ===
[ROSETTA] QKV APR dims: [1536, 4608], GGUF dims: [4608, 1536]
[ROSETTA] ⚠️ LAYOUT MISMATCH: APR=[in,out] GGUF=[out,in]
[ROSETTA] QKV max_diff=847.3 ✗ DIVERGENCE!
[ROSETTA] DIAGNOSIS: APR QKV weight needs transpose (GGML convention)
[ROSETTA] === Summary ===
[ROSETTA] FAILED: Layout mismatch detected at Layer 0 QKV
[ROSETTA] Recommendation: Transpose weight matrices during APR load
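The LAYOUT MISMATCH line in the output above could come from a dimension check along these lines. This is a sketch only: `LayoutCheck` and `compare_layout` are hypothetical names, and the check assumes both loaders report dimensions as plain `usize` slices.

```rust
/// Outcome of comparing the reported dims of the same tensor in both formats.
/// Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum LayoutCheck {
    Match,
    Transposed,
    Incompatible,
}

fn compare_layout(apr_dims: &[usize], gguf_dims: &[usize]) -> LayoutCheck {
    if apr_dims == gguf_dims {
        return LayoutCheck::Match;
    }
    // A 2-D tensor whose dims are swapped points at an [in,out] vs [out,in] mix-up.
    let swapped = apr_dims.len() == 2
        && gguf_dims.len() == 2
        && apr_dims[0] == gguf_dims[1]
        && apr_dims[1] == gguf_dims[0];
    if swapped {
        LayoutCheck::Transposed
    } else {
        LayoutCheck::Incompatible
    }
}
```

One limitation worth noting: a square matrix has identical dims in either layout, so this check reports `Match` even if the data is transposed. The value comparison (max_diff against the threshold) is still needed to catch that case.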
Enhanced Default Logging
Additionally, APR load should ALWAYS log (not just in debug mode):
// ALWAYS log, catches bugs early
eprintln!("[APR-LOAD] Tensor '{}': dims={:?}, dtype={:?}", name, dims, dtype);
eprintln!("[APR-LOAD] Expected layout: [out_dim, in_dim] for matmul");
eprintln!("[APR-LOAD] Actual layout: {:?} - {}", dims,
    if needs_transpose { "TRANSPOSE NEEDED" } else { "OK" });
Acceptance Criteria
- apr rosetta command compares APR vs GGUF intermediate values
- Detects layout mismatches (dimension order differences)
- Provides actionable diagnosis ("needs transpose")
- --assert-match flag for CI integration
- Enhanced APR load logging enabled by default
- This bug class catchable in <10 seconds instead of hours
References
- BUG: NaN values in GGUF→APR→GGUF roundtrip (Jidoka PMAT-187) #186: APR Q4K inference garbage bug
- CLAUDE.md Section 12.1: Format-Aware Differential Tracing (APR-TRACE-002)
- docs/specifications/qwen2.5-coder-showcase-demo.md