APR Rosetta: Differential tracing to catch layout bugs (GH-186 class) #188

@noahgift

Problem

APR Q4K inference produces garbage (PAD tokens) while GGUF Q4K works. This bug class has occurred ~50 times. Current tracing doesn't catch it quickly.

Five-Whys Root Cause

  1. Why garbage? Weight matrices have wrong layout (GGML [in,out] vs expected [out,in])
  2. Why not caught? Tracing logs code paths but not data correctness
  3. Why no validation? We log dimensions but don't assert layout conventions
  4. Why no assertions? APR loader treats tensors uniformly without layout metadata
  5. Why no metadata? ROOT CAUSE: No differential trace comparing APR vs GGUF values

Solution: APR Rosetta Mode

Add an apr rosetta command that:

  1. Loads both formats (APR + GGUF) for same model
  2. Runs inference in parallel with identical inputs
  3. Compares intermediate values at each layer:
    • Embedding output
    • Per-layer: attention input, QKV, attention output, FFN input/output
    • Final logits
  4. Detects divergence with threshold (default 1e-3)
  5. Diagnoses layout issues by comparing tensor dimensions
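
A minimal sketch of steps 3 and 4, assuming each stage's activations are available as flat f32 buffers from both loaders. The names DiffStats, diff_stats, and first_divergence are hypothetical and are not part of any existing APR code; they only illustrate how a max/mean difference and a threshold check could locate the first diverging stage.

```rust
/// Summary of how far two activation buffers differ (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DiffStats {
    pub max_diff: f32,
    pub mean_diff: f32,
}

/// Element-wise absolute difference between two buffers.
/// Returns None when the buffers are empty or differ in length,
/// because the values cannot be paired up.
pub fn diff_stats(apr: &[f32], gguf: &[f32]) -> Option<DiffStats> {
    if apr.is_empty() || apr.len() != gguf.len() {
        return None;
    }
    let mut max_diff = 0.0f32;
    let mut sum = 0.0f64;
    for (a, g) in apr.iter().zip(gguf) {
        let d = (a - g).abs();
        // Keep NaN once seen so it is reported as a divergence later.
        if d.is_nan() || d > max_diff {
            max_diff = d;
        }
        sum += f64::from(d);
    }
    Some(DiffStats {
        max_diff,
        mean_diff: (sum / apr.len() as f64) as f32,
    })
}

/// Name of the first stage whose max_diff exceeds the threshold.
/// A length mismatch or a NaN difference also counts as divergence.
pub fn first_divergence<'a>(
    stages: &[(&'a str, &[f32], &[f32])],
    threshold: f32,
) -> Option<&'a str> {
    stages
        .iter()
        .find(|(_, apr, gguf)| match diff_stats(apr, gguf) {
            // Written as a negated <= so that NaN is treated as diverging.
            Some(stats) => !(stats.max_diff <= threshold),
            None => true,
        })
        .map(|(name, _, _)| *name)
}
```

Reporting only the first diverging stage keeps the output short: once one stage is wrong, every later stage will usually diverge too, so the earliest failure is the one worth diagnosing.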

CLI Interface

# Basic comparison
apr rosetta model.apr model.gguf --prompt "2+2="

# With assertion (fails CI if divergence)
apr rosetta model.apr model.gguf --prompt "2+2=" --assert-match --threshold 1e-3

# Verbose mode (shows all intermediate values)
apr rosetta model.apr model.gguf --prompt "2+2=" -v

Expected Output

[ROSETTA] Loading APR: model.apr
[ROSETTA] Loading GGUF: model.gguf
[ROSETTA] Prompt: "2+2="
[ROSETTA] Token IDs: [17, 10, 17, 28]

[ROSETTA] === Embedding ===
[ROSETTA] APR dims: [4, 1536], GGUF dims: [4, 1536] ✓
[ROSETTA] max_diff=0.0001, mean_diff=0.00003 ✓

[ROSETTA] === Layer 0 ===
[ROSETTA] QKV APR dims: [1536, 4608], GGUF dims: [4608, 1536]
[ROSETTA] ⚠️  LAYOUT MISMATCH: APR=[in,out] GGUF=[out,in]
[ROSETTA] QKV max_diff=847.3 ✗ DIVERGENCE!
[ROSETTA] DIAGNOSIS: APR QKV weight needs transpose (GGML convention)

[ROSETTA] === Summary ===
[ROSETTA] FAILED: Layout mismatch detected at Layer 0 QKV
[ROSETTA] Recommendation: Transpose weight matrices during APR load
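
The LAYOUT MISMATCH line in the output above could come from a check like the following. This is a sketch only: LayoutDiagnosis and diagnose_layout are hypothetical names, and the check assumes both loaders expose a tensor's dimensions as a slice of usize.

```rust
/// Outcome of comparing one tensor's dimensions across the two formats
/// (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum LayoutDiagnosis {
    /// Dimensions are identical in both formats.
    Match,
    /// 2-D tensor whose dimensions are swapped, e.g. [in,out] vs [out,in].
    Transposed,
    /// Dimensions differ in some other way (rank or sizes).
    Incompatible,
}

/// Classify how the APR dimensions relate to the GGUF dimensions.
pub fn diagnose_layout(apr_dims: &[usize], gguf_dims: &[usize]) -> LayoutDiagnosis {
    if apr_dims == gguf_dims {
        return LayoutDiagnosis::Match;
    }
    match (apr_dims, gguf_dims) {
        // Same two sizes in opposite order: one side needs a transpose.
        ([a0, a1], [g0, g1]) if a0 == g1 && a1 == g0 => LayoutDiagnosis::Transposed,
        _ => LayoutDiagnosis::Incompatible,
    }
}
```

One limitation of a dimension-only check: a square weight such as [1536, 1536] has the same dimensions whether or not it is transposed, so it always reports Match. Square tensors would still need the value comparison to reveal a layout problem.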

Enhanced Default Logging

Additionally, APR load should ALWAYS log (not just in debug mode):

// ALWAYS log, catches bugs early
eprintln!("[APR-LOAD] Tensor '{}': dims={:?}, dtype={:?}", name, dims, dtype);
eprintln!("[APR-LOAD] Expected layout: [out_dim, in_dim] for matmul");
eprintln!("[APR-LOAD] Actual layout: {:?} - {}", dims, 
    if needs_transpose { "TRANSPOSE NEEDED" } else { "OK" });
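
The snippet above uses needs_transpose without saying where it comes from. One possible way to derive it, sketched here under the assumption that the loader already knows the expected out_dim and in_dim for each weight (for example from the model config), is shown below. layout_status is a hypothetical helper, and the "UNEXPECTED SHAPE" status is an addition for shapes that match neither order.

```rust
/// Status string for the load-time log line, given the actual dims and the
/// expected [out_dim, in_dim] shape (hypothetical helper).
pub fn layout_status(dims: &[usize], out_dim: usize, in_dim: usize) -> &'static str {
    match dims {
        // Already in the expected [out_dim, in_dim] order.
        // Note: a square tensor always lands here, transposed or not.
        [o, i] if *o == out_dim && *i == in_dim => "OK",
        // Stored as [in_dim, out_dim]: must be transposed before matmul.
        [i, o] if *i == in_dim && *o == out_dim => "TRANSPOSE NEEDED",
        // Not 2-D, or sizes unrelated to the expected shape.
        _ => "UNEXPECTED SHAPE",
    }
}

/// Build the "Actual layout" log line in the same format as the snippet above.
pub fn layout_log_line(dims: &[usize], out_dim: usize, in_dim: usize) -> String {
    format!(
        "[APR-LOAD] Actual layout: {:?} - {}",
        dims,
        layout_status(dims, out_dim, in_dim)
    )
}
```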

Acceptance Criteria

  • apr rosetta command compares APR vs GGUF intermediate values
  • Detects layout mismatches (dimension order differences)
  • Provides actionable diagnosis ("needs transpose")
  • --assert-match flag for CI integration
  • Enhanced APR load logging enabled by default
  • This bug class is catchable in under 10 seconds instead of hours
