APR Rosetta: Differential tracing to catch layout bugs (GH-186 class) #188

## Problem

APR Q4K inference produces garbage output (PAD tokens), while GGUF Q4K inference works. This bug class has occurred ~50 times, and current tracing does not catch it quickly.

## Five-Whys Root Cause

1. **Why garbage?** Weight matrices have the wrong layout (GGML [in,out] instead of the expected [out,in])
2. **Why not caught?** Tracing logs code paths but not data correctness
3. **Why no validation?** We log dimensions but don't assert layout conventions
4. **Why no assertions?** The APR loader treats all tensors uniformly, with no layout metadata
5. **Why no metadata?** **ROOT CAUSE:** No differential trace compares APR values against GGUF values, so the missing metadata was never exposed

## Solution: APR Rosetta Mode

Add an `apr rosetta` command that:

1. **Loads both formats** (APR + GGUF) for the same model
2. **Runs inference in parallel** with identical inputs
3. **Compares intermediate values** at each layer:
   - Embedding output
   - Per-layer: attention input, QKV, attention output, FFN input/output
   - Final logits
4. **Detects divergence** against a configurable threshold (`--threshold`, default 1e-3)
5. **Diagnoses layout issues** by comparing tensor dimensions between the two formats
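
The comparison in steps 3 and 4 could be sketched as below. This is a minimal illustration, not the proposed implementation: `DiffStats`, `diff_stats`, and `diverges` are hypothetical names, and it assumes the threshold applies to the maximum absolute difference (the issue does not say which statistic is compared). NaN differences are treated as infinite so that a NaN in either buffer always counts as divergence.

```rust
/// Summary statistics for one pair of intermediate buffers
/// (hypothetical type, named after the fields in the expected output).
#[derive(Debug, PartialEq)]
struct DiffStats {
    max_diff: f32,
    mean_diff: f32,
}

/// Compare two buffers element by element.
/// Returns `None` if the lengths differ or the buffers are empty.
fn diff_stats(apr: &[f32], gguf: &[f32]) -> Option<DiffStats> {
    if apr.len() != gguf.len() || apr.is_empty() {
        return None;
    }
    let mut max_diff = 0.0f32;
    let mut sum = 0.0f64;
    for (a, g) in apr.iter().zip(gguf) {
        let d = (a - g).abs();
        // NaN compares false against everything, so map it to infinity.
        let d = if d.is_nan() { f32::INFINITY } else { d };
        max_diff = max_diff.max(d);
        sum += f64::from(d);
    }
    let mean_diff = (sum / apr.len() as f64) as f32;
    Some(DiffStats { max_diff, mean_diff })
}

/// Assumption: divergence means `max_diff` exceeds the threshold.
fn diverges(stats: &DiffStats, threshold: f32) -> bool {
    stats.max_diff > threshold
}
```

Returning `None` on a length mismatch keeps a shape error distinct from a numeric divergence, which matters because a layout bug can show up as either.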

### CLI Interface

```bash
# Basic comparison
apr rosetta model.apr model.gguf --prompt "2+2="

# With assertion (fails CI on divergence)
apr rosetta model.apr model.gguf --prompt "2+2=" --assert-match --threshold 1e-3

# Verbose mode (shows all intermediate values)
apr rosetta model.apr model.gguf --prompt "2+2=" -v
```

### Expected Output

```
[ROSETTA] Loading APR: model.apr
[ROSETTA] Loading GGUF: model.gguf
[ROSETTA] Prompt: "2+2="
[ROSETTA] Token IDs: [17, 10, 17, 28]

[ROSETTA] === Embedding ===
[ROSETTA] APR dims: [4, 1536], GGUF dims: [4, 1536] ✓
[ROSETTA] max_diff=0.0001, mean_diff=0.00003 ✓

[ROSETTA] === Layer 0 ===
[ROSETTA] QKV APR dims: [1536, 4608], GGUF dims: [4608, 1536]
[ROSETTA] ⚠️  LAYOUT MISMATCH: APR=[in,out] GGUF=[out,in]
[ROSETTA] QKV max_diff=847.3 ✗ DIVERGENCE!
[ROSETTA] DIAGNOSIS: APR QKV weight needs transpose (GGML convention)

[ROSETTA] === Summary ===
[ROSETTA] FAILED: Layout mismatch detected at Layer 0 QKV
[ROSETTA] Recommendation: Transpose weight matrices during APR load
```
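
The `LAYOUT MISMATCH` diagnosis above could be derived from the two dimension lists alone. The following is a rough sketch under that assumption; `LayoutDiagnosis` and `diagnose_layout` are hypothetical names, not existing APR APIs. Note that a square weight matrix has identical dimensions in both layouts, so a dimension check cannot detect a transpose there and a value comparison is still needed.

```rust
/// Result of comparing APR and GGUF tensor dimensions (hypothetical type).
#[derive(Debug, PartialEq)]
enum LayoutDiagnosis {
    /// Dimensions agree in the same order.
    Match,
    /// 2-D dimensions agree only when one side is reversed.
    Transposed,
    /// Square 2-D tensor: dimensions cannot reveal a transpose.
    Ambiguous,
    /// Dimensions disagree in a way a transpose does not explain.
    Incompatible,
}

fn diagnose_layout(apr_dims: &[usize], gguf_dims: &[usize]) -> LayoutDiagnosis {
    match (apr_dims, gguf_dims) {
        // Same 2-D shape: fine unless square, where layout is undecidable.
        ([a0, a1], [g0, g1]) if a0 == g0 && a1 == g1 => {
            if a0 == a1 {
                LayoutDiagnosis::Ambiguous
            } else {
                LayoutDiagnosis::Match
            }
        }
        // Reversed 2-D shape: e.g. [in, out] vs [out, in].
        ([a0, a1], [g0, g1]) if a0 == g1 && a1 == g0 => LayoutDiagnosis::Transposed,
        // Any other rank: only an exact match is accepted.
        (a, g) if a == g => LayoutDiagnosis::Match,
        _ => LayoutDiagnosis::Incompatible,
    }
}
```

With the dimensions from the sample output, `diagnose_layout(&[1536, 4608], &[4608, 1536])` yields `Transposed`, which corresponds to the "needs transpose" recommendation.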

## Enhanced Default Logging

Additionally, APR load should ALWAYS log (not just in debug mode):

```rust
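// ALWAYS log (not gated on debug mode); catches layout bugs early.
// Illustrative wrapper: the function name and signature are placeholders,
// and the caller supplies `needs_transpose` from its own layout check.
fn log_tensor_layout(name: &str, dims: &[usize], dtype: impl std::fmt::Debug, needs_transpose: bool) {
    eprintln!("[APR-LOAD] Tensor '{name}': dims={dims:?}, dtype={dtype:?}");
    eprintln!("[APR-LOAD] Expected layout: [out_dim, in_dim] for matmul");
    let status = if needs_transpose { "TRANSPOSE NEEDED" } else { "OK" };
    eprintln!("[APR-LOAD] Actual layout: {dims:?} - {status}");
}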
```

## Acceptance Criteria

- [ ] `apr rosetta` command compares APR vs GGUF intermediate values
- [ ] Detects layout mismatches (dimension order differences)
- [ ] Provides actionable diagnosis ("needs transpose")
- [ ] `--assert-match` flag for CI integration
- [ ] Enhanced APR load logging enabled by default
- [ ] This bug class is caught in under 10 seconds instead of hours

## References

- GH-186: APR Q4K inference garbage bug
- CLAUDE.md Section 12.1: Format-Aware Differential Tracing (APR-TRACE-002)
- docs/specifications/qwen2.5-coder-showcase-demo.md
