Open
Labels: bug, enhancement
Description
Problem
APR Q4_K inference produces garbage (PAD tokens 151935) while GGUF Q4_K produces correct output. This class of bug has occurred 50+ times and is extremely difficult to debug without proper tracing.
Root Cause Found: The embedding tensor is stored as [hidden_dim, vocab_size] (GGML convention), but embed() expects a [vocab_size, hidden_dim] layout. Because of this transposition mismatch, token lookups read the wrong data.
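The failure mode described above can be reproduced on a toy tensor. This is a minimal sketch, not the project's code: `embed_row_major` and `transpose` are hypothetical helpers, and it assumes the data is physically stored row-major in the transposed order, as the root cause describes.

```rust
/// Hypothetical lookup that assumes row-major [vocab_size, hidden_dim] storage.
fn embed_row_major(data: &[f32], hidden_dim: usize, token: usize) -> Vec<f32> {
    let start = token * hidden_dim;
    data[start..start + hidden_dim].to_vec()
}

/// Transpose a row-major [rows, cols] matrix into a row-major [cols, rows] matrix.
fn transpose(data: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(data.len(), rows * cols);
    let mut out = vec![0.0; data.len()];
    for r in 0..rows {
        for c in 0..cols {
            out[c * rows + r] = data[r * cols + c];
        }
    }
    out
}
```

With `vocab_size = 3` and `hidden_dim = 2`, a correct table `[0, 1, 10, 11, 20, 21]` stored transposed becomes `[0, 10, 20, 1, 11, 21]`. Looking up token 1 in the transposed buffer then returns `[20, 1]` instead of `[10, 11]`, even though the buffer length is unchanged. Transposing back with `transpose(&stored, hidden_dim, vocab_size)` restores the expected rows.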
Current State
- --trace only shows timing data, not tensor values
- No comparison between formats (GGUF vs APR)
- No automatic detection of layout mismatches
- No embedding sanity checks
Requirements
1. Enhanced Default Logging (P0)
When running apr rosetta conversions or apr run with APR files:
- Log tensor shapes and verify they match expected model config
- Log first 5 values of embedding tensor after load
- Detect and warn on [hidden_dim, vocab_size] vs [vocab_size, hidden_dim] mismatch
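One way the shape check and value sampling above could look is sketched below. The names `EmbeddingLayout`, `classify_embedding_shape`, and `head` are hypothetical and not part of the existing loader. The sketch assumes the loader can see the tensor's declared shape and the model config's `vocab_size` and `hidden_dim`.

```rust
/// Hypothetical result of comparing a declared embedding shape with the model config.
#[derive(Debug, PartialEq)]
enum EmbeddingLayout {
    /// Shape is [vocab_size, hidden_dim], which is what embed() expects.
    Expected,
    /// Shape is [hidden_dim, vocab_size]; the tensor needs a transpose.
    Transposed,
    /// Shape matches neither orientation (wrong rank or wrong dimensions).
    Mismatch,
}

/// Classify a declared tensor shape against the configured dimensions.
/// If vocab_size == hidden_dim the two orientations are indistinguishable
/// by shape alone, and this sketch reports Expected.
fn classify_embedding_shape(
    shape: &[usize],
    vocab_size: usize,
    hidden_dim: usize,
) -> EmbeddingLayout {
    match shape {
        &[rows, cols] if rows == vocab_size && cols == hidden_dim => EmbeddingLayout::Expected,
        &[rows, cols] if rows == hidden_dim && cols == vocab_size => EmbeddingLayout::Transposed,
        _ => EmbeddingLayout::Mismatch,
    }
}

/// Return up to the first `n` values of a tensor for logging.
fn head(values: &[f32], n: usize) -> &[f32] {
    &values[..n.min(values.len())]
}
```

A loader could log `head(&token_embedding, 5)` after load and emit a warning whenever the classification is not `Expected`. The element count alone cannot catch a transposition, because both orientations contain `vocab_size * hidden_dim` values. That is why this sketch compares the declared shape instead.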
2. Format-Aware Differential Tracing (P1)
New --trace-diff flag for apr run:
apr run model.gguf model.apr "2+2?" --trace-diff
- Compare token-by-token output between two model formats
- Show first divergence point
- Classify bug type (WEIGHT_LOAD_FAILURE, EMBEDDING_FAILURE, etc.)
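The comparison step could be as small as the sketch below. It assumes each run yields a sequence of token IDs as `u32`. `first_divergence` and `BugClass` are hypothetical names, and only the two failure classes named above are included. The rules for choosing a class are left out because the issue does not define them.

```rust
/// Hypothetical bug classes; only the two named in this issue are listed.
#[derive(Debug, PartialEq, Clone, Copy)]
enum BugClass {
    WeightLoadFailure,
    EmbeddingFailure,
}

impl BugClass {
    /// Label in the style used by this issue.
    fn as_str(self) -> &'static str {
        match self {
            BugClass::WeightLoadFailure => "WEIGHT_LOAD_FAILURE",
            BugClass::EmbeddingFailure => "EMBEDDING_FAILURE",
        }
    }
}

/// Index of the first position where two token streams differ.
/// Returns None when the streams are identical. If one stream is a strict
/// prefix of the other, the divergence is reported at the shorter length.
fn first_divergence(reference: &[u32], candidate: &[u32]) -> Option<usize> {
    let common = reference.len().min(candidate.len());
    (0..common)
        .find(|&i| reference[i] != candidate[i])
        .or(if reference.len() != candidate.len() {
            Some(common)
        } else {
            None
        })
}
```

Treating the GGUF output as `reference` and the APR output as `candidate`, a result of `Some(0)` means the two formats disagree from the very first generated token.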
3. Embedding Sanity Check (P0)
Add validation in APR loader:
// Verify embedding layout matches expected [vocab_size, hidden_dim]
let expected_size = vocab_size * hidden_dim;
if token_embedding.len() != expected_size {
    warn!("Embedding size mismatch: got {}, expected {}", ...);
}
// Check first token produces non-zero, non-garbage values
let test_embed = embed(&[0]);
if test_embed.iter().all(|&x| x == 0.0) {
    error!("Embedding produces all zeros - likely transposition bug");
}
Acceptance Criteria
- APR loader logs embedding shape on load (always, not just debug mode)
- APR loader detects and warns on embedding transposition
- apr run --trace shows tensor value samples, not just timing
- Bug classification enum exists for common failure modes
- Regression test for embedding transposition detection
Related
- PMAT-199: APR Q4_K inference garbage
- BUG: NaN values in GGUF→APR→GGUF roundtrip (Jidoka PMAT-187) #186: APR Q4_K produces PAD tokens
- Spec: Section 12.1 Format-Aware Differential Tracing (APR-TRACE-002)