REGRESSION: Format Conversion Still Failing After #177 Fix
Status: REGRESSION from closed #177
Severity: P0 (CRITICAL - Data Corruption)
Component: apr-rosetta / realizear
Discovered By: apr-model-qa-playbook requalification (2026-01-30)
Blocking: Model qualification certification
Executive Summary
Issue #177 was closed, but requalification testing on 2026-01-30 shows format conversion still fails with large output differences. The Jidoka detection is working (diffs are flagged), but the root cause fix is incomplete.
Regression Evidence
Test Environment
Date: 2026-01-30T14:59:00Z
Host: noah-Lambda-Vector
Model: Qwen/Qwen2.5-Coder-1.5B-Instruct (GGUF Q4_K_M)
Path: /home/noah/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-1.5B-Instruct-GGUF/snapshots/.../qwen2.5-coder-1.5b-instruct-q4_k_m.gguf
Playbook: qwen2.5-coder-1.5b-ci.playbook.yaml
Test Results
Total scenarios: 57
Passed: 50
Failed: 7 ← ALL 7 ARE FORMAT CONVERSION
Pass rate: 87.7% ← Should be 100%
Detailed Failures
| Gate | Conversion | Diff | Tolerance | Verdict |
|---|---|---|---|---|
| F-CONV-001 | GGUF → APR | 6.77e-1 | 1.00e-6 | ❌ FAIL (677,000× over tolerance) |
| F-CONV-002 | APR → GGUF | 4.16e-1 | 1.00e-6 | ❌ FAIL (416,000× over tolerance) |
| F-CONV-003 | GGUF → SafeTensors | Infrastructure error | - | ❌ FAIL (see below) |
| F-CONV-004 | SafeTensors → GGUF | 4.16e-1 | 1.00e-6 | ❌ FAIL |
| F-CONV-005 | APR → SafeTensors | Infrastructure error | - | ❌ FAIL (see below) |
| F-CONV-006 | SafeTensors → APR | 6.77e-1 | 1.00e-6 | ❌ FAIL |
| F-CONV-RT-001 | Round-trip | Blocked | - | ❌ FAIL |
Raw Evidence from evidence.json
```json
{
  "gate_id": "F-CONV-G-A",
  "outcome": "Falsified",
  "reason": "Conversion Gguf → Apr produced different output (diff: 6.77e-1, ε: 1.00e-6)",
  "output": "6de63189564fc936",
  "timestamp": "2026-01-30T14:07:23.xxx"
}
{
  "gate_id": "F-CONV-A-G",
  "outcome": "Falsified",
  "reason": "Conversion Apr → Gguf produced different output (diff: 4.16e-1, ε: 1.00e-6)",
  "output": "0356a3e657672e25",
  "timestamp": "2026-01-30T14:07:35.xxx"
}
```
Comparison: Before vs After #177 Fix
| Metric | Before #177 | After #177 | Status |
|---|---|---|---|
| NaN detection | ❌ Silent | ✅ Detected | FIXED |
| Inf detection | ❌ Silent | ✅ Detected | FIXED |
| Output diff (GGUF→APR) | 8.46e-1 | 6.77e-1 | IMPROVED (20% reduction) |
| Output diff (APR→GGUF) | 6.34e-1 | 4.16e-1 | IMPROVED (34% reduction) |
| Within tolerance (ε=1e-6) | ❌ No | ❌ No | STILL FAILING |
| Round-trip lossless | ❌ No | ❌ No | STILL FAILING |
Conclusion: the #177 fix improved detection and reduced the diff magnitudes, but the diffs are still 400,000× to 700,000× above tolerance.
Root Cause Hypothesis
The #177 fix addressed:
- ✅ NaN/Inf detection (Jidoka working)
- ✅ Some quantization parameter handling
But did NOT address:
- ❌ Quantization scale/offset precision loss
- ❌ Block-wise quantization metadata transfer
- ❌ Q4_K_M super-block structure preservation
Technical Detail
Q4_K_M uses a two-level quantization structure:
Super-block (256 elements):
- Scale (fp16)
- Min (fp16)
- 8× sub-blocks of 32 elements each
  - Sub-scale (6-bit) and sub-min (6-bit), packed into 12 bytes per super-block
  - 4-bit quantized weights
If the super-block scales are truncated or misaligned during conversion, all weights in that block will be off by a multiplicative factor, leading to the large cumulative diffs we observe.
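To make the failure mode concrete, here is a minimal, runnable sketch using the standard llama.cpp-style Q4_K dequantization form y = (d × sub_scale) × q − (dmin × sub_min); all numeric values are hypothetical:
```rust
// Minimal sketch: a small error in the super-block scale `d` shifts every
// weight in the 256-element block by the same multiplicative factor.
fn dequant(d: f32, dmin: f32, sub_scale: f32, sub_min: f32, q: u8) -> f32 {
    (d * sub_scale) * q as f32 - (dmin * sub_min)
}

fn main() {
    // Hypothetical super-block parameters.
    let (d, dmin, sub_scale, sub_min) = (0.0123_f32, 0.0045_f32, 42.0_f32, 7.0_f32);
    for q in [3u8, 9, 15] {
        let exact = dequant(d, dmin, sub_scale, sub_min, q);
        // Perturb `d` by just 1%: every dequantized weight drifts
        // proportionally, producing block-wide diffs far above ε = 1e-6.
        let skewed = dequant(d * 1.01, dmin, sub_scale, sub_min, q);
        println!("q={q}: exact={exact:.6} skewed={skewed:.6} diff={:.2e}",
                 (skewed - exact).abs());
    }
}
```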
Suggested Additional Fixes
1. Preserve Full Quantization Metadata
```rust
// Layout of a Q4_K_M super-block (matches llama.cpp's block_q4_K).
// `f16` comes from the `half` crate; core Rust has no stable f16 type.
use half::f16;

struct Q4KMSuperBlock {
    d: f16,           // super-block scale - MUST preserve full precision
    dmin: f16,        // super-block min - MUST preserve full precision
    scales: [u8; 12], // packed 6-bit sub-block scales/mins - MUST stay bit-exact
    qs: [u8; 128],    // 4-bit quantized values, two per byte
}

// During conversion, ensure:
// 1. d and dmin are NOT recomputed or round-tripped through float arithmetic
// 2. the scales array is copied bit-exact, not requantized
// 3. block alignment matches the source format
```
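A minimal sketch of the bit-exact copy requirement, again assuming the `half` crate's `f16` and the struct above (`copy_superblock_metadata` is a hypothetical helper, not an existing apr-rosetta function):
```rust
use half::f16; // the `half` crate provides an f16 type; core Rust does not

// Hypothetical helper: carry super-block metadata across formats bit-exactly.
fn copy_superblock_metadata(src: &Q4KMSuperBlock, dst: &mut Q4KMSuperBlock) {
    // Copy the raw 16-bit patterns; no float arithmetic ever touches them.
    dst.d = f16::from_bits(src.d.to_bits());
    dst.dmin = f16::from_bits(src.dmin.to_bits());
    dst.scales = src.scales; // byte-for-byte copy of the packed 6-bit scales
    dst.qs = src.qs;         // quantized nibbles unchanged
}
```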
2. Add Tensor-Level Validation
```rust
// Sketch: flag any tensor whose post-conversion values drift beyond tolerance.
// `Tensor`, `ConversionError`, and `EPSILON` are assumed from the surrounding crate.
fn validate_conversion(source: &Tensor, converted: &Tensor) -> Result<()> {
    let diff = (source.to_f32() - converted.to_f32()).abs().max();
    if diff > EPSILON {
        return Err(ConversionError::LossyConversion {
            diff,
            tolerance: EPSILON,
            tensor_name: source.name.clone(),
        });
    }
    Ok(())
}
```
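A sketch of how this check might be wired into the conversion path; the `Model`/`tensors()` iteration API here is an assumption for illustration, not the actual apr-rosetta interface:
```rust
// Hypothetical wiring: validate every tensor pair right after conversion,
// failing fast on the first lossy tensor instead of silently continuing.
fn validate_model(source: &Model, converted: &Model) -> Result<()> {
    for (src, conv) in source.tensors().zip(converted.tensors()) {
        validate_conversion(src, conv)?;
    }
    Ok(())
}
```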
3. Test Each Quantization Type Separately
```sh
# Test suite should cover:
apr rosetta convert model_q4_k_m.gguf test.apr && apr rosetta convert test.apr model_back.gguf
apr rosetta convert model_q5_k_m.gguf test.apr && apr rosetta convert test.apr model_back.gguf
apr rosetta convert model_q8_0.gguf test.apr && apr rosetta convert test.apr model_back.gguf
apr rosetta convert model_f16.gguf test.apr && apr rosetta convert test.apr model_back.gguf
# All should produce diff < 1e-6
```
MQS Impact
| Metric | Current | Required |
|---|---|---|
| Score | 41.1/100 | 87+/100 |
| Grade | F | B or higher |
| Conversion gates | 0/7 | 7/7 |
| Lost points | ~45 | 0 |
Verification Criteria
Issue is resolved when:
```sh
cd ../apr-model-qa-playbook
cargo run --bin apr-qa -- run playbooks/models/qwen2.5-coder-1.5b-ci.playbook.yaml \
  --subprocess --model-path <model.gguf> --no-gpu --output output/verify
# Required:
# - F-CONV-001 through F-CONV-006: ALL PASS (diff < 1e-6)
# - F-CONV-RT-001: PASS (round-trip lossless)
# - MQS Score: 87+/100
# - Pass rate: 100%
```
References
- Original issue: #177 "P0 CRITICAL: Format conversion introduces NaN/Inf corruption in tensor weights" (CLOSED, but regression detected)
- Evidence file: ../apr-model-qa-playbook/output/qwen-requalify/evidence.json
- MQS report: ../apr-model-qa-playbook/output/qwen-requalify/mqs.json
- Verification playbook: ../apr-model-qa-playbook/playbooks/verify/TICKET-177.yaml
- Spec: Section 4 (Format Conversion Testing), tolerance = 1e-6
Filed by: apr-model-qa-playbook requalification (automated)
Related: #177 (regression), #172 (original P0)