Decoder accuracy: close the gap to SC-004 ≥85% target (currently ~53%)

## Context

Phase 4 PR-6 ([T057](https://github.com/marquetools/marque/blob/main/specs/004-constraints-decoder-vocab/tasks.md) decoder accuracy harness) landed the SC-004 measurement gate as `crates/engine/tests/decoder_accuracy.rs::resolution_rate_at_0_85`. The test is marked `#[ignore]` because the decoder's measured accuracy (about 53%) is well below the spec's 85% target. The complementary `resolution_rate_does_not_regress` test always runs and fails if aggregate accuracy drops below a 50% floor, which keeps accuracy from regressing past that point.

This issue tracks closing the gap so the `#[ignore]` can be removed and SC-004 lands as a load-bearing gate.

## Current state (2026-04-25 capture, branch `004-phase4-pr6-bench-accuracy-gates`)

```
Per-class breakdown:
  GarbledDelimiter: 51/51  (100.0%)  ✅
  MissingDelimiter:  0/17  (  0.0%)  ❌
  Reordering:       41/41  (100.0%)  ✅
  SupersededToken:   2/3   ( 66.7%)  ⚠️
  Typo:             26/130 ( 20.0%)  ❌
  WrongCase:        18/18  (100.0%)  ✅
  Aggregate:       138/260 ( 53.1%)
```

To reach 85% aggregate (221/260), the decoder needs to recover 83 more fixtures (138 → 221):
- **+83 fixtures from Typo alone** (26 → 109 of 130, about 83.8% Typo-class accuracy), or
- **+17 fixtures from MissingDelimiter** if that class is fully recovered (17/17 → 100%), **plus +66 from Typo** (26 → 92 of 130, about 70.8%), or
- a mix across both classes; the per-class table above shows where the headroom is.
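
The arithmetic behind these scenarios can be checked with a small sketch. The helper names `fixtures_needed` and `percent` are illustrative and not part of the harness; the counts come from the per-class table above.

```rust
/// Smallest number of resolved fixtures that meets `target_percent`
/// over `total` fixtures (illustrative helper, not harness code).
fn fixtures_needed(total: u32, target_percent: u32) -> u32 {
    // Integer ceiling division avoids floating-point rounding surprises.
    (total * target_percent + 99) / 100
}

/// Accuracy of `resolved` out of `total`, as a percentage.
fn percent(resolved: u32, total: u32) -> f64 {
    100.0 * f64::from(resolved) / f64::from(total)
}
```

With the 2026-04-25 numbers, `fixtures_needed(260, 85)` gives 221, so the gap from 138 is 83 fixtures. A Typo-only path means `percent(109, 130)` (about 83.8%), and full MissingDelimiter recovery plus 66 Typo fixes means `percent(92, 130)` (about 70.8%).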

## Specific gaps surfaced by the harness

The first five unresolved samples from the Typo class (representative, not exhaustive — full list reproducible by running the gate):

1. `"TOP SECRET//SI/UK//NOFORN"` → expected `"TOP SECRET//SI/TK//NOFORN"`. Decoder returned `Unambiguous(TopSecret)` but did NOT correct the SCI sub-compartment typo (`UK` → `TK`). The fuzzy matcher's per-token pass appears not to cover SCI sub-compartment positions.
2. `"SECRET//USAR-..."` → expected `"SECRET//SAR-..."`. Decoder produced 3 SCI controls instead of recognizing the multi-word SAR program identifier (`USAR-` typo prefix).
3. `"TPP SECRET//SI//NOFORN"` → expected `"TOP SECRET//SI//NOFORN"`. Decoder lost the classification entirely (`cls=None`); `TPP` did not fuzzy-match `TOP`.
4. `"SECRET//SAR-BP-J1 2J54-..."` → expected `"SECRET//SAR-BP-J12 J54-..."`. Intra-SAR-token typo (whitespace shift inside a multi-word SAR program identifier).
5. `"SECRET//SAR-...//NOFORON"` → expected `"...//NOFORN"`. Returned **zero-candidate**. `NOFORON` is edit-distance-1 from `NOFORN` (insertion) but the fuzzy matcher rejected it — likely the per-token MIN_FUZZY_LEN gate or insertion handling.
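
For reference when triaging these samples, a plain Levenshtein distance confirms how far each typo is from its expected token. This is a generic textbook sketch, not the algorithm used by `marque-core::fuzzy::FuzzyVocabMatcher`, whose internals are not shown in this issue.

```rust
/// Classic Levenshtein distance: insert, delete, and substitute each cost 1.
/// Works on chars and keeps only one previous row of the DP table.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and the first j of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        cur.push(i + 1); // deleting i + 1 chars of `a` reaches the empty prefix of `b`
        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}
```

Under this metric `NOFORON` → `NOFORN`, `TPP` → `TOP`, `UK` → `TK`, and `USAR-` → `SAR-` are all distance 1. The whitespace shift in sample 4 (`J1 2J54` vs `J12 J54`) is distance 2, because plain Levenshtein counts an adjacent transposition as two edits; a Damerau-style metric would count it as one.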

## Likely fix areas (decoder)

Based on the failure patterns above, the candidate work breakdown:

- [ ] **Edit-distance-1 insertions for short tokens** — `NOFORON` zero-candidate is the cleanest case. Probably a MIN_FUZZY_LEN edge or an insertion-handling gap in `marque-core::fuzzy::FuzzyVocabMatcher`.
- [ ] **SCI sub-compartment fuzzy correction** — extend the fuzzy pass beyond the bare control system into compartments / sub-compartments per CAPCO-2016 §A.6.
- [ ] **SAR program-identifier fuzzy correction** — multi-word SAR identifiers (CAPCO-2016 §H.5) need their own correction pass; the current per-token matcher splits them.
- [ ] **MissingDelimiter class** — 0/17 means the canonicalizer doesn't reconstruct missing `//` separators in canonical positions. Probably a missing transform in `generate_candidate_bytes`.
- [ ] **Classification-token typos at edit-distance-1**: `TPP SECRET` is edit-distance-1 (one substitution) from `TOP SECRET`, but the matcher returned `cls=None`. Worth confirming whether `TPP` is being normalized differently from `T0P`-style typos.
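
One way to probe the length-gate hypothesis in the items above is a toy gate-then-match lookup. Everything here is hypothetical: `fuzzy_lookup`, `within_one_edit`, the `min_fuzzy_len` parameter, and any vocabulary passed in are stand-ins, and the real value and semantics of `MIN_FUZZY_LEN` are not known from this issue.

```rust
/// True when `a` and `b` differ by at most one insertion, deletion, or substitution.
fn within_one_edit(a: &str, b: &str) -> bool {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let (short, long) = if a.len() <= b.len() { (a, b) } else { (b, a) };
    if long.len() - short.len() > 1 {
        return false;
    }
    // Length of the common prefix.
    let prefix = short.iter().zip(&long).take_while(|(x, y)| x == y).count();
    if prefix == short.len() {
        return true; // identical, or `long` has exactly one trailing extra char
    }
    if short.len() == long.len() {
        // Substitution: everything after the mismatch must agree.
        short[prefix + 1..] == long[prefix + 1..]
    } else {
        // Insertion: skip one char of `long`, then the rest must agree.
        short[prefix..] == long[prefix + 1..]
    }
}

/// Hypothetical lookup: exact hits always win; fuzzy matching is attempted only
/// for tokens of at least `min_fuzzy_len` chars and only if the hit is unique.
fn fuzzy_lookup<'v>(token: &str, vocab: &[&'v str], min_fuzzy_len: usize) -> Option<&'v str> {
    if let Some(hit) = vocab.iter().copied().find(|v| *v == token) {
        return Some(hit);
    }
    if token.chars().count() < min_fuzzy_len {
        return None; // the length gate rejects short tokens before any fuzzy pass
    }
    let mut hits = vocab.iter().copied().filter(|v| within_one_edit(token, v));
    match (hits.next(), hits.next()) {
        (Some(only), None) => Some(only),
        _ => None, // no candidate, or more than one
    }
}
```

With the vocabulary `["TOP", "SECRET", "NOFORN"]` and a gate of 4 in this toy model, `TPP` is rejected while `NOFORON` resolves to `NOFORN`; lowering the gate to 3 recovers `TOP`. If the real matcher behaves similarly, a length gate could account for the `TPP` miss but would explain the `NOFORON` miss only if the gate were larger than 7. That is a guess to verify against the actual code.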

## Acceptance

- `cargo test -p marque-engine --test decoder_accuracy --features decoder-harness -- --ignored` exits 0.
- `#[ignore]` is removed from `resolution_rate_at_0_85`.
- The regression-floor constant `AGGREGATE_FLOOR_REGRESSION` in the same file is ratcheted up alongside the decoder improvements, so a future regression below the new measured rate also fails CI.
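
The relationship between the two gates can be sketched as a pair of threshold checks. The constant `AGGREGATE_FLOOR_REGRESSION` is named in this issue, but its type and the 0.50 value used here are inferred from the stated 50% floor; `SC004_TARGET`, `resolution_rate`, and `gate_passes` are hypothetical names.

```rust
/// Always-on regression floor (value assumed from the 50% floor described above).
const AGGREGATE_FLOOR_REGRESSION: f64 = 0.50;
/// SC-004 target for the currently ignored gate (hypothetical constant name).
const SC004_TARGET: f64 = 0.85;

/// Fraction of fixtures resolved to the expected canonical marking.
fn resolution_rate(resolved: usize, total: usize) -> f64 {
    resolved as f64 / total as f64
}

/// A gate passes when the measured rate is at or above its threshold.
fn gate_passes(resolved: usize, total: usize, threshold: f64) -> bool {
    resolution_rate(resolved, total) >= threshold
}
```

At today's 138/260 the floor check passes and the SC-004 check fails. Once the decoder reaches 221/260, leaving the floor at 0.50 would still let a slide back to 138/260 pass CI, which is why the floor should be raised with each measured improvement.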

## Constitution / spec references

- Spec SC-004 (`specs/004-constraints-decoder-vocab/spec.md` line 149): "Of a mangled-marking fixture of at least 200 labeled cases, at least 85 percent are resolved to the expected canonical marking when the probabilistic recognizer's aggregate confidence threshold is set to 0.85 or higher."
- Constitution Principle VIII (Authoritative Source Fidelity): any new fuzzy-correction transform that touches CAPCO syntax must cite the relevant §A–H passage in `crates/capco/docs/CAPCO-2016.md`.

## Out of scope

- Lowering SC-004 below 85% — the spec target stands.
- Removing fixtures from `tests/fixtures/mangled/` to inflate the rate — the SC-004 floor of ≥200 cases is enforced by the harness's `MIN_FIXTURE_COUNT` constant.
