
Decoder accuracy: close the gap to SC-004 ≥85% target (currently ~53%) #133

@bashandbone

Context

Phase 4 PR-6 (T057 decoder accuracy harness) landed the SC-004 measurement gate as crates/engine/tests/decoder_accuracy.rs::resolution_rate_at_0_85. The test is #[ignore]-marked because the decoder's current empirical accuracy is well below the spec's 85% target. Its complementary test, resolution_rate_does_not_regress, always runs with a 50% aggregate floor, so accuracy cannot silently regress.

This issue tracks closing the gap so the #[ignore] can be removed and SC-004 lands as a load-bearing gate.

Current state (2026-04-25 capture, branch 004-phase4-pr6-bench-accuracy-gates)

Per-class breakdown:
  GarbledDelimiter: 51/51  (100.0%)  ✅
  MissingDelimiter:  0/17  (  0.0%)  ❌
  Reordering:       41/41  (100.0%)  ✅
  SupersededToken:   2/3   ( 66.7%)  ⚠️
  Typo:             26/130 ( 20.0%)  ❌
  WrongCase:        18/18  (100.0%)  ✅
  Aggregate:       138/260 ( 53.1%)

To reach 85% aggregate (221/260), the decoder needs to recover roughly:

  • +83 fixtures if the gain comes purely from Typo (26/130 → 109/130, ≈83.8% Typo-class accuracy)
  • +17 fixtures if MissingDelimiter is fully recovered (0/17 → 17/17, 100%), plus +66 from Typo (26/130 → 92/130, ≈70.8%)
  • Or a mix across both classes — the per-class table above shows where the headroom is.
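The arithmetic behind these scenarios, as a small Rust sketch (required_resolved and pct are illustrative helpers, not functions in the harness). It works in integer percent so the 85% boundary is not subject to floating-point rounding:

```rust
/// Smallest resolved count that meets `target_pct` percent of `total`,
/// i.e. ceil(total * target_pct / 100), computed in integers.
fn required_resolved(total: u32, target_pct: u32) -> u32 {
    (total * target_pct + 99) / 100
}

/// Class accuracy as a percentage, for display only.
fn pct(hit: u32, of: u32) -> f64 {
    100.0 * f64::from(hit) / f64::from(of)
}

fn main() {
    let (total, resolved_now) = (260, 138);
    let needed = required_resolved(total, 85); // 221
    let gap = needed - resolved_now; // 83

    // Scenario A: the whole gap closes within Typo (currently 26/130).
    let typo_a = 26 + gap;
    // Scenario B: MissingDelimiter goes 0/17 -> 17/17; Typo covers the rest.
    let typo_b = 26 + (gap - 17);

    println!("need {needed}/{total} (+{gap})");
    println!("Typo only: {typo_a}/130 ({:.1}%)", pct(typo_a, 130));
    println!("MissingDelimiter + Typo: {typo_b}/130 ({:.1}%)", pct(typo_b, 130));
}
```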

Specific gaps surfaced by the harness

The first five unresolved samples from the Typo class (representative, not exhaustive — full list reproducible by running the gate):

  1. "TOP SECRET//SI/UK//NOFORN" → expected "TOP SECRET//SI/TK//NOFORN". Decoder returned Unambiguous(TopSecret) but did NOT correct the SCI sub-compartment typo (UK → TK). The fuzzy matcher's per-token pass appears not to cover SCI sub-compartment positions.
  2. "SECRET//USAR-..." → expected "SECRET//SAR-...". The decoder produced three SCI controls instead of recognizing the multi-word SAR program identifier behind the mistyped USAR- prefix.
  3. "TPP SECRET//SI//NOFORN" → expected "TOP SECRET//SI//NOFORN". Decoder lost the classification entirely (cls=None); TPP did not fuzzy-match TOP.
  4. "SECRET//SAR-BP-J1 2J54-..." → expected "SECRET//SAR-BP-J12 J54-...". Intra-SAR-token typo (whitespace shift inside a multi-word SAR program identifier).
  5. "SECRET//SAR-...//NOFORON" → expected "...//NOFORN". Returned a zero-candidate result. NOFORON is edit-distance-1 from NOFORN (one inserted O), but the fuzzy matcher rejected it, likely because of the per-token MIN_FUZZY_LEN gate or a gap in insertion handling.
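Items 1, 3, and 5 each sit one edit away from the expected token (UK → TK and TPP → TOP are single substitutions; NOFORON → NOFORN removes one inserted O), so a distance-1 matcher could in principle reach all three. A quick Rust check of those distances, using the textbook dynamic-programming Levenshtein algorithm rather than the FuzzyVocabMatcher implementation:

```rust
/// Classic Levenshtein distance (unit-cost insert, delete, substitute),
/// computed with two rolling rows over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of a and first j of b.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, &ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // match or substitute
                .min(prev[j + 1] + 1) // delete ca from a
                .min(curr[j] + 1); // insert cb into a
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

fn main() {
    for (observed, expected) in [("NOFORON", "NOFORN"), ("TPP", "TOP"), ("UK", "TK")] {
        println!("{observed} -> {expected}: {}", levenshtein(observed, expected));
    }
}
```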

Likely fix areas (decoder)

Based on the failure patterns above, the candidate work breakdown:

  • Edit-distance-1 insertions for short tokens: the NOFORON zero-candidate is the cleanest case. Probably a MIN_FUZZY_LEN edge case or an insertion-handling gap in marque-core::fuzzy::FuzzyVocabMatcher.
  • SCI sub-compartment fuzzy correction — extend the fuzzy pass beyond the bare control system into compartments / sub-compartments per CAPCO-2016 §A.6.
  • SAR program-identifier fuzzy correction — multi-word SAR identifiers (CAPCO-2016 §H.5) need their own correction pass; the current per-token matcher splits them.
  • MissingDelimiter class — 0/17 means the canonicalizer doesn't reconstruct missing // separators in canonical positions. Probably a missing transform in generate_candidate_bytes.
  • Classification-token typos at edit distance 1: TPP SECRET differs from TOP SECRET by a single substitution, yet the matcher returned cls=None. Worth confirming whether TPP is being normalized differently from T0P-style typos.
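Two of the five sampled failures involve tokens of three characters or fewer (TPP, UK). As a hypothetical illustration of how a token-length gate interacts with distance-1 matching, here is a Rust sketch. MIN_FUZZY_LEN = 4, the tiny vocabulary, fuzzy_lookup, and the reject-on-tie rule are all assumptions made for illustration; the actual gate value and logic in marque-core::fuzzy::FuzzyVocabMatcher are not stated in this issue.

```rust
// Hypothetical: the real MIN_FUZZY_LEN value in marque-core is not given
// in this issue; 4 is an assumption chosen for illustration.
const MIN_FUZZY_LEN: usize = 4;

/// Textbook Levenshtein distance over chars (two rolling rows).
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, &ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost).min(prev[j + 1] + 1).min(curr[j] + 1);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Illustrative lookup: exact match first, then a unique distance-1
/// candidate, but only for tokens of at least `min_len` chars.
/// Ties at distance 1 are rejected as ambiguous (an assumed policy).
fn fuzzy_lookup<'a>(token: &str, vocab: &[&'a str], min_len: usize) -> Option<&'a str> {
    if let Some(exact) = vocab.iter().copied().find(|v| *v == token) {
        return Some(exact);
    }
    if token.chars().count() < min_len {
        return None; // the length gate fires before any distance is computed
    }
    let mut hits = vocab.iter().copied().filter(|v| levenshtein(token, v) == 1);
    let first = hits.next()?;
    if hits.next().is_some() {
        None
    } else {
        Some(first)
    }
}

fn main() {
    let vocab = ["TOP", "SECRET", "SI", "TK", "NOFORN"];
    for token in ["NOFORON", "TPP", "UK"] {
        let gated = fuzzy_lookup(token, &vocab, MIN_FUZZY_LEN);
        let ungated = fuzzy_lookup(token, &vocab, 1);
        println!("{token}: gated={gated:?}, ungated={ungated:?}");
    }
}
```

Under these assumptions the gate rejects TPP and UK before any distance is computed, while NOFORON clears it and resolves to NOFORN. If the real matcher behaves similarly, the NOFORON failure would point toward insertion handling rather than the length gate, so checking the actual MIN_FUZZY_LEN value is a cheap first step.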

Acceptance

  • cargo test -p marque-engine --test decoder_accuracy --features decoder-harness -- --ignored exits 0.
  • #[ignore] is removed from resolution_rate_at_0_85.
  • The regression-floor constant AGGREGATE_FLOOR_REGRESSION in the same file is ratcheted up alongside the decoder improvements, so a future regression below the new measured rate also fails CI.

Constitution / spec references

  • Spec SC-004 (specs/004-constraints-decoder-vocab/spec.md line 149): "Of a mangled-marking fixture of at least 200 labeled cases, at least 85 percent are resolved to the expected canonical marking when the probabilistic recognizer's aggregate confidence threshold is set to 0.85 or higher."
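One way to read the SC-004 pass condition as code, sketched in Rust. Outcome, is_resolved, sc004_check, and synthetic are hypothetical names, not the harness API. The sketch assumes a fixture counts as resolved only when the top decoded marking equals the expected canonical marking at confidence ≥ 0.85, and its 200-case minimum mirrors the harness's MIN_FIXTURE_COUNT. The rate comparison is done in integers so that exactly 221/260 passes:

```rust
const MIN_FIXTURE_COUNT: usize = 200;
const CONFIDENCE_THRESHOLD: f64 = 0.85;
const TARGET_PCT: usize = 85;

/// One decoded fixture (hypothetical shape, not the harness's type).
struct Outcome<'a> {
    expected: &'a str,
    decoded: Option<&'a str>, // top candidate, if the decoder produced one
    confidence: f64,
}

fn is_resolved(o: &Outcome<'_>) -> bool {
    o.confidence >= CONFIDENCE_THRESHOLD && o.decoded == Some(o.expected)
}

/// Ok(resolved count) if SC-004 holds, Err(reason) otherwise.
fn sc004_check(outcomes: &[Outcome<'_>]) -> Result<usize, String> {
    let total = outcomes.len();
    if total < MIN_FIXTURE_COUNT {
        return Err(format!("{total} fixtures; SC-004 needs at least {MIN_FIXTURE_COUNT}"));
    }
    let resolved = outcomes.iter().filter(|o| is_resolved(o)).count();
    // resolved / total >= 85%, compared without floating point.
    if resolved * 100 >= total * TARGET_PCT {
        Ok(resolved)
    } else {
        Err(format!("{resolved}/{total} resolved, below {TARGET_PCT}%"))
    }
}

/// Builds `total` fake outcomes, the first `resolved` of which pass.
fn synthetic(resolved: usize, total: usize) -> Vec<Outcome<'static>> {
    (0..total)
        .map(|i| Outcome {
            expected: "SECRET//NOFORN",
            decoded: if i < resolved { Some("SECRET//NOFORN") } else { None },
            confidence: if i < resolved { 0.9 } else { 0.0 },
        })
        .collect()
}

fn main() {
    for (resolved, total) in [(138, 260), (221, 260), (190, 199)] {
        match sc004_check(&synthetic(resolved, total)) {
            Ok(n) => println!("{resolved}/{total}: pass ({n} resolved)"),
            Err(why) => println!("{resolved}/{total}: fail ({why})"),
        }
    }
}
```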
  • Constitution Principle VIII (Authoritative Source Fidelity): any new fuzzy-correction transform that touches CAPCO syntax must cite the relevant §A–H passage in crates/capco/docs/CAPCO-2016.md.

Out of scope

  • Lowering SC-004 below 85% — the spec target stands.
  • Removing fixtures from tests/fixtures/mangled/ to inflate the rate — the SC-004 floor of ≥200 cases is enforced by the harness's MIN_FIXTURE_COUNT constant.

Metadata

Labels: enhancement (New feature or request)
