Skip to content

Phase 5 PR-2: Vocabulary<CapcoScheme> impl + FOUO regression guards#143

Merged
bashandbone merged 7 commits into
mainfrom
feat/phase5-pr2-vocabulary-impl
Apr 25, 2026
Merged

Phase 5 PR-2: Vocabulary<CapcoScheme> impl + FOUO regression guards#143
bashandbone merged 7 commits into
mainfrom
feat/phase5-pr2-vocabulary-impl

Conversation

@bashandbone
Copy link
Copy Markdown
Collaborator

Summary

  • Implements impl Vocabulary<CapcoScheme> for CapcoScheme (T084) — composes the per-CVE-file and per-token tables emitted in PR-1 (marque_ism::generated::vocabulary) into the marque_scheme::Vocabulary<S> trait surface (Authority, OwnerProducer, PointOfContact, Deprecation, portion/banner forms, TokenMetadataFull<TokenId>).
  • Adds FOUO regression guards (T075/T076) to keep the deprecated FOUO → CUI migration entry from being silently re-introduced and to verify FOUO remains an active dissem control end-to-end.
  • Adds zero-allocation regression test (T077) gated behind a new count-allocs feature on marque-capco, mirroring the existing harness in marque-core/tests/alloc_budget.rs.

Per Constitution VII, the trait impl lives in marque-capco (the convergence crate that depends on both marque-ism and marque-scheme), not in marque-ism.

Token coverage

CapcoScheme::Token = TokenId is opaque; the active set today is the ~14 hand-assigned sentinels in crates/capco/src/scheme.rs. This PR maps the 10 sentinels with a corresponding canonical CVE value (NF, RD, FRD, TFNI, RD-CNWDI, UCNI, HCS, R, ND, XD) into the generated tables.

Aggregate sentinels (TOK_US_CLASSIFIED, TOK_NON_US_CLASSIFICATION, TOK_IC_DISSEM, TOK_NON_IC_DISSEM), trigraph sentinels (TOK_USA — trigraphs are XSD-sourced, not in the JSON-derived TOKEN_METADATA), and grammar-shape sentinels (TOK_JOINT, TOK_FGI_MARKER) panic on lookup, surfacing misuse loudly. Phase C extends both the sentinel set and this mapping.

Static lifetimes

Every accessor returns &'static data via LazyLock<Vec<...>>-backed tables — initialized once on first call, dereferenced in place thereafter. The count-allocs test verifies post-warmup accessors do zero heap allocation.

FOUO regression guards (FR-020)

Test Location Asserts
fouo_is_not_in_migration_table crates/ism/tests/migrations.rs Deprecated FOUO → CUI entry stays out of MIGRATIONS; no entry's replacement is CUI.
fouo_remains_in_active_token_metadata crates/ism/tests/migrations.rs FOUO is still published under CVE_DISSEM in v2022-DEC.
fouo_remains_active_dissem_control crates/capco/tests/vocabulary.rs FOUO-bearing input through Engine::lint produces no diagnostic suggesting CUI migration.

Tasks landed

  • T071 every_active_token_has_authority
  • T072 authority_points_to_odni_for_ism_tokens
  • T073 deprecated_tokens_carry_deprecation
  • T074 deprecation_replacement_when_known
  • T075 fouo_is_not_in_migration_table (+ fouo_remains_in_active_token_metadata)
  • T076 fouo_remains_active_dissem_control
  • T077 metadata_query_is_zero_alloc (new count-allocs feature)
  • T084 impl Vocabulary<CapcoScheme> for CapcoScheme
  • T085 corpus harness verified byte-identical

Scope deferred to PR-3

  • T078 (Codec compile-test)
  • T079 (migration-audit URN extension — touches marque-engine audit pipeline; meaningfully larger than the rest of PR-2)
  • T089b (StubScheme readiness)

These three are a coherent "trait-surface-completion" set; keeping PR-2 capco-scoped avoids mixing engine modifications with the vocabulary impl.

Test plan

  • cargo test --workspace — 1169 passed, 0 failed
  • cargo test -p marque-capco --features corpus-harness — 454 passed, 0 failed (T085 byte-identical)
  • cargo test -p marque-capco --features count-allocs --test vocabulary -- --test-threads=1 — 6 passed (T077 zero-alloc green)
  • cargo clippy --workspace --all-targets -- -D warnings clean
  • cargo fmt --all applied

🤖 Generated with Claude Code

Implements `impl Vocabulary<CapcoScheme> for CapcoScheme` (T084) by
composing the per-CVE-file and per-token tables emitted in PR-1
(`marque_ism::generated::vocabulary`) into the
`marque_scheme::Vocabulary<S>` trait surface (`Authority`,
`OwnerProducer`, `PointOfContact`, `Deprecation`, portion/banner
forms, `TokenMetadataFull<TokenId>`).

Per Constitution VII the impl lives in `marque-capco`, not
`marque-ism` (`marque-ism` MUST NOT depend on `marque-scheme`).

## Token coverage

`CapcoScheme::Token = TokenId` is opaque; the active set today is
the ~14 hand-assigned sentinels in `crates/capco/src/scheme.rs`.
This PR maps the 10 sentinels with a corresponding canonical CVE
value (NF, RD, FRD, TFNI, RD-CNWDI, UCNI, HCS, R, ND, XD) into the
generated tables. Aggregate sentinels (`TOK_US_CLASSIFIED`,
`TOK_NON_US_CLASSIFICATION`, `TOK_IC_DISSEM`, `TOK_NON_IC_DISSEM`),
trigraph sentinels (`TOK_USA` — trigraphs are XSD-sourced, not in
the JSON-derived `TOKEN_METADATA`), and grammar-shape sentinels
(`TOK_JOINT`, `TOK_FGI_MARKER`) panic on lookup, surfacing misuse
loudly. Phase C extends both the sentinel set and this mapping.

## Static lifetimes

Every accessor returns `&'static` data via `LazyLock<Vec<...>>`-
backed tables — initialized once on first call, dereferenced in
place thereafter. `count-allocs` test verifies post-warmup
accessors do zero heap allocation (T077).

## FOUO regression guards (FR-020)

- `crates/ism/tests/migrations.rs::fouo_is_not_in_migration_table`
  asserts the deprecated `FOUO → CUI` entry stays out of
  `MIGRATIONS` and no entry's replacement is `CUI` (T075).
- `fouo_remains_in_active_token_metadata` asserts FOUO is still
  published under `CVE_DISSEM` in the v2022-DEC schema package.
- `crates/capco/tests/vocabulary.rs::fouo_remains_active_dissem_control`
  runs an FOUO-bearing input through `Engine::lint` and asserts no
  diagnostic suggests CUI migration (T076).

## Test coverage

- T071 every_active_token_has_authority — iterates active sentinel
  set; asserts `authority`, `owner_producer`, `point_of_contact`,
  `portion_form`, `banner_form` populated for every entry.
- T072 authority_points_to_odni_for_ism_tokens — URN starts with
  `urn:us:gov:ic:cvenum:`, `schema_version == "ISM-v2022-DEC"`.
- T073 deprecated_tokens_carry_deprecation — active sentinels
  return `None` from `deprecation()`.
- T074 deprecation_replacement_when_known — when a deprecation IS
  populated, the named replacement must resolve cleanly in the
  same vocabulary (no dangling pointers).
- T075/T076 — FOUO regression guards (above).
- T077 metadata_query_is_zero_alloc — gated behind the new
  `count-allocs` feature; mirrors `marque-core/tests/alloc_budget.rs`
  shape (gap register #15).
- T085 corpus harness verified byte-identical
  (`cargo test -p marque-capco --features corpus-harness` green).

## Scope deferred to PR-3

T078 (Codec compile-test), T079 (migration-audit URN extension —
touches `marque-engine` audit pipeline), and T089b (StubScheme
readiness) are grouped into the next PR as a coherent
"trait-surface-completion" set. Keeping PR-2 scoped to capco-only
changes avoids mixing engine modifications with the vocabulary
impl.

## Verification

- `cargo test --workspace` — 1169 passed, 0 failed
- `cargo test -p marque-capco --features corpus-harness` — 454
  passed, 0 failed (T085 byte-identical)
- `cargo test -p marque-capco --features count-allocs --test
  vocabulary -- --test-threads=1` — 6 passed (T077 zero-alloc green)
- `cargo clippy --workspace --all-targets -- -D warnings` clean
- `cargo fmt --all` applied

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bashandbone bashandbone requested review from a team and Copilot April 25, 2026 13:57
@github-actions
Copy link
Copy Markdown

🤖 Hi @bashandbone, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@github-actions
Copy link
Copy Markdown

🤖 I'm sorry @bashandbone, but I was unable to process your request. Please see the logs for more details.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 25, 2026

Codecov Report

❌ Patch coverage is 90.34091% with 17 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
crates/capco/src/vocabulary.rs 90.34% 14 Missing and 3 partials ⚠️

📢 Thoughts on this report? Let us know!

This comment was marked as resolved.

bashandbone and others added 3 commits April 25, 2026 10:04
1. **Single source of truth for POC** (Copilot vocabulary.rs:162):
   `CveFileDerived` no longer carries a separate `point_of_contact`
   field. The `Authority` struct already embeds a `PointOfContact`,
   so the per-token `point_of_contact()` accessor now returns
   `&authority.point_of_contact`. Drift between
   `scheme.authority(t).point_of_contact` and
   `scheme.point_of_contact(t)` is unrepresentable.

2. **Reuse derived_for_token in build_metadata** (Copilot
   vocabulary.rs:300): `build_metadata` now calls
   `derived_for_token(token)` instead of inlining the per-CveFile
   derivation. The earlier `CveFileDerivedInline` helper was based
   on a misread of the LazyLock init order — `CVE_FILE_DERIVED` is
   independent of `TOKEN_DERIVED`, so calling it from inside
   `build_metadata` is safe. Reusing the cached record makes
   `scheme.metadata(t).authority` and `scheme.authority(t)` literally
   the same bytes.

3. **Rename misleading test** (Copilot tests/vocabulary.rs:146):
   `deprecated_tokens_carry_deprecation` →
   `active_tokens_have_no_deprecation_metadata`. The body asserts
   the negative case (`active sentinels return None`); the new name
   matches the behavior. When Phase C adds deprecated sentinels, a
   sibling test for the positive case lands alongside.

4. **Isolate zero-alloc test to its own binary** (Copilot
   tests/vocabulary.rs:325): Split T077 into
   `crates/capco/tests/vocabulary_zero_alloc.rs`, gated at the FILE
   level on `#![cfg(feature = "count-allocs")]`. Mirrors the
   discipline in `crates/core/tests/alloc_budget.rs` (gap register
   #15). The counting global allocator now has no other tests in
   the same binary to inflate its measurements; `--test-threads=1`
   becomes a recommendation, not a hard requirement.

5. **eq_ignore_ascii_case in CUI guard** (Copilot
   tests/migrations.rs:46): Replaced
   `entry.replacement.to_ascii_uppercase() != "CUI"` with
   `!entry.replacement.eq_ignore_ascii_case("CUI")`. The previous
   form allocated a fresh `String` per iteration; the new form is
   allocation-free.

Verification:
- `cargo test -p marque-capco --test vocabulary` — 5 passed
- `cargo test -p marque-ism --test migrations` — 2 passed
- `cargo test -p marque-capco --features count-allocs --test
  vocabulary_zero_alloc` — 1 passed
- `cargo test --workspace` — 1169 passed, 0 failed
- `cargo clippy --workspace --all-targets -- -D warnings` clean
- `cargo clippy -p marque-capco --features count-allocs --
  -D warnings` clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `typos` CI gate flags two-letter `ba` as a likely typo of `be`/`by`.
Same identifier-shape concern as past renames; no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/capco/Cargo.toml Outdated
Comment on lines +70 to +75
# `Vocabulary` accessors in `crates/capco/tests/vocabulary.rs`. Mirrors
# the `count-allocs` feature in `marque-core` (gap register #15) — it
# installs a process-wide counting global allocator, so we keep it
# opt-in to avoid polluting the default `cargo test` run. CI invokes
# this with `cargo test -p marque-capco --features count-allocs --test
# vocabulary -- --test-threads=1`.
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The count-allocs feature comment points at crates/capco/tests/vocabulary.rs and suggests running --test vocabulary, but the zero-allocation regression gate is implemented in crates/capco/tests/vocabulary_zero_alloc.rs (and its own doc comment recommends --test vocabulary_zero_alloc). Please update the feature docs to reference the correct test file/command so CI/runbook guidance doesn’t drift.

Suggested change
# `Vocabulary` accessors in `crates/capco/tests/vocabulary.rs`. Mirrors
# the `count-allocs` feature in `marque-core` (gap register #15) — it
# installs a process-wide counting global allocator, so we keep it
# opt-in to avoid polluting the default `cargo test` run. CI invokes
# this with `cargo test -p marque-capco --features count-allocs --test
# vocabulary -- --test-threads=1`.
# `Vocabulary` accessors in `crates/capco/tests/vocabulary_zero_alloc.rs`.
# Mirrors the `count-allocs` feature in `marque-core` (gap register #15)
# — it installs a process-wide counting global allocator, so we keep it
# opt-in to avoid polluting the default `cargo test` run. CI invokes
# this with `cargo test -p marque-capco --features count-allocs --test
# vocabulary_zero_alloc -- --test-threads=1`.

Copilot uses AI. Check for mistakes.
Comment thread crates/capco/src/vocabulary.rs Outdated
//! referenced by index. The composed `TokenMetadataFull<TokenId>`
//! records are similarly built once. Calls after the first do zero
//! heap allocation — exercised by the count-allocs test in
//! `tests/vocabulary.rs` (gated on the `count-allocs` feature).
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module docs mention the count-allocs regression test living in tests/vocabulary.rs, but the PR adds it as crates/capco/tests/vocabulary_zero_alloc.rs. Please update this reference so readers can find the gate and run it with the documented command.

Suggested change
//! `tests/vocabulary.rs` (gated on the `count-allocs` feature).
//! `crates/capco/tests/vocabulary_zero_alloc.rs` (gated on the
//! `count-allocs` feature).

Copilot uses AI. Check for mistakes.
Comment thread crates/capco/src/lib.rs Outdated
pub(crate) mod rules_sci_per_system;
pub mod scheme;
pub mod vocab;
pub mod vocabulary;
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vocabulary is introduced as a public module, but the file appears to exist solely to provide the impl Vocabulary<CapcoScheme> for CapcoScheme and internal tables/helpers. If external callers don’t need to name this module, consider making it private (mod vocabulary;) or pub(crate) to avoid expanding the public API surface area unnecessarily.

Suggested change
pub mod vocabulary;
mod vocabulary;

Copilot uses AI. Check for mistakes.
## Copilot review round 2

1. **Cargo.toml feature comment drift**
   (Copilot Cargo.toml:75): The `count-allocs` feature comment
   pointed at `crates/capco/tests/vocabulary.rs` and recommended
   `--test vocabulary`, but the gate was moved to its own file in
   the round-1 fix. Updated the comment to reference
   `crates/capco/tests/vocabulary_zero_alloc.rs` and the correct
   `--test vocabulary_zero_alloc` invocation.

2. **Module-doc reference drift**
   (Copilot vocabulary.rs:48): Same drift as above in the module
   doc-comment under "Static data lifetimes". Updated to point at
   `crates/capco/tests/vocabulary_zero_alloc.rs`.

3. **`vocabulary` should be private**
   (Copilot lib.rs:28): The module exists solely to host
   `impl Vocabulary<CapcoScheme>` and internal `LazyLock`-backed
   tables. Trait-method resolution finds the impl through
   `marque_scheme::Vocabulary` regardless of the module's
   visibility. Demoted to `mod vocabulary;` (private). External
   public surface unchanged.

## Coverage expansion (T077a)

Codecov flagged 79.32% patch coverage on `vocabulary.rs` (~37
uncovered lines). T071–T077 cover the happy-path active-sentinel
loop but never reach:

- The four `panic!` chokepoints in `canonical_for` / `entry_for` /
  `derived_for_token` / `token_derived` (every active sentinel
  resolves cleanly).
- The `Some` arm of `derive_banner_abbreviation` (no current test
  asserts a specific banner abbreviation; the happy-path loop
  only checks non-emptiness).
- The cross-accessor consistency invariant
  (`metadata(t).{field}` = per-field accessor) the round-1
  single-source-of-truth refactor relies on.

Added 11 new tests in `crates/capco/tests/vocabulary.rs`:

- `banner_abbreviation_some_for_distinct_form` — `NF→NOFORN`,
  `UCNI→DOE UCNI`, `ND→NODIS`, `XD→EXDIS` (CAPCO-2016 §G.1
  Table 4 distinct-abbreviation rows).
- `banner_abbreviation_none_for_same_form` — `RD`, `FRD`, `TFNI`,
  `RD-CNWDI`, `HCS`, `R` (no distinct abbreviation; portion ==
  banner).
- `metadata_agrees_with_per_field_accessors` — exhaustive
  cross-check, including `metadata.authority.point_of_contact ==
  scheme.point_of_contact` (the round-1 SSOT invariant).
- 7 `#[should_panic(expected = "no canonical CVE")]` tests, one
  per accessor (`authority`, `owner_producer`, `point_of_contact`,
  `metadata`, `portion_form`, `banner_form`, `deprecation`),
  using aggregate / trigraph / grammar-shape sentinels
  (`TOK_FGI_MARKER`, `TOK_USA`, `TOK_JOINT`, `TOK_US_CLASSIFIED`,
  `TOK_NON_US_CLASSIFICATION`, `TOK_IC_DISSEM`,
  `TOK_NON_IC_DISSEM`) that are deliberately absent from
  `SENTINEL_TO_CANONICAL`.

Distinct tests per accessor — a refactor that diverts one
accessor away from the shared chokepoint would otherwise pass
coverage by reaching only the first panic.

## Cleanup

- Removed the dead `_banner_to_portion_anchor` helper and its
  unused `banner_to_portion` import. The "future-proofing" comment
  was speculative; YAGNI.

## tasks.md

Added T077a entry between T077 and T078, citing the codecov gap
on PR #143 and listing the four sub-areas covered (a/b/c/d).

## Verification

- `cargo test -p marque-capco --test vocabulary` — 15 passed
  (was 5)
- `cargo test --workspace` — 1180 passed, 0 failed
- `cargo clippy --workspace --all-targets -- -D warnings` clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@bashandbone bashandbone merged commit 4320918 into main Apr 25, 2026
7 checks passed
@bashandbone bashandbone deleted the feat/phase5-pr2-vocabulary-impl branch April 25, 2026 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants