Phase 5 PR-2: Vocabulary<CapcoScheme> impl + FOUO regression guards by bashandbone · Pull Request #143 · marquetools/marque

bashandbone · 2026-04-25T13:57:07Z

Summary

Implements impl Vocabulary<CapcoScheme> for CapcoScheme (T084) — composes the per-CVE-file and per-token tables emitted in PR-1 (marque_ism::generated::vocabulary) into the marque_scheme::Vocabulary<S> trait surface (Authority, OwnerProducer, PointOfContact, Deprecation, portion/banner forms, TokenMetadataFull<TokenId>).
Adds FOUO regression guards (T075/T076) to keep the deprecated FOUO → CUI migration entry from being silently re-introduced and to verify FOUO remains an active dissem control end-to-end.
Adds zero-allocation regression test (T077) gated behind a new count-allocs feature on marque-capco, mirroring the existing harness in marque-core/tests/alloc_budget.rs.

Per Constitution VII, the trait impl lives in marque-capco (the convergence crate that depends on both marque-ism and marque-scheme), not in marque-ism.

Token coverage

CapcoScheme::Token = TokenId is opaque; the active set today is the ~14 hand-assigned sentinels in crates/capco/src/scheme.rs. This PR maps the 10 sentinels with a corresponding canonical CVE value (NF, RD, FRD, TFNI, RD-CNWDI, UCNI, HCS, R, ND, XD) into the generated tables.

Aggregate sentinels (TOK_US_CLASSIFIED, TOK_NON_US_CLASSIFICATION, TOK_IC_DISSEM, TOK_NON_IC_DISSEM), trigraph sentinels (TOK_USA — trigraphs are XSD-sourced, not in the JSON-derived TOKEN_METADATA), and grammar-shape sentinels (TOK_JOINT, TOK_FGI_MARKER) panic on lookup, surfacing misuse loudly. Phase C extends both the sentinel set and this mapping.
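
The panic-on-lookup behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in `crates/capco/src/vocabulary.rs`: the numeric ids, the `TOK_NF` and `TOK_RD` constants, and the exact shape of `SENTINEL_TO_CANONICAL` and `canonical_for` are assumptions made for the example.

```rust
/// Opaque token id; the numeric values below are hypothetical.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TokenId(pub u16);

pub const TOK_NF: TokenId = TokenId(1); // assumed sentinel with a canonical CVE value
pub const TOK_RD: TokenId = TokenId(2); // assumed sentinel with a canonical CVE value
pub const TOK_JOINT: TokenId = TokenId(100); // grammar-shape sentinel, deliberately unmapped

/// Only sentinels that have a canonical CVE value appear in the map.
const SENTINEL_TO_CANONICAL: &[(TokenId, &str)] = &[(TOK_NF, "NF"), (TOK_RD, "RD")];

/// Shared chokepoint: every accessor would resolve its token through here,
/// so an unmapped sentinel fails loudly instead of returning empty data.
pub fn canonical_for(token: TokenId) -> &'static str {
    SENTINEL_TO_CANONICAL
        .iter()
        .find(|(id, _)| *id == token)
        .map(|(_, canonical)| *canonical)
        .unwrap_or_else(|| panic!("no canonical CVE value for sentinel {token:?}"))
}
```

Routing every accessor through one lookup like this is what makes a single panic message testable across all of them.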

Static lifetimes

Every accessor returns &'static data via LazyLock<Vec<...>>-backed tables — initialized once on first call, dereferenced in place thereafter. The count-allocs test verifies post-warmup accessors do zero heap allocation.
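
As a rough sketch of the pattern, assuming a table indexed by position: the `Authority` fields, the `example-a`/`example-b` URN suffixes, and the `authority` function here are illustrative stand-ins, not the generated tables from PR-1.

```rust
use std::sync::LazyLock;

/// Simplified stand-in for the real `Authority` record.
#[derive(Debug, PartialEq)]
pub struct Authority {
    pub urn: String,
    pub schema_version: &'static str,
}

/// Built once, on first access. The `Vec` is never mutated afterwards,
/// so references into it stay valid for the life of the process.
static AUTHORITIES: LazyLock<Vec<Authority>> = LazyLock::new(|| {
    ["example-a", "example-b"] // hypothetical CVE-file suffixes
        .iter()
        .map(|suffix| Authority {
            urn: format!("urn:us:gov:ic:cvenum:{suffix}"),
            schema_version: "ISM-v2022-DEC",
        })
        .collect()
});

/// Returns a `&'static` reference; after the first call this is an index
/// into the already-initialized table and performs no allocation.
pub fn authority(index: usize) -> &'static Authority {
    &AUTHORITIES[index]
}
```

Two calls with the same index return the same address, which is the property the single-source-of-truth checks rely on.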

FOUO regression guards (FR-020)

| Test | Location | Asserts |
| --- | --- | --- |
| `fouo_is_not_in_migration_table` | `crates/ism/tests/migrations.rs` | Deprecated `FOUO → CUI` entry stays out of `MIGRATIONS`; no entry's replacement is `CUI`. |
| `fouo_remains_in_active_token_metadata` | `crates/ism/tests/migrations.rs` | FOUO is still published under `CVE_DISSEM` in v2022-DEC. |
| `fouo_remains_active_dissem_control` | `crates/capco/tests/vocabulary.rs` | FOUO-bearing input through `Engine::lint` produces no diagnostic suggesting CUI migration. |
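
A guard in the spirit of `fouo_is_not_in_migration_table` might look like the sketch below. The `Migration` struct, its `deprecated` field, the sample table entry, and the `reintroduces_fouo_to_cui` helper are all hypothetical; only the `replacement` field name and the case-insensitive comparison come from this PR.

```rust
/// Hypothetical shape of one migration-table entry.
pub struct Migration {
    pub deprecated: &'static str,
    pub replacement: &'static str,
}

/// Hypothetical table contents, used only to exercise the guard.
pub const MIGRATIONS: &[Migration] = &[Migration {
    deprecated: "OLD-EXAMPLE",
    replacement: "NEW-EXAMPLE",
}];

/// True if any entry migrates FOUO away or names CUI as a replacement.
/// `eq_ignore_ascii_case` compares in place, so the loop allocates nothing.
pub fn reintroduces_fouo_to_cui(table: &[Migration]) -> bool {
    table.iter().any(|m| {
        m.deprecated.eq_ignore_ascii_case("FOUO") || m.replacement.eq_ignore_ascii_case("CUI")
    })
}
```

A test would then assert `!reintroduces_fouo_to_cui(MIGRATIONS)`, so re-adding the entry in any letter case fails the build.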

Tasks landed

T071 every_active_token_has_authority
T072 authority_points_to_odni_for_ism_tokens
T073 deprecated_tokens_carry_deprecation
T074 deprecation_replacement_when_known
T075 fouo_is_not_in_migration_table (+ fouo_remains_in_active_token_metadata)
T076 fouo_remains_active_dissem_control
T077 metadata_query_is_zero_alloc (new count-allocs feature)
T084 impl Vocabulary<CapcoScheme> for CapcoScheme
T085 corpus harness verified byte-identical

Scope deferred to PR-3

T078 (Codec compile-test)
T079 (migration-audit URN extension — touches marque-engine audit pipeline; meaningfully larger than the rest of PR-2)
T089b (StubScheme readiness)

These three are a coherent "trait-surface-completion" set; keeping PR-2 capco-scoped avoids mixing engine modifications with the vocabulary impl.

Test plan

cargo test --workspace — 1169 passed, 0 failed
cargo test -p marque-capco --features corpus-harness — 454 passed, 0 failed (T085 byte-identical)
cargo test -p marque-capco --features count-allocs --test vocabulary -- --test-threads=1 — 6 passed (T077 zero-alloc green)
cargo clippy --workspace --all-targets -- -D warnings clean
cargo fmt --all applied
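
For readers unfamiliar with the technique behind T077, here is a minimal sketch of a zero-allocation gate, assuming a counting allocator that wraps the system allocator. `CountingAlloc`, `allocs_during`, and the `FORMS` table are hypothetical and are not the harness in `marque-core/tests/alloc_budget.rs`. The per-thread counter is a substitution made here so that other test threads cannot inflate the count; the PR describes a process-wide counter instead.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::sync::LazyLock;

thread_local! {
    // Const-initialized and without a destructor, so touching it from
    // inside the allocator does not itself allocate.
    static ALLOCS: Cell<usize> = const { Cell::new(0) };
}

/// Forwards to the system allocator and counts allocations per thread.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and reports how many allocations the current thread made.
fn allocs_during<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOCS.with(|c| c.get());
    let out = f();
    let after = ALLOCS.with(|c| c.get());
    (out, after - before)
}

/// Hypothetical lazily built table of (portion form, banner form) pairs.
static FORMS: LazyLock<Vec<(&'static str, &'static str)>> =
    LazyLock::new(|| vec![("NF", "NOFORN"), ("ND", "NODIS")]);

fn banner_form(portion: &str) -> Option<&'static str> {
    FORMS.iter().find(|(p, _)| *p == portion).map(|(_, b)| *b)
}
```

The gate warms the table with one call, then asserts that `allocs_during(|| banner_form("ND"))` reports zero allocations.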

🤖 Generated with Claude Code

Implements `impl Vocabulary<CapcoScheme> for CapcoScheme` (T084) by composing the per-CVE-file and per-token tables emitted in PR-1 (`marque_ism::generated::vocabulary`) into the `marque_scheme::Vocabulary<S>` trait surface (`Authority`, `OwnerProducer`, `PointOfContact`, `Deprecation`, portion/banner forms, `TokenMetadataFull<TokenId>`). Per Constitution VII the impl lives in `marque-capco`, not `marque-ism` (`marque-ism` MUST NOT depend on `marque-scheme`).

## Token coverage

`CapcoScheme::Token = TokenId` is opaque; the active set today is the ~14 hand-assigned sentinels in `crates/capco/src/scheme.rs`. This PR maps the 10 sentinels with a corresponding canonical CVE value (NF, RD, FRD, TFNI, RD-CNWDI, UCNI, HCS, R, ND, XD) into the generated tables. Aggregate sentinels (`TOK_US_CLASSIFIED`, `TOK_NON_US_CLASSIFICATION`, `TOK_IC_DISSEM`, `TOK_NON_IC_DISSEM`), trigraph sentinels (`TOK_USA`; trigraphs are XSD-sourced, not in the JSON-derived `TOKEN_METADATA`), and grammar-shape sentinels (`TOK_JOINT`, `TOK_FGI_MARKER`) panic on lookup, surfacing misuse loudly. Phase C extends both the sentinel set and this mapping.

## Static lifetimes

Every accessor returns `&'static` data via `LazyLock<Vec<...>>`-backed tables: initialized once on first call, dereferenced in place thereafter. The `count-allocs` test verifies that post-warmup accessors do zero heap allocation (T077).

## FOUO regression guards (FR-020)

- `crates/ism/tests/migrations.rs::fouo_is_not_in_migration_table` asserts the deprecated `FOUO → CUI` entry stays out of `MIGRATIONS` and no entry's replacement is `CUI` (T075).
- `fouo_remains_in_active_token_metadata` asserts FOUO is still published under `CVE_DISSEM` in the v2022-DEC schema package.
- `crates/capco/tests/vocabulary.rs::fouo_remains_active_dissem_control` runs an FOUO-bearing input through `Engine::lint` and asserts no diagnostic suggests CUI migration (T076).

## Test coverage

- T071 every_active_token_has_authority: iterates the active sentinel set; asserts `authority`, `owner_producer`, `point_of_contact`, `portion_form`, `banner_form` are populated for every entry.
- T072 authority_points_to_odni_for_ism_tokens: URN starts with `urn:us:gov:ic:cvenum:`, `schema_version == "ISM-v2022-DEC"`.
- T073 deprecated_tokens_carry_deprecation: active sentinels return `None` from `deprecation()`.
- T074 deprecation_replacement_when_known: when a deprecation IS populated, the named replacement must resolve cleanly in the same vocabulary (no dangling pointers).
- T075/T076: FOUO regression guards (above).
- T077 metadata_query_is_zero_alloc: gated behind the new `count-allocs` feature; mirrors the shape of `marque-core/tests/alloc_budget.rs` (gap register #15).
- T085 corpus harness verified byte-identical (`cargo test -p marque-capco --features corpus-harness` green).

## Scope deferred to PR-3

T078 (Codec compile-test), T079 (migration-audit URN extension, which touches the `marque-engine` audit pipeline), and T089b (StubScheme readiness) are grouped into the next PR as a coherent "trait-surface-completion" set. Keeping PR-2 scoped to capco-only changes avoids mixing engine modifications with the vocabulary impl.

## Verification

- `cargo test --workspace`: 1169 passed, 0 failed
- `cargo test -p marque-capco --features corpus-harness`: 454 passed, 0 failed (T085 byte-identical)
- `cargo test -p marque-capco --features count-allocs --test vocabulary -- --test-threads=1`: 6 passed (T077 zero-alloc green)
- `cargo clippy --workspace --all-targets -- -D warnings` clean
- `cargo fmt --all` applied

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-25T13:57:16Z

🤖 Hi @bashandbone, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions · 2026-04-25T13:57:28Z

🤖 I'm sorry @bashandbone, but I was unable to process your request. Please see the logs for more details.

codecov · 2026-04-25T13:59:11Z

Codecov Report

❌ Patch coverage is 90.34091% with 17 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
crates/capco/src/vocabulary.rs	90.34%	14 Missing and 3 partials ⚠️


1. **Single source of truth for POC** (Copilot vocabulary.rs:162): `CveFileDerived` no longer carries a separate `point_of_contact` field. The `Authority` struct already embeds a `PointOfContact`, so the per-token `point_of_contact()` accessor now returns `&authority.point_of_contact`. Drift between `scheme.authority(t).point_of_contact` and `scheme.point_of_contact(t)` is unrepresentable.
2. **Reuse derived_for_token in build_metadata** (Copilot vocabulary.rs:300): `build_metadata` now calls `derived_for_token(token)` instead of inlining the per-CveFile derivation. The earlier `CveFileDerivedInline` helper was based on a misread of the LazyLock init order: `CVE_FILE_DERIVED` is independent of `TOKEN_DERIVED`, so calling it from inside `build_metadata` is safe. Reusing the cached record makes `scheme.metadata(t).authority` and `scheme.authority(t)` literally the same bytes.
3. **Rename misleading test** (Copilot tests/vocabulary.rs:146): `deprecated_tokens_carry_deprecation` → `active_tokens_have_no_deprecation_metadata`. The body asserts the negative case (`active sentinels return None`); the new name matches the behavior. When Phase C adds deprecated sentinels, a sibling test for the positive case lands alongside.
4. **Isolate zero-alloc test to its own binary** (Copilot tests/vocabulary.rs:325): Split T077 into `crates/capco/tests/vocabulary_zero_alloc.rs`, gated at the FILE level on `#![cfg(feature = "count-allocs")]`. Mirrors the discipline in `crates/core/tests/alloc_budget.rs` (gap register #15). The counting global allocator now has no other tests in the same binary to inflate its measurements; `--test-threads=1` becomes a recommendation, not a hard requirement.
5. **eq_ignore_ascii_case in CUI guard** (Copilot tests/migrations.rs:46): Replaced `entry.replacement.to_ascii_uppercase() != "CUI"` with `!entry.replacement.eq_ignore_ascii_case("CUI")`. The previous form allocated a fresh `String` per iteration; the new form is allocation-free.

Verification:

- `cargo test -p marque-capco --test vocabulary`: 5 passed
- `cargo test -p marque-ism --test migrations`: 2 passed
- `cargo test -p marque-capco --features count-allocs --test vocabulary_zero_alloc`: 1 passed
- `cargo test --workspace`: 1169 passed, 0 failed
- `cargo clippy --workspace --all-targets -- -D warnings` clean
- `cargo clippy -p marque-capco --features count-allocs -- -D warnings` clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
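
The single-source-of-truth change in item 1 can be illustrated with a small sketch. The struct fields, the `Scheme` type, and the sample data are simplified assumptions and do not reproduce the real `marque_scheme` types.

```rust
/// Simplified stand-ins for the real records.
#[derive(Debug, PartialEq)]
pub struct PointOfContact {
    pub name: &'static str,
}

#[derive(Debug, PartialEq)]
pub struct Authority {
    pub urn: &'static str,
    pub point_of_contact: PointOfContact,
}

/// Hypothetical scheme holding one `Authority` per token index.
pub struct Scheme {
    authorities: Vec<Authority>,
}

impl Scheme {
    /// Sample data for illustration only.
    pub fn example() -> Self {
        Scheme {
            authorities: vec![Authority {
                urn: "urn:us:gov:ic:cvenum:example",
                point_of_contact: PointOfContact { name: "Example POC" },
            }],
        }
    }

    pub fn authority(&self, token: usize) -> &Authority {
        &self.authorities[token]
    }

    /// No separate POC field exists: the accessor borrows the one embedded
    /// in `Authority`, so the two views cannot drift apart.
    pub fn point_of_contact(&self, token: usize) -> &PointOfContact {
        &self.authority(token).point_of_contact
    }
}
```

Because both accessors borrow the same field, a pointer-equality check is enough to confirm they agree.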


Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.


Copilot · 2026-04-25T14:14:49Z

+# `Vocabulary` accessors in `crates/capco/tests/vocabulary.rs`. Mirrors
+# the `count-allocs` feature in `marque-core` (gap register #15) — it
+# installs a process-wide counting global allocator, so we keep it
+# opt-in to avoid polluting the default `cargo test` run. CI invokes
+# this with `cargo test -p marque-capco --features count-allocs --test
+# vocabulary -- --test-threads=1`.


The count-allocs feature comment points at crates/capco/tests/vocabulary.rs and suggests running --test vocabulary, but the zero-allocation regression gate is implemented in crates/capco/tests/vocabulary_zero_alloc.rs (and its own doc comment recommends --test vocabulary_zero_alloc). Please update the feature docs to reference the correct test file/command so CI/runbook guidance doesn’t drift.

Suggested change

-# `Vocabulary` accessors in `crates/capco/tests/vocabulary.rs`. Mirrors
-# the `count-allocs` feature in `marque-core` (gap register #15) — it
-# installs a process-wide counting global allocator, so we keep it
-# opt-in to avoid polluting the default `cargo test` run. CI invokes
-# this with `cargo test -p marque-capco --features count-allocs --test
-# vocabulary -- --test-threads=1`.
+# `Vocabulary` accessors in `crates/capco/tests/vocabulary_zero_alloc.rs`.
+# Mirrors the `count-allocs` feature in `marque-core` (gap register #15)
+# — it installs a process-wide counting global allocator, so we keep it
+# opt-in to avoid polluting the default `cargo test` run. CI invokes
+# this with `cargo test -p marque-capco --features count-allocs --test
+# vocabulary_zero_alloc -- --test-threads=1`.

Copilot · 2026-04-25T14:14:50Z

+//! referenced by index. The composed `TokenMetadataFull<TokenId>`
+//! records are similarly built once. Calls after the first do zero
+//! heap allocation — exercised by the count-allocs test in
+//! `tests/vocabulary.rs` (gated on the `count-allocs` feature).


Module docs mention the count-allocs regression test living in tests/vocabulary.rs, but the PR adds it as crates/capco/tests/vocabulary_zero_alloc.rs. Please update this reference so readers can find the gate and run it with the documented command.

Suggested change

-//! `tests/vocabulary.rs` (gated on the `count-allocs` feature).
+//! `crates/capco/tests/vocabulary_zero_alloc.rs` (gated on the
+//! `count-allocs` feature).

Copilot · 2026-04-25T14:14:50Z

 pub(crate) mod rules_sci_per_system;
 pub mod scheme;
 pub mod vocab;
+pub mod vocabulary;


vocabulary is introduced as a public module, but the file appears to exist solely to provide the impl Vocabulary<CapcoScheme> for CapcoScheme and internal tables/helpers. If external callers don’t need to name this module, consider making it private (mod vocabulary;) or pub(crate) to avoid expanding the public API surface area unnecessarily.

Suggested change

-pub mod vocabulary;
+mod vocabulary;
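
To see why making the module private does not hide the trait impl, consider this minimal sketch. The one-method `Vocabulary` trait and the `portion_form_of` helper are simplified stand-ins, not the real `marque_scheme::Vocabulary<S>` surface.

```rust
/// Simplified stand-in for the public trait (the real one has more methods).
pub trait Vocabulary<S> {
    fn portion_form(&self, token: u16) -> &'static str;
}

pub struct CapcoScheme;

// Private module: callers cannot name `vocabulary::...`, but trait impls
// are not items with their own visibility, so the impl remains usable
// wherever both the trait and the type are nameable.
mod vocabulary {
    use super::{CapcoScheme, Vocabulary};

    impl Vocabulary<CapcoScheme> for CapcoScheme {
        fn portion_form(&self, _token: u16) -> &'static str {
            "NF" // placeholder value for illustration
        }
    }
}

/// Resolves the impl through the trait, never through the module path.
pub fn portion_form_of(token: u16) -> &'static str {
    Vocabulary::<CapcoScheme>::portion_form(&CapcoScheme, token)
}
```

Nothing outside the module ever refers to `vocabulary` by name, which is why the change leaves the external surface as it was.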

## Copilot review round 2

1. **Cargo.toml feature comment drift** (Copilot Cargo.toml:75): The `count-allocs` feature comment pointed at `crates/capco/tests/vocabulary.rs` and recommended `--test vocabulary`, but the gate was moved to its own file in the round-1 fix. Updated the comment to reference `crates/capco/tests/vocabulary_zero_alloc.rs` and the correct `--test vocabulary_zero_alloc` invocation.
2. **Module-doc reference drift** (Copilot vocabulary.rs:48): Same drift as above in the module doc-comment under "Static data lifetimes". Updated to point at `crates/capco/tests/vocabulary_zero_alloc.rs`.
3. **`vocabulary` should be private** (Copilot lib.rs:28): The module exists solely to host `impl Vocabulary<CapcoScheme>` and internal `LazyLock`-backed tables. Trait-method resolution finds the impl through `marque_scheme::Vocabulary` regardless of the module's visibility. Demoted to `mod vocabulary;` (private). External public surface unchanged.

## Coverage expansion (T077a)

Codecov flagged 79.32% patch coverage on `vocabulary.rs` (~37 uncovered lines). T071–T077 cover the happy-path active-sentinel loop but never reach:

- The four `panic!` chokepoints in `canonical_for` / `entry_for` / `derived_for_token` / `token_derived` (every active sentinel resolves cleanly).
- The `Some` arm of `derive_banner_abbreviation` (no current test asserts a specific banner abbreviation; the happy-path loop only checks non-emptiness).
- The cross-accessor consistency invariant (`metadata(t).{field}` = per-field accessor) the round-1 single-source-of-truth refactor relies on.

Added 11 new tests in `crates/capco/tests/vocabulary.rs`:

- `banner_abbreviation_some_for_distinct_form`: `NF→NOFORN`, `UCNI→DOE UCNI`, `ND→NODIS`, `XD→EXDIS` (CAPCO-2016 §G.1 Table 4 distinct-abbreviation rows).
- `banner_abbreviation_none_for_same_form`: `RD`, `FRD`, `TFNI`, `RD-CNWDI`, `HCS`, `R` (no distinct abbreviation; portion == banner).
- `metadata_agrees_with_per_field_accessors`: exhaustive cross-check, including `metadata.authority.point_of_contact == scheme.point_of_contact` (the round-1 SSOT invariant).
- 7 `#[should_panic(expected = "no canonical CVE")]` tests, one per accessor (`authority`, `owner_producer`, `point_of_contact`, `metadata`, `portion_form`, `banner_form`, `deprecation`), using aggregate / trigraph / grammar-shape sentinels (`TOK_FGI_MARKER`, `TOK_USA`, `TOK_JOINT`, `TOK_US_CLASSIFIED`, `TOK_NON_US_CLASSIFICATION`, `TOK_IC_DISSEM`, `TOK_NON_IC_DISSEM`) that are deliberately absent from `SENTINEL_TO_CANONICAL`. The tests are distinct per accessor because a refactor that diverts one accessor away from the shared chokepoint would otherwise pass coverage by reaching only the first panic.

## Cleanup

- Removed the dead `_banner_to_portion_anchor` helper and its unused `banner_to_portion` import. The "future-proofing" comment was speculative; YAGNI.

## tasks.md

Added T077a entry between T077 and T078, citing the codecov gap on PR #143 and listing the four sub-areas covered (a/b/c/d).

## Verification

- `cargo test -p marque-capco --test vocabulary`: 15 passed (was 5)
- `cargo test --workspace`: 1180 passed, 0 failed
- `cargo clippy --workspace --all-targets -- -D warnings` clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
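
The `Some`/`None` split exercised by the two banner-abbreviation tests can be sketched as below. The signature shown for `derive_banner_abbreviation` is an assumption (the real helper's parameters are not visible in this PR); the sketch only captures the stated rule that an abbreviation exists when the banner form differs from the portion form.

```rust
/// Hypothetical signature: returns the banner form only when it is
/// distinct from the portion form, otherwise `None`.
pub fn derive_banner_abbreviation<'a>(portion: &str, banner: &'a str) -> Option<&'a str> {
    (portion != banner).then_some(banner)
}

/// Rows quoted in the commit message above as having distinct forms.
pub const DISTINCT: &[(&str, &str)] = &[
    ("NF", "NOFORN"),
    ("UCNI", "DOE UCNI"),
    ("ND", "NODIS"),
    ("XD", "EXDIS"),
];

/// Tokens quoted above as having identical portion and banner forms.
pub const SAME: &[&str] = &["RD", "FRD", "TFNI", "RD-CNWDI", "HCS", "R"];
```

Asserting specific values, rather than non-emptiness, is what lets a test reach the `Some` arm.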

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


bashandbone requested review from a team and Copilot April 25, 2026 13:57

Copilot started reviewing on behalf of bashandbone April 25, 2026 13:57 View session


bashandbone and others added 3 commits April 25, 2026 10:04

Merge branch 'main' into feat/phase5-pr2-vocabulary-impl

0e12275

Merge branch 'feat/phase5-pr2-vocabulary-impl' of https://github.com/…

de97839

…marquetools/marque into feat/phase5-pr2-vocabulary-impl

bashandbone requested a review from Copilot April 25, 2026 14:10

Copilot started reviewing on behalf of bashandbone April 25, 2026 14:11 View session

fix(capco): rename ba to abbr in zero-alloc test for typos linter

cadc4ee

The `typos` CI gate flags two-letter `ba` as a likely typo of `be`/`by`. Same identifier-shape concern as past renames; no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI reviewed Apr 25, 2026


bashandbone requested a review from Copilot April 25, 2026 14:55

Copilot started reviewing on behalf of bashandbone April 25, 2026 14:55 View session

Copilot AI reviewed Apr 25, 2026


fix: formatting

96fedf9

bashandbone merged commit 4320918 into main Apr 25, 2026
7 checks passed

bashandbone deleted the feat/phase5-pr2-vocabulary-impl branch April 25, 2026 15:07

bashandbone mentioned this pull request Apr 25, 2026

Phase 5 PR-3: trait-surface completion (T078 + T079 + T089b) #146

Merged

5 tasks

bashandbone merged 7 commits into main from feat/phase5-pr2-vocabulary-impl
