Phase 5 PR-2: Vocabulary<CapcoScheme> impl + FOUO regression guards#143
Conversation
Implements `impl Vocabulary<CapcoScheme> for CapcoScheme` (T084) by composing the per-CVE-file and per-token tables emitted in PR-1 (`marque_ism::generated::vocabulary`) into the `marque_scheme::Vocabulary<S>` trait surface (`Authority`, `OwnerProducer`, `PointOfContact`, `Deprecation`, portion/banner forms, `TokenMetadataFull<TokenId>`). Per Constitution VII the impl lives in `marque-capco`, not `marque-ism` (`marque-ism` MUST NOT depend on `marque-scheme`). ## Token coverage `CapcoScheme::Token = TokenId` is opaque; the active set today is the ~14 hand-assigned sentinels in `crates/capco/src/scheme.rs`. This PR maps the 10 sentinels with a corresponding canonical CVE value (NF, RD, FRD, TFNI, RD-CNWDI, UCNI, HCS, R, ND, XD) into the generated tables. Aggregate sentinels (`TOK_US_CLASSIFIED`, `TOK_NON_US_CLASSIFICATION`, `TOK_IC_DISSEM`, `TOK_NON_IC_DISSEM`), trigraph sentinels (`TOK_USA` — trigraphs are XSD-sourced, not in the JSON-derived `TOKEN_METADATA`), and grammar-shape sentinels (`TOK_JOINT`, `TOK_FGI_MARKER`) panic on lookup, surfacing misuse loudly. Phase C extends both the sentinel set and this mapping. ## Static lifetimes Every accessor returns `&'static` data via `LazyLock<Vec<...>>`- backed tables — initialized once on first call, dereferenced in place thereafter. `count-allocs` test verifies post-warmup accessors do zero heap allocation (T077). ## FOUO regression guards (FR-020) - `crates/ism/tests/migrations.rs::fouo_is_not_in_migration_table` asserts the deprecated `FOUO → CUI` entry stays out of `MIGRATIONS` and no entry's replacement is `CUI` (T075). - `fouo_remains_in_active_token_metadata` asserts FOUO is still published under `CVE_DISSEM` in the v2022-DEC schema package. - `crates/capco/tests/vocabulary.rs::fouo_remains_active_dissem_control` runs an FOUO-bearing input through `Engine::lint` and asserts no diagnostic suggests CUI migration (T076). ## Test coverage - T071 every_active_token_has_authority — iterates active sentinel set; asserts `authority`, `owner_producer`, `point_of_contact`, `portion_form`, `banner_form` populated for every entry. - T072 authority_points_to_odni_for_ism_tokens — URN starts with `urn:us:gov:ic:cvenum:`, `schema_version == "ISM-v2022-DEC"`. - T073 deprecated_tokens_carry_deprecation — active sentinels return `None` from `deprecation()`. - T074 deprecation_replacement_when_known — when a deprecation IS populated, the named replacement must resolve cleanly in the same vocabulary (no dangling pointers). - T075/T076 — FOUO regression guards (above). - T077 metadata_query_is_zero_alloc — gated behind the new `count-allocs` feature; mirrors `marque-core/tests/alloc_budget.rs` shape (gap register #15). - T085 corpus harness verified byte-identical (`cargo test -p marque-capco --features corpus-harness` green). ## Scope deferred to PR-3 T078 (Codec compile-test), T079 (migration-audit URN extension — touches `marque-engine` audit pipeline), and T089b (StubScheme readiness) are grouped into the next PR as a coherent "trait-surface-completion" set. Keeping PR-2 scoped to capco-only changes avoids mixing engine modifications with the vocabulary impl. ## Verification - `cargo test --workspace` — 1169 passed, 0 failed - `cargo test -p marque-capco --features corpus-harness` — 454 passed, 0 failed (T085 byte-identical) - `cargo test -p marque-capco --features count-allocs --test vocabulary -- --test-threads=1` — 6 passed (T077 zero-alloc green) - `cargo clippy --workspace --all-targets -- -D warnings` clean - `cargo fmt --all` applied Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🤖 Hi @bashandbone, I've received your request, and I'm working on it now! You can track my progress in the logs for more details. |
|
🤖 I'm sorry @bashandbone, but I was unable to process your request. Please see the logs for more details. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
1. **Single source of truth for POC** (Copilot vocabulary.rs:162): `CveFileDerived` no longer carries a separate `point_of_contact` field. The `Authority` struct already embeds a `PointOfContact`, so the per-token `point_of_contact()` accessor now returns `&authority.point_of_contact`. Drift between `scheme.authority(t).point_of_contact` and `scheme.point_of_contact(t)` is unrepresentable. 2. **Reuse derived_for_token in build_metadata** (Copilot vocabulary.rs:300): `build_metadata` now calls `derived_for_token(token)` instead of inlining the per-CveFile derivation. The earlier `CveFileDerivedInline` helper was based on a misread of the LazyLock init order — `CVE_FILE_DERIVED` is independent of `TOKEN_DERIVED`, so calling it from inside `build_metadata` is safe. Reusing the cached record makes `scheme.metadata(t).authority` and `scheme.authority(t)` literally the same bytes. 3. **Rename misleading test** (Copilot tests/vocabulary.rs:146): `deprecated_tokens_carry_deprecation` → `active_tokens_have_no_deprecation_metadata`. The body asserts the negative case (`active sentinels return None`); the new name matches the behavior. When Phase C adds deprecated sentinels, a sibling test for the positive case lands alongside. 4. **Isolate zero-alloc test to its own binary** (Copilot tests/vocabulary.rs:325): Split T077 into `crates/capco/tests/vocabulary_zero_alloc.rs`, gated at the FILE level on `#![cfg(feature = "count-allocs")]`. Mirrors the discipline in `crates/core/tests/alloc_budget.rs` (gap register #15). The counting global allocator now has no other tests in the same binary to inflate its measurements; `--test-threads=1` becomes a recommendation, not a hard requirement. 5. **eq_ignore_ascii_case in CUI guard** (Copilot tests/migrations.rs:46): Replaced `entry.replacement.to_ascii_uppercase() != "CUI"` with `!entry.replacement.eq_ignore_ascii_case("CUI")`. The previous form allocated a fresh `String` per iteration; the new form is allocation-free. Verification: - `cargo test -p marque-capco --test vocabulary` — 5 passed - `cargo test -p marque-ism --test migrations` — 2 passed - `cargo test -p marque-capco --features count-allocs --test vocabulary_zero_alloc` — 1 passed - `cargo test --workspace` — 1169 passed, 0 failed - `cargo clippy --workspace --all-targets -- -D warnings` clean - `cargo clippy -p marque-capco --features count-allocs -- -D warnings` clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…marquetools/marque into feat/phase5-pr2-vocabulary-impl
The `typos` CI gate flags two-letter `ba` as a likely typo of `be`/`by`. Same identifier-shape concern as past renames; no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # `Vocabulary` accessors in `crates/capco/tests/vocabulary.rs`. Mirrors | ||
| # the `count-allocs` feature in `marque-core` (gap register #15) — it | ||
| # installs a process-wide counting global allocator, so we keep it | ||
| # opt-in to avoid polluting the default `cargo test` run. CI invokes | ||
| # this with `cargo test -p marque-capco --features count-allocs --test | ||
| # vocabulary -- --test-threads=1`. |
There was a problem hiding this comment.
The count-allocs feature comment points at crates/capco/tests/vocabulary.rs and suggests running --test vocabulary, but the zero-allocation regression gate is implemented in crates/capco/tests/vocabulary_zero_alloc.rs (and its own doc comment recommends --test vocabulary_zero_alloc). Please update the feature docs to reference the correct test file/command so CI/runbook guidance doesn’t drift.
| # `Vocabulary` accessors in `crates/capco/tests/vocabulary.rs`. Mirrors | |
| # the `count-allocs` feature in `marque-core` (gap register #15) — it | |
| # installs a process-wide counting global allocator, so we keep it | |
| # opt-in to avoid polluting the default `cargo test` run. CI invokes | |
| # this with `cargo test -p marque-capco --features count-allocs --test | |
| # vocabulary -- --test-threads=1`. | |
| # `Vocabulary` accessors in `crates/capco/tests/vocabulary_zero_alloc.rs`. | |
| # Mirrors the `count-allocs` feature in `marque-core` (gap register #15) | |
| # — it installs a process-wide counting global allocator, so we keep it | |
| # opt-in to avoid polluting the default `cargo test` run. CI invokes | |
| # this with `cargo test -p marque-capco --features count-allocs --test | |
| # vocabulary_zero_alloc -- --test-threads=1`. |
| //! referenced by index. The composed `TokenMetadataFull<TokenId>` | ||
| //! records are similarly built once. Calls after the first do zero | ||
| //! heap allocation — exercised by the count-allocs test in | ||
| //! `tests/vocabulary.rs` (gated on the `count-allocs` feature). |
There was a problem hiding this comment.
Module docs mention the count-allocs regression test living in tests/vocabulary.rs, but the PR adds it as crates/capco/tests/vocabulary_zero_alloc.rs. Please update this reference so readers can find the gate and run it with the documented command.
| //! `tests/vocabulary.rs` (gated on the `count-allocs` feature). | |
| //! `crates/capco/tests/vocabulary_zero_alloc.rs` (gated on the | |
| //! `count-allocs` feature). |
| pub(crate) mod rules_sci_per_system; | ||
| pub mod scheme; | ||
| pub mod vocab; | ||
| pub mod vocabulary; |
There was a problem hiding this comment.
vocabulary is introduced as a public module, but the file appears to exist solely to provide the impl Vocabulary<CapcoScheme> for CapcoScheme and internal tables/helpers. If external callers don’t need to name this module, consider making it private (mod vocabulary;) or pub(crate) to avoid expanding the public API surface area unnecessarily.
| pub mod vocabulary; | |
| mod vocabulary; |
## Copilot review round 2
1. **Cargo.toml feature comment drift**
(Copilot Cargo.toml:75): The `count-allocs` feature comment
pointed at `crates/capco/tests/vocabulary.rs` and recommended
`--test vocabulary`, but the gate was moved to its own file in
the round-1 fix. Updated the comment to reference
`crates/capco/tests/vocabulary_zero_alloc.rs` and the correct
`--test vocabulary_zero_alloc` invocation.
2. **Module-doc reference drift**
(Copilot vocabulary.rs:48): Same drift as above in the module
doc-comment under "Static data lifetimes". Updated to point at
`crates/capco/tests/vocabulary_zero_alloc.rs`.
3. **`vocabulary` should be private**
(Copilot lib.rs:28): The module exists solely to host
`impl Vocabulary<CapcoScheme>` and internal `LazyLock`-backed
tables. Trait-method resolution finds the impl through
`marque_scheme::Vocabulary` regardless of the module's
visibility. Demoted to `mod vocabulary;` (private). External
public surface unchanged.
## Coverage expansion (T077a)
Codecov flagged 79.32% patch coverage on `vocabulary.rs` (~37
uncovered lines). T071–T077 cover the happy-path active-sentinel
loop but never reach:
- The four `panic!` chokepoints in `canonical_for` / `entry_for` /
`derived_for_token` / `token_derived` (every active sentinel
resolves cleanly).
- The `Some` arm of `derive_banner_abbreviation` (no current test
asserts a specific banner abbreviation; the happy-path loop
only checks non-emptiness).
- The cross-accessor consistency invariant
(`metadata(t).{field}` = per-field accessor) the round-1
single-source-of-truth refactor relies on.
Added 11 new tests in `crates/capco/tests/vocabulary.rs`:
- `banner_abbreviation_some_for_distinct_form` — `NF→NOFORN`,
`UCNI→DOE UCNI`, `ND→NODIS`, `XD→EXDIS` (CAPCO-2016 §G.1
Table 4 distinct-abbreviation rows).
- `banner_abbreviation_none_for_same_form` — `RD`, `FRD`, `TFNI`,
`RD-CNWDI`, `HCS`, `R` (no distinct abbreviation; portion ==
banner).
- `metadata_agrees_with_per_field_accessors` — exhaustive
cross-check, including `metadata.authority.point_of_contact ==
scheme.point_of_contact` (the round-1 SSOT invariant).
- 7 `#[should_panic(expected = "no canonical CVE")]` tests, one
per accessor (`authority`, `owner_producer`, `point_of_contact`,
`metadata`, `portion_form`, `banner_form`, `deprecation`),
using aggregate / trigraph / grammar-shape sentinels
(`TOK_FGI_MARKER`, `TOK_USA`, `TOK_JOINT`, `TOK_US_CLASSIFIED`,
`TOK_NON_US_CLASSIFICATION`, `TOK_IC_DISSEM`,
`TOK_NON_IC_DISSEM`) that are deliberately absent from
`SENTINEL_TO_CANONICAL`.
Distinct tests per accessor — a refactor that diverts one
accessor away from the shared chokepoint would otherwise pass
coverage by reaching only the first panic.
## Cleanup
- Removed the dead `_banner_to_portion_anchor` helper and its
unused `banner_to_portion` import. The "future-proofing" comment
was speculative; YAGNI.
## tasks.md
Added T077a entry between T077 and T078, citing the codecov gap
on PR #143 and listing the four sub-areas covered (a/b/c/d).
## Verification
- `cargo test -p marque-capco --test vocabulary` — 15 passed
(was 5)
- `cargo test --workspace` — 1180 passed, 0 failed
- `cargo clippy --workspace --all-targets -- -D warnings` clean
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary
impl Vocabulary<CapcoScheme> for CapcoScheme(T084) — composes the per-CVE-file and per-token tables emitted in PR-1 (marque_ism::generated::vocabulary) into themarque_scheme::Vocabulary<S>trait surface (Authority,OwnerProducer,PointOfContact,Deprecation, portion/banner forms,TokenMetadataFull<TokenId>).FOUO → CUImigration entry from being silently re-introduced and to verify FOUO remains an active dissem control end-to-end.count-allocsfeature onmarque-capco, mirroring the existing harness inmarque-core/tests/alloc_budget.rs.Per Constitution VII, the trait impl lives in
marque-capco(the convergence crate that depends on bothmarque-ismandmarque-scheme), not inmarque-ism.Token coverage
CapcoScheme::Token = TokenIdis opaque; the active set today is the ~14 hand-assigned sentinels incrates/capco/src/scheme.rs. This PR maps the 10 sentinels with a corresponding canonical CVE value (NF,RD,FRD,TFNI,RD-CNWDI,UCNI,HCS,R,ND,XD) into the generated tables.Aggregate sentinels (
TOK_US_CLASSIFIED,TOK_NON_US_CLASSIFICATION,TOK_IC_DISSEM,TOK_NON_IC_DISSEM), trigraph sentinels (TOK_USA— trigraphs are XSD-sourced, not in the JSON-derivedTOKEN_METADATA), and grammar-shape sentinels (TOK_JOINT,TOK_FGI_MARKER) panic on lookup, surfacing misuse loudly. Phase C extends both the sentinel set and this mapping.Static lifetimes
Every accessor returns
&'staticdata viaLazyLock<Vec<...>>-backed tables — initialized once on first call, dereferenced in place thereafter. Thecount-allocstest verifies post-warmup accessors do zero heap allocation.FOUO regression guards (FR-020)
fouo_is_not_in_migration_tablecrates/ism/tests/migrations.rsFOUO → CUIentry stays out ofMIGRATIONS; no entry's replacement isCUI.fouo_remains_in_active_token_metadatacrates/ism/tests/migrations.rsCVE_DISSEMin v2022-DEC.fouo_remains_active_dissem_controlcrates/capco/tests/vocabulary.rsEngine::lintproduces no diagnostic suggesting CUI migration.Tasks landed
every_active_token_has_authorityauthority_points_to_odni_for_ism_tokensdeprecated_tokens_carry_deprecationdeprecation_replacement_when_knownfouo_is_not_in_migration_table(+fouo_remains_in_active_token_metadata)fouo_remains_active_dissem_controlmetadata_query_is_zero_alloc(newcount-allocsfeature)impl Vocabulary<CapcoScheme> for CapcoSchemeScope deferred to PR-3
marque-engineaudit pipeline; meaningfully larger than the rest of PR-2)These three are a coherent "trait-surface-completion" set; keeping PR-2 capco-scoped avoids mixing engine modifications with the vocabulary impl.
Test plan
cargo test --workspace— 1169 passed, 0 failedcargo test -p marque-capco --features corpus-harness— 454 passed, 0 failed (T085 byte-identical)cargo test -p marque-capco --features count-allocs --test vocabulary -- --test-threads=1— 6 passed (T077 zero-alloc green)cargo clippy --workspace --all-targets -- -D warningscleancargo fmt --allapplied🤖 Generated with Claude Code