Background
Operators frequently encounter a token they don't recognize — EXDIS, NODIS, ORCON, a SCI control system they don't normally work with — and have to either know the rule from memory or open CAPCO-2016 and search. Editor integrations should be able to surface the authoritative help text inline (hover, tooltip, palette docs).
Phase 5's Vocabulary<S> already exposes the structural metadata (authority, owner/producer, URN, deprecation status). What's missing is the prose: the explanation a human reader actually needs.
Proposed shape
Extend the per-token metadata generated at build time to include extracted help text from crates/capco/docs/CAPCO-2016.md (the authoritative source per Constitution Principle VIII). Each token (and each category, e.g., the SCI control system class as a whole) gets:
- Short description (one line, ~80 chars) — what it is.
- Long description (paragraph) — when to use it, constraints, common errors.
- Citation (
§X.Y pNN) — the exact CAPCO-2016 location it was extracted from.
- Examples (where CAPCO-2016 provides them) — verbatim portion / banner forms.
Build-time extraction:
crates/capco/build.rs (or a new tools/capco-doc-extract/) parses CAPCO-2016 sections by anchored heading and pulls the prose adjacent to each token mention.
- Manual override table (
crates/capco/data/help_overrides.toml) for cases where the prose extraction can't cleanly bound a description (CAPCO-2016 isn't always structured as one-token-per-section).
- Generated
OUT_DIR/help_text.rs with a &'static [(TokenId, HelpEntry)] table.
Vocabulary<S> gains a help(token: TokenId) -> Option<&HelpEntry> method.
API surface
Build on the previous issue's vocabulary API:
GET /v1/vocab/trigraph/:code → name + help text
GET /v1/vocab/tetragraph/:code → members + help text
GET /v1/vocab/sci/:control → help text + authority
GET /v1/vocab/dissem/:control → help text + replacement (if deprecated)
WASM consumers expose the same data locally for hover-on-token.
Constitution Principle VIII compliance
This is the one feature where citation discipline matters most — every help text entry is a quotation (paraphrase) of CAPCO-2016. The extraction pipeline:
- Build-time verification: every help entry MUST carry a citation that resolves to a real anchor in
crates/capco/docs/CAPCO-2016.md. Build fails if a citation doesn't resolve.
- No paraphrase drift: the long-description prose should be near-verbatim from CAPCO-2016 (within fair-use bounds for the single-line short description). A future LLM-generated rewrite isn't allowed without explicit human verification per Principle VIII.
- Source revisions: when CAPCO-2016 is succeeded (a future revision), the help-text table is regenerated as part of the planned migration, not silently refreshed.
- Override audit: the
help_overrides.toml file is human-curated; every override must include a justification in a comment naming the section of CAPCO-2016 it derives from.
Acceptance
- Build-time extraction of help text from
crates/capco/docs/CAPCO-2016.md.
- Per-token short + long description + citation + examples.
- Manual override table for cases extraction can't handle cleanly.
- Build fails on unresolvable citations.
Vocabulary<S>::help API method.
- WASM-safe (no runtime I/O).
- API endpoints in
marque-server (composes with #issue for vocabulary lookup API).
- Coverage target: ≥90% of tokens in
marque-capco::vocab have a non-empty help entry. Tokens without one fall through to the structural metadata only.
Open questions
- Long descriptions can be multi-paragraph; how do we keep the binary size reasonable? Consider lazy-loading via include_str! of pre-extracted snippets, vs constants. Probably constants —
&'static str slices into a single embedded blob.
- Should help entries cite line numbers or section anchors? Memory says "citations use page numbers, not line numbers" (commit b340bec). Use
§X.Y pNN.
- Out-of-scope: localization. CAPCO-2016 is English; help text is English. Future non-US schemes (NATO, etc.) will have their own authoritative sources in their original languages.
Related
Background
Operators frequently encounter a token they don't recognize —
EXDIS,NODIS,ORCON, a SCI control system they don't normally work with — and have to either know the rule from memory or open CAPCO-2016 and search. Editor integrations should be able to surface the authoritative help text inline (hover, tooltip, palette docs).Phase 5's
Vocabulary<S>already exposes the structural metadata (authority, owner/producer, URN, deprecation status). What's missing is the prose: the explanation a human reader actually needs.Proposed shape
Extend the per-token metadata generated at build time to include extracted help text from
crates/capco/docs/CAPCO-2016.md(the authoritative source per Constitution Principle VIII). Each token (and each category, e.g., the SCI control system class as a whole) gets:§X.Y pNN) — the exact CAPCO-2016 location it was extracted from.Build-time extraction:
crates/capco/build.rs(or a newtools/capco-doc-extract/) parses CAPCO-2016 sections by anchored heading and pulls the prose adjacent to each token mention.crates/capco/data/help_overrides.toml) for cases where the prose extraction can't cleanly bound a description (CAPCO-2016 isn't always structured as one-token-per-section).OUT_DIR/help_text.rswith a&'static [(TokenId, HelpEntry)]table.Vocabulary<S>gains ahelp(token: TokenId) -> Option<&HelpEntry>method.API surface
Build on the previous issue's vocabulary API:
WASM consumers expose the same data locally for hover-on-token.
Constitution Principle VIII compliance
This is the one feature where citation discipline matters most — every help text entry is a quotation (paraphrase) of CAPCO-2016. The extraction pipeline:
crates/capco/docs/CAPCO-2016.md. Build fails if a citation doesn't resolve.help_overrides.tomlfile is human-curated; every override must include a justification in a comment naming the section of CAPCO-2016 it derives from.Acceptance
crates/capco/docs/CAPCO-2016.md.Vocabulary<S>::helpAPI method.marque-server(composes with #issue for vocabulary lookup API).marque-capco::vocabhave a non-empty help entry. Tokens without one fall through to the structural metadata only.Open questions
&'static strslices into a single embedded blob.§X.Y pNN.Related
Vocabulary<S>(Phase 5 PR-1: vocabulary metadata generation (T080–T082) #141, Phase 5 PR-3: trait-surface completion (T078 + T079 + T089b) #146 — extends the structural metadata surface with prose)crates/capco/docs/CAPCO-2016.md(canonical source)