Summary
The engine rule refactor landed per-rule severity overrides ([rules] E001 = "suggest") as the granular foundation for mode configuration. This closes the question of individual rule tuning. What it does not provide is any of the structural mode concepts that sit above the per-rule layer:
- A bulk severity baseline (audit-only, validate-only) without enumerating every rule
- Zone/axis scope targeting (fix metadata but not body, or portions but not banners/CABs)
- Deployment context (interactive vs. batch/ETL vs. network-boundary audit) that shapes recognizer strategy and audit requirements
- Temporal/grammar-era processing for historical or archival documents
- Archival output sub-modes with incompatible intent shapes (update vs. preserve+metadata vs. validate-for-era)
This issue documents the full mode taxonomy, the current gaps, and proposed extension points.
What the Refactor Covers
The rule-refactor work (PR 006 series) delivered:
| Mechanism | What it provides |
|---|---|
| [rules] E001 = "suggest" | Per-rule severity override |
| Severity::Suggest | Advisory channel: fires, carries fix candidate, never auto-applies |
| Severity::Off | Disables rule entirely (FR-008: no diagnostic emitted) |
| FixMode::Apply / DryRun | Simulate vs. write; same audit stream |
| Confidence threshold | Below threshold → auto-downgraded to Suggest |
| [closure_rules] severity | Same per-rule mechanism for declarative closure rows |
These are all point-in-time, per-rule signals. They are the right primitive. The missing layer is composing them into named operational modes that apply across rules, scopes, and contexts without requiring exhaustive per-rule configuration.
M1 — No Bulk Severity Baseline (Audit / Validate Mode)
Gap: There is no [engine] mode = "audit" or equivalent that globally downgrades all Fix-severity rules to Suggest without enumerating them individually. Operators who want an audit-only deployment must:
- Know which rules are Fix-severity by default
- List every one explicitly in .marque.toml
- Maintain that list as rules are added or promoted
Concrete operators who need this:
- A SIEM integration that wants diagnostic output only, never mutations
- A CI "check only" job that should fail on violations but never apply fixes
- A historical audit run where fixes would be incorrect (see M4/M5)
Proposed extension to Config:
[engine]
# Bulk severity cap. Any rule whose default severity exceeds this cap
# is silently capped to the cap value. Per-rule overrides in [rules]
# still win (per-rule is higher precedence than the cap).
#
# Values: "suggest" | "info" | "warn" | "error" | "fix" (default)
severity_cap = "suggest" # audit-only deployment
Type-level surface (marque-config):
pub struct EngineConfig {
    /// Global severity cap; per-rule overrides in `RuleConfig` still win.
    pub severity_cap: Option<Severity>,
}
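The engine applies this at construction time when building fast_path_severities. Because per-rule overrides take precedence over the cap (as the config comment states), the cap applies only to rules that fall back to their default severity: effective = config_override.unwrap_or_else(|| rule_default.min(cap)), where cap is the configured severity_cap, or the highest severity when none is set. This is a small addition to the existing severity-resolution path; the per-rule mechanism is already correct, and this adds one more layer above it.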
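A minimal sketch of that resolution order, assuming per-rule overrides win over the cap as the config comment states. The `Severity` enum here is a hypothetical stand-in for the real type in crates/scheme/src/severity.rs; its variant order (following the order of the config values listed above, with `Off` lowest) and the function name `effective_severity` are assumptions made for illustration.

```rust
/// Hypothetical stand-in for the real `Severity` type.
/// Assumption: derived `Ord` follows declaration order, lowest first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Off,
    Suggest,
    Info,
    Warn,
    Error,
    Fix,
}

/// Resolve the effective severity for a single rule.
///
/// An explicit per-rule override is returned untouched. The cap is applied
/// only when the rule falls back to its built-in default.
pub fn effective_severity(
    rule_default: Severity,
    config_override: Option<Severity>,
    severity_cap: Option<Severity>,
) -> Severity {
    match (config_override, severity_cap) {
        // Per-rule config has the highest precedence.
        (Some(explicit), _) => explicit,
        // No override: clamp the default down to the cap.
        (None, Some(cap)) => rule_default.min(cap),
        // No override and no cap: current behavior.
        (None, None) => rule_default,
    }
}
```

With `severity_cap = Some(Severity::Suggest)`, a rule whose default is `Fix` resolves to `Suggest`, while a rule explicitly configured as `Fix` in `[rules]` stays at `Fix`. A rule that is already below the cap (for example `Off`) is left unchanged.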
Relation to CLI: The check subcommand already behaves as if fixes are disabled because it calls Engine::lint, not Engine::fix. The gap is at the config level, where a deployed server or batch job has no way to express "run in lint-only mode" without code changes.
M2 — No Zone / Axis Scope Targeting
Gap: There is no way to express "apply fixes to metadata fields only, not document body" or "fix portion markings, not banners/CABs." The Zone enum (Header, Body, Footer, Cab) exists in marque-scheme and RuleContext.zone carries it, but:
- Rules are not annotated with which zones they target
- There is no config surface to restrict fix application to a subset of zones
- Zone is Option<Zone> on RuleContext and is None in most contexts today (Phase 3 hardcoded Body was removed as "a silent lie"; None is now the honest value)
Concrete operators who need this:
- "Fix portion markings in the text body, but do not touch the document banner (banner is managed by a separate system)"
- "Fix only the classification metadata fields in the structured XML; do not rewrite the embedded text payload"
Proposed rule annotation + config filter:
// In marque-rules or marque-scheme:
/// Zones this rule is eligible to fire on.
/// If `None`, fires on all zones (current behavior for all rules).
fn target_zones(&self) -> Option<&'static [Zone]> { None }
[engine]
# Restrict fix application to these zones only.
# Diagnostics are still emitted for all zones; only fix promotion is gated.
fix_zones = ["body"] # do not auto-apply fixes to header/banner/CAB zones
This is additive. Existing rules returning None from target_zones retain current behavior. The engine's fix-promotion path gates on fix_zones before calling AppliedFix::__engine_promote.
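The fix-promotion gate could look roughly like the sketch below. The `Zone` enum is a local mirror of the one described above, and `may_promote_fix` is a hypothetical helper name. One design choice is an assumption rather than something the proposal specifies: when `fix_zones` is configured but the zone is unknown (`None`), the gate fails closed and refuses to promote the fix.

```rust
/// Local mirror of the `Zone` enum described above (the real one lives in
/// marque-scheme).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Zone {
    Header,
    Body,
    Footer,
    Cab,
}

/// Decide whether a fix candidate may be promoted to an applied fix.
///
/// * `fix_zones`: `None` means no restriction was configured.
/// * `zone`: `None` means the engine could not determine the zone.
pub fn may_promote_fix(fix_zones: Option<&[Zone]>, zone: Option<Zone>) -> bool {
    match (fix_zones, zone) {
        // No restriction configured: current behavior, always eligible.
        (None, _) => true,
        // Restriction configured and zone known: membership test.
        (Some(allowed), Some(z)) => allowed.contains(&z),
        // Assumption: an unknown zone fails closed under a restriction.
        (Some(_), None) => false,
    }
}
```

Failing closed matters here because the zone is `None` in most contexts today; a fail-open gate would make `fix_zones` a no-op until zone detection is wired. Diagnostics are unaffected either way, since the gate only controls promotion.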
M3 — No Deployment Context
Gap: The engine has no concept of the context in which it is running. This matters because the right defaults differ significantly by context:
| Context | Preferred recognizer | Confidence threshold | Audit required | Latency budget |
|---|---|---|---|---|
| Interactive (live typing) | StrictRecognizer | High | No | ≤16ms |
| Batch / ETL pipeline | StrictOrDecoderRecognizer | Low-medium | Yes | High throughput |
| Network boundary inspection | StrictOrDecoderRecognizer | Medium | Required | Medium |
| CLI check (CI) | Strict-first | Medium | Optional | Medium |
| Archival processing | Grammar-era-aware | High | Required | Low |
Today, recognizer strategy is hard-wired at construction time (Engine::new installs StrictOrDecoderRecognizer; callers use Engine::with_strict_recognizer() for strict-only). There is no config-driven way to say "this is a network-boundary deployment; use the decoder but require an audit log."
Proposed DeploymentContext in marque-config:
[engine]
deployment = "batch"
# Values: "interactive" | "batch" | "boundary" | "archival"
# Each value implies a set of defaults that explicit config overrides.
deployment = "interactive" implies: strict recognizer, high confidence threshold, no mandatory audit.
deployment = "batch" implies: decoder recognizer, medium threshold, audit log to stderr.
deployment = "boundary" implies: decoder recognizer, high threshold, mandatory audit log.
deployment = "archival" implies: grammar-era-locked recognizer (see M5), mandatory audit, no auto-apply.
This is a defaults profile — every individual option (recognizer, threshold, audit) remains independently overridable. DeploymentContext just gives operators a named bundle.
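As a rough illustration of "named bundle plus independent overrides", the sketch below resolves a profile from a deployment value and a set of optional explicit settings. All type names other than `DeploymentContext` (`RecognizerKind`, `AuditPolicy`, `Profile`, `Overrides`, `resolve`) are hypothetical. The confidence threshold is left out because the issue gives no numeric values, and `auto_apply = true` for the non-archival contexts is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeploymentContext { Interactive, Batch, Boundary, Archival }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecognizerKind { Strict, StrictOrDecoder, EraLocked }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuditPolicy { Optional, Stderr, Mandatory }

/// Fully resolved settings the engine would be built with.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Profile {
    pub recognizer: RecognizerKind,
    pub audit: AuditPolicy,
    pub auto_apply: bool,
}

/// Explicit settings from config; `None` falls back to the profile default.
#[derive(Debug, Default, Clone, Copy)]
pub struct Overrides {
    pub recognizer: Option<RecognizerKind>,
    pub audit: Option<AuditPolicy>,
    pub auto_apply: Option<bool>,
}

impl DeploymentContext {
    /// Defaults implied by each deployment value, per the list above.
    pub fn defaults(self) -> Profile {
        use {AuditPolicy::*, RecognizerKind::*};
        match self {
            Self::Interactive => Profile { recognizer: Strict, audit: Optional, auto_apply: true },
            Self::Batch => Profile { recognizer: StrictOrDecoder, audit: Stderr, auto_apply: true },
            Self::Boundary => Profile { recognizer: StrictOrDecoder, audit: Mandatory, auto_apply: true },
            Self::Archival => Profile { recognizer: EraLocked, audit: Mandatory, auto_apply: false },
        }
    }
}

/// Explicit config always wins over the deployment default, field by field.
pub fn resolve(ctx: DeploymentContext, o: Overrides) -> Profile {
    let d = ctx.defaults();
    Profile {
        recognizer: o.recognizer.unwrap_or(d.recognizer),
        audit: o.audit.unwrap_or(d.audit),
        auto_apply: o.auto_apply.unwrap_or(d.auto_apply),
    }
}
```

Resolving field by field keeps the profile purely a source of defaults: overriding `audit` in a `boundary` deployment changes only the audit policy and leaves the recognizer choice intact.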
Relationship to BatchEngine: BatchEngine currently expresses parallelism/throughput, not semantic deployment context. These are orthogonal axes; BatchEngine + deployment = "boundary" should be combinable.
M4 — ParseContext::as_of Is Plumbed but Inert
Gap: ParseContext has:
/// Reference date for temporal membership queries (Phase 3 plumbing).
pub as_of: Option<Arc<str>>,
This field is None at every call site in the engine (engine.rs:1181 hardcodes as_of: None). The field has been acknowledged as a stub for the temporal-membership feature (issue #206), but it has zero effect on recognizer behavior, rule dispatch, or the grammar vocabulary loaded.
Additionally, as_of only reaches the recognizer layer. Even once wired, rules and the parser would have no temporal context — a grammar-era-aware rule (e.g., "NODIS was not a valid dissem control before 2012") has no way to read the document's effective date.
What "temporal context" needs to reach:
| Layer | Why temporal context matters |
|---|---|
| Recognizer | Vocabulary terms valid at as_of date; deprecated terms in that era are not errors |
| Parser | Grammar rules in effect at as_of |
| Rule dispatch | Rules fire only if they were in effect at as_of; a post-2015 rule should not fire on a 2005 document in archival-validate-for-era mode |
| PageRewrite catalog | Rewrites applicable to the grammar era at as_of |
Proposed minimal wiring path:
- Engine: populate as_of from LintOptions::as_of or from document metadata (extraction layer)
- Pass as_of through to rules via RuleContext (currently has no temporal field)
- MarkingScheme gains fn vocabulary_at(&self, as_of: Option<&str>) -> &dyn Vocabulary<Self>, which returns the vocabulary snapshot for the given date (current behavior when None)
The as_of field design (an Option<Arc<str>> ISO 8601 date) is correct for the stub. The wire-up is the missing work.
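Once as_of reaches the rule layer, the rule-dispatch gate from the table above could be as small as the sketch below. `RuleEra` and `rule_in_effect` are hypothetical names, and the sketch assumes both dates are zero-padded ISO 8601 calendar dates (YYYY-MM-DD), for which plain string comparison matches chronological order.

```rust
/// Hypothetical per-rule metadata: the date the rule came into effect.
/// `None` means the rule has always applied.
pub struct RuleEra {
    pub id: &'static str,
    pub effective_from: Option<&'static str>,
}

/// Returns true if the rule should fire for a document dated `as_of`.
///
/// `as_of == None` preserves current behavior: every rule fires.
/// Assumes zero-padded `YYYY-MM-DD` strings, so lexicographic order
/// equals chronological order.
pub fn rule_in_effect(rule: &RuleEra, as_of: Option<&str>) -> bool {
    match (rule.effective_from, as_of) {
        (Some(from), Some(date)) => date >= from,
        // No document date or no rule start date: do not suppress.
        _ => true,
    }
}
```

A rule with `effective_from: Some("2015-01-01")` is suppressed for a document with `as_of = Some("2005-06-30")` and fires when `as_of` is absent. Mixed date formats (for example a bare year) would break the string comparison, which is one argument for resolving dates to a structured type before they reach rules.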
M5 — Three Archival Output Modes with Incompatible Intent Shapes
Gap: When processing historical documents, there are three fundamentally different output intents that the type system does not represent:
| Mode | What the engine should do |
|---|---|
| Update | Rewrite markings to the current grammar; emit AppliedFix records |
| Preserve + generate metadata | Do not rewrite the source text; emit what the markings would be in current grammar as metadata only |
| Validate-for-era | Check the document against the grammar rules in effect when it was written; do not apply current-era corrections |
These three modes have incompatible FixIntent shapes and incompatible audit record semantics. Today there is no way to configure any of them. The FixMode::Apply / DryRun binary is insufficient:
- Apply on an archival document could rewrite 2005-era markings using 2024-era rules → incorrect
- DryRun simulates but still evaluates current-era rules → same problem
- Neither produces "what the marking means in current terms" as metadata without rewriting
CAPCO-CONTEXT.md acknowledges this: The archival mode for the NOFORN closure-rule pivot is noted as "planned, not yet wired" (CHK041/CHK043 spec contracts exist but nothing is wired).
Proposed ArchivalIntent enum for LintOptions / FixOptions:
/// Output intent for archival (historical) document processing.
/// Only meaningful when `LintOptions::as_of` / `FixOptions::as_of` is set.
#[non_exhaustive]
pub enum ArchivalIntent {
    /// Rewrite to current-era markings. Default `Engine::fix` behavior.
    Update,
    /// Emit diagnostics only; do not rewrite source. Produces a metadata
    /// record of current-era equivalents. Only available via `Engine::lint`.
    PreserveWithMetadata,
    /// Evaluate against the grammar era at `as_of`. Current-era rules
    /// that post-date `as_of` are suppressed. No rewrites applied.
    ValidateForEra,
}
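To make the difference between the three intents concrete, the sketch below maps each one to the behavior described in the table above. `RuleSet`, `Plan`, and `plan_for` are hypothetical names introduced only for illustration; the enum is repeated so the example stands alone.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[non_exhaustive]
pub enum ArchivalIntent {
    Update,
    PreserveWithMetadata,
    ValidateForEra,
}

/// Which grammar the rules are evaluated against (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuleSet {
    Current,
    EraAtAsOf,
}

/// What the engine does for a given intent (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Plan {
    pub rules: RuleSet,
    /// Whether source text may be rewritten.
    pub rewrite_source: bool,
    /// Whether current-era equivalents are emitted as metadata.
    pub emit_equivalents: bool,
}

pub fn plan_for(intent: ArchivalIntent) -> Plan {
    match intent {
        // Current rules, rewrites applied.
        ArchivalIntent::Update => Plan {
            rules: RuleSet::Current,
            rewrite_source: true,
            emit_equivalents: false,
        },
        // Current rules, source untouched, equivalents recorded.
        ArchivalIntent::PreserveWithMetadata => Plan {
            rules: RuleSet::Current,
            rewrite_source: false,
            emit_equivalents: true,
        },
        // Era rules only, nothing rewritten or translated.
        ArchivalIntent::ValidateForEra => Plan {
            rules: RuleSet::EraAtAsOf,
            rewrite_source: false,
            emit_equivalents: false,
        },
    }
}
```

The three plans differ on two independent axes (which rule set is evaluated, and what is produced), which is why the single Apply / DryRun axis cannot express them.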
M6 — Grammar Era Requires More Than a Date String
Gap: ParseContext::as_of is a bare Option<Arc<str>> date string. But a grammar era is not just a date — it may require:
- A different CVE vocabulary (e.g., pre-2015 CAPCO had different SCI control tokens)
- Different PageRewrite rows (the NOFORN-if-no-FD&R closure rule changed between schema versions)
- Different constraint catalog entries
- A different Vocabulary<S> snapshot (authority strings, owner metadata, deprecated-as-of dates)
A date string cannot carry this; it must be resolved against a grammar version registry. The ism-schema-version metadata pin in crates/ism/Cargo.toml is the build-time pin for the current version, but there is no runtime concept of "load grammar version V for date D."
Proposed GrammarEra type in marque-scheme:
/// Identifies a specific grammar schema version to use for processing.
/// Grammar authors register known versions via `MarkingScheme::era_at(date)`.
pub struct GrammarEra {
    /// Stable schema version label (e.g., "ISM-v2022-DEC").
    pub label: Arc<str>,
    /// ISO 8601 effective date.
    pub effective: Arc<str>,
}
// On MarkingScheme:
fn era_at(&self, as_of: &str) -> Option<GrammarEra>;
This is an additive trait method; CapcoScheme can return None (current behavior) until the historical grammar registry is built. Rules that need to gate on era check RuleContext::grammar_era rather than parsing a raw date string.
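One possible shape for the registry behind `era_at` is sketched below. `EraRegistry` is a hypothetical type, not part of the proposal, and the lookup assumes every date is a zero-padded ISO 8601 calendar date (YYYY-MM-DD) so that string order matches chronological order.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GrammarEra {
    /// Stable schema version label.
    pub label: Arc<str>,
    /// ISO 8601 effective date (assumed `YYYY-MM-DD`).
    pub effective: Arc<str>,
}

/// Hypothetical registry of known eras, kept sorted by effective date.
pub struct EraRegistry {
    eras: Vec<GrammarEra>,
}

impl EraRegistry {
    /// Build a registry; input order does not matter.
    pub fn new(mut eras: Vec<GrammarEra>) -> Self {
        eras.sort_by(|a, b| a.effective.cmp(&b.effective));
        Self { eras }
    }

    /// Latest era whose effective date is on or before `as_of`.
    /// Returns `None` when `as_of` predates every registered era.
    pub fn era_at(&self, as_of: &str) -> Option<GrammarEra> {
        self.eras
            .iter()
            .rev()
            .find(|era| &*era.effective <= as_of)
            .cloned()
    }
}
```

Returning `None` for dates before the earliest registered era lines up with the additive rollout described above: a scheme with an empty registry answers `None` for every date, which callers treat as current behavior.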
Interaction with Related Issues
| Issue | Interaction |
|---|---|
| #641 (Grammar coupling: T1-3 Engine not generic) | Temporal engine modes require Engine<S> to actually use the passed S. A historical CapcoScheme::for_era(era) must not be silently discarded by Engine::with_clock. This is a blocking dependency. |
| #643 (InputAdapter protocol) | DeploymentContext (M3) and InputAdapter are complementary: structured input gives higher confidence signals; deployment context shapes how those signals propagate. A StructuredField input in a boundary deployment should apply stricter audit requirements than the same input in an interactive deployment. |
| #176 (ParseContext input-source signal) | Overlaps with M3: the input_kind/input_source field proposed in #176 is a per-token confidence modifier. DeploymentContext is a per-engine deployment-wide default. Both are needed; neither subsumes the other. |
| #206 (Temporal membership / as_of wiring) | M4 and M5 directly depend on this issue. as_of must be wired end-to-end (engine → recognizer → rule context → ArchivalIntent gating) before historical grammar support can land. |
Severity / Priority
| Item | Severity | Blocking |
|---|---|---|
| M1 (bulk severity baseline / audit mode) | High | Deployments that must never mutate source text have no clean config path today |
| M2 (zone/axis scope targeting) | Medium | Addressable by careful per-rule config; ergonomic gap not a correctness gap |
| M3 (deployment context) | Medium | BatchEngine covers parallelism; semantic context defaults are a usability gap |
| M4 (as_of wiring) | High | Archival processing is acknowledged planned work; the wire-up path is clear |
| M5 (archival output modes) | High | Without this, Engine::fix on a historical document is unsafe |
| M6 (grammar era type) | Medium | Prerequisite for deep historical grammar support; low urgency until M4 is wired |
M1 is the most impactful with the least implementation risk — it is one additional config field and one modification to the severity-resolution path that already exists.
See Also
- as_of (direct prerequisite for M4/M5)
- crates/capco/CAPCO-CONTEXT.md: archival mode acknowledgment (CHK041/CHK043 spec contracts)
- crates/scheme/src/severity.rs: Severity total ordering (the min(severity_cap) proposal in M1 uses this)
- crates/scheme/src/scope.rs: DiffRelation::Historical (partial foundation for M4/M5 diff-scope)