RFC: Operational Mode Taxonomy — bulk severity baseline, zone/axis targeting, deployment context, and temporal/archival processing #645

@bashandbone

Summary

The engine rule refactor landed per-rule severity overrides ([rules] E001 = "suggest") as the granular foundation for mode configuration. This closes the question of individual rule tuning. What it does not provide is any of the structural mode concepts that sit above the per-rule layer:

  • A bulk severity baseline (audit-only, validate-only) without enumerating every rule
  • Zone/axis scope targeting (fix metadata but not body, or portions but not banners/CABs)
  • Deployment context (interactive vs. batch/ETL vs. network-boundary audit) that shapes recognizer strategy and audit requirements
  • Temporal/grammar-era processing for historical or archival documents
  • Archival output sub-modes with incompatible intent shapes (update vs. preserve+metadata vs. validate-for-era)

This issue documents the full mode taxonomy, the current gaps, and proposed extension points.


What the Refactor Covers

The rule-refactor work (PR 006 series) delivered:

| Mechanism | What it provides |
| --- | --- |
| [rules] E001 = "suggest" | Per-rule severity override |
| Severity::Suggest | Advisory channel: fires, carries fix candidate, never auto-applies |
| Severity::Off | Disables rule entirely (FR-008: no diagnostic emitted) |
| FixMode::Apply / DryRun | Simulate vs. write, with the same audit stream |
| Confidence threshold | Below threshold → auto-downgraded to Suggest |
| [closure_rules] severity | Same per-rule mechanism for declarative closure rows |

These are all point-in-time, per-rule signals. They are the right primitive. The missing layer is composing them into named operational modes that apply across rules, scopes, and contexts without requiring exhaustive per-rule configuration.


M1 — No Bulk Severity Baseline (Audit / Validate Mode)

Gap: There is no [engine] mode = "audit" or equivalent that globally downgrades all Fix-severity rules to Suggest without enumerating them individually. Operators who want an audit-only deployment must:

  1. Know which rules are Fix-severity by default
  2. List every one explicitly in .marque.toml
  3. Maintain that list as rules are added or promoted

Concrete operators who need this:

  • A SIEM integration that wants diagnostic output only, never mutations
  • A CI "check only" job that should fail on violations but never apply fixes
  • A historical audit run where fixes would be incorrect (see M4/M5)

Proposed extension to Config:

[engine]
# Bulk severity cap. Any rule whose default severity exceeds this cap
# is silently capped to the cap value. Per-rule overrides in [rules]
# still win (per-rule is higher precedence than the cap).
#
# Values: "suggest" | "info" | "warn" | "error" | "fix" (default)
severity_cap = "suggest"  # audit-only deployment

Type-level surface (marque-config):

pub struct EngineConfig {
    /// Global severity cap; per-rule overrides in `RuleConfig` still win.
    pub severity_cap: Option<Severity>,
}

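The engine applies this at construction time when building fast_path_severities: effective = config_override.unwrap_or_else(|| rule_default.min(cap)), with the cap step skipped when severity_cap is None. The cap is applied only to the rule default, so an explicit per-rule override still wins, matching the precedence stated in the config comment above. This is a small addition to the existing severity resolution path; the per-rule mechanism is already correct, this just adds one more layer above it.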
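
A minimal sketch of that resolution step, following the precedence stated in the config comment (per-rule override wins, the cap only limits defaults). The Severity enum here is a hypothetical stand-in, and its variant order is an assumption chosen so that min behaves as a cap; the real marque type may differ.

```rust
/// Hypothetical stand-in for the engine's `Severity`.
/// Assumption: variants are ordered from least to most invasive,
/// so that `Ord::min` can serve as the cap operation.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Off,
    Suggest,
    Info,
    Warn,
    Error,
    Fix,
}

/// Resolve the effective severity for a single rule.
/// An explicit per-rule override wins outright; the cap only limits
/// the rule's default severity.
pub fn effective_severity(
    rule_default: Severity,
    config_override: Option<Severity>,
    severity_cap: Option<Severity>,
) -> Severity {
    match (config_override, severity_cap) {
        // Explicit [rules] entry: highest precedence, cap is ignored.
        (Some(explicit), _) => explicit,
        // No override, cap configured: default is clamped to the cap.
        (None, Some(cap)) => rule_default.min(cap),
        // No override, no cap: current behavior.
        (None, None) => rule_default,
    }
}
```

With severity_cap = "suggest", a Fix-default rule resolves to Suggest unless the operator has listed it explicitly in [rules], and a rule that defaults to Off stays Off.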

Relation to CLI: The check subcommand already behaves as if fixes are disabled because it calls Engine::lint, not Engine::fix. The gap is at the config level, where a deployed server or batch job has no way to express "run in lint-only mode" without code changes.


M2 — No Zone / Axis Scope Targeting

Gap: There is no way to express "apply fixes to metadata fields only, not document body" or "fix portion markings, not banners/CABs." The Zone enum (Header, Body, Footer, Cab) exists in marque-scheme and RuleContext.zone carries it, but:

  1. Rules are not annotated with which zones they target
  2. There is no config surface to restrict fix application to a subset of zones
  3. Zone is Option<Zone> on RuleContext and is None in most contexts today (Phase 3 hardcoded Body was removed as "a silent lie"; None is now the honest value)

Concrete operators who need this:

  • "Fix portion markings in the text body, but do not touch the document banner (banner is managed by a separate system)"
  • "Fix only the classification metadata fields in the structured XML; do not rewrite the embedded text payload"

Proposed rule annotation + config filter:

// In marque-rules or marque-scheme:
/// Zones this rule is eligible to fire on.
/// If `None`, fires on all zones (current behavior for all rules).
fn target_zones(&self) -> Option<&'static [Zone]> { None }

[engine]
# Restrict fix application to these zones only.
# Diagnostics are still emitted for all zones; only fix promotion is gated.
fix_zones = ["body"]  # do not auto-apply fixes to header/banner/CAB zones

This is additive. Existing rules returning None from target_zones retain current behavior. The engine's fix-promotion path gates on fix_zones before calling AppliedFix::__engine_promote.
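
The gate itself could look roughly like the sketch below. The Zone enum mirrors the variants named above, but may_promote_fix is a hypothetical helper, and the handling of an unknown zone (None) is an assumption: the sketch refuses to promote when a restriction is configured and the zone cannot be determined.

```rust
/// Hypothetical mirror of the `Zone` enum described in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Zone {
    Header,
    Body,
    Footer,
    Cab,
}

/// Decide whether a fix candidate may be promoted to an applied fix.
/// `fix_zones == None` means no zone restriction is configured.
pub fn may_promote_fix(zone: Option<Zone>, fix_zones: Option<&[Zone]>) -> bool {
    match (fix_zones, zone) {
        // No restriction configured: current behavior, always eligible.
        (None, _) => true,
        // Assumption: an unknown zone cannot be shown to be inside the
        // allow-list, so the candidate stays a diagnostic only.
        (Some(_), None) => false,
        // Known zone: eligible only if it is in the allow-list.
        (Some(allowed), Some(z)) => allowed.contains(&z),
    }
}
```

Because RuleContext.zone is None in most contexts today, the conservative branch would suppress most fixes as soon as fix_zones is set. Treating None as eligible is the alternative, at the cost of weakening the guarantee operators are asking for.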


M3 — No Deployment Context

Gap: The engine has no concept of the context in which it is running. This matters because the right defaults differ significantly by context:

| Context | Preferred recognizer | Confidence threshold | Audit required | Latency budget |
| --- | --- | --- | --- | --- |
| Interactive (live typing) | StrictRecognizer | High | No | ≤16ms |
| Batch / ETL pipeline | StrictOrDecoderRecognizer | Low-medium | Yes | High throughput |
| Network boundary inspection | StrictOrDecoderRecognizer | Medium | Required | Medium |
| CLI check (CI) | Strict-first | Medium | Optional | Medium |
| Archival processing | Grammar-era-aware | High | Required | Low |

Today, recognizer strategy is hard-wired at construction time (Engine::new installs StrictOrDecoderRecognizer; callers use Engine::with_strict_recognizer() for strict-only). There is no config-driven way to say "this is a network-boundary deployment; use the decoder but require an audit log."

Proposed DeploymentContext in marque-config:

[engine]
deployment = "batch"
# Values: "interactive" | "batch" | "boundary" | "archival"
# Each value implies a set of defaults that explicit config overrides.

deployment = "interactive" implies: strict recognizer, high confidence threshold, no mandatory audit.
deployment = "batch" implies: decoder recognizer, medium threshold, audit log to stderr.
deployment = "boundary" implies: decoder recognizer, high threshold, mandatory audit log.
deployment = "archival" implies: grammar-era-locked recognizer (see M5), mandatory audit, no auto-apply.

This is a defaults profile — every individual option (recognizer, threshold, audit) remains independently overridable. DeploymentContext just gives operators a named bundle.
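
One way to model "named bundle plus independent overrides" is sketched below. All type and function names other than DeploymentContext are hypothetical. The default values follow the four "implies" lines above; the archival threshold is taken from the table, and auto_apply = true for the non-archival contexts is an assumption, since the text only states it for archival.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeploymentContext { Interactive, Batch, Boundary, Archival }

/// Hypothetical labels for the recognizer strategies named in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecognizerKind { Strict, StrictOrDecoder, EraLocked }

/// Coarse confidence threshold levels (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Threshold { Medium, High }

/// Fully resolved engine settings after defaults and overrides are merged.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ResolvedProfile {
    pub recognizer: RecognizerKind,
    pub threshold: Threshold,
    pub audit_required: bool,
    pub auto_apply: bool,
}

/// Explicit operator config; every field left as `None` falls back to the profile.
#[derive(Debug, Clone, Copy, Default)]
pub struct ProfileOverrides {
    pub recognizer: Option<RecognizerKind>,
    pub threshold: Option<Threshold>,
    pub audit_required: Option<bool>,
    pub auto_apply: Option<bool>,
}

impl DeploymentContext {
    /// Defaults implied by each named context.
    pub fn defaults(self) -> ResolvedProfile {
        use RecognizerKind::*;
        let (recognizer, threshold, audit_required, auto_apply) = match self {
            DeploymentContext::Interactive => (Strict, Threshold::High, false, true),
            DeploymentContext::Batch => (StrictOrDecoder, Threshold::Medium, true, true),
            DeploymentContext::Boundary => (StrictOrDecoder, Threshold::High, true, true),
            DeploymentContext::Archival => (EraLocked, Threshold::High, true, false),
        };
        ResolvedProfile { recognizer, threshold, audit_required, auto_apply }
    }
}

/// Merge explicit overrides on top of the context's defaults, field by field.
pub fn resolve_profile(ctx: DeploymentContext, o: ProfileOverrides) -> ResolvedProfile {
    let d = ctx.defaults();
    ResolvedProfile {
        recognizer: o.recognizer.unwrap_or(d.recognizer),
        threshold: o.threshold.unwrap_or(d.threshold),
        audit_required: o.audit_required.unwrap_or(d.audit_required),
        auto_apply: o.auto_apply.unwrap_or(d.auto_apply),
    }
}
```

Keeping the overrides as a struct of Option fields means the profile never hides an explicit setting: an interactive deployment that sets only audit_required = true keeps the strict recognizer and high threshold.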

Relationship to BatchEngine: BatchEngine currently expresses parallelism/throughput, not semantic deployment context. These are orthogonal axes; BatchEngine + deployment = "boundary" should be combinable.


M4 — ParseContext::as_of Is Plumbed but Inert

Gap: ParseContext has:

/// Reference date for temporal membership queries (Phase 3 plumbing).
pub as_of: Option<Arc<str>>,

This field is None at every call site in the engine (engine.rs:1181 hardcodes as_of: None). The field has been acknowledged as a stub for the temporal-membership feature (issue #206), but it has zero effect on recognizer behavior, rule dispatch, or the grammar vocabulary loaded.

Additionally, as_of only reaches the recognizer layer. Even once wired, rules and the parser would have no temporal context — a grammar-era-aware rule (e.g., "NODIS was not a valid dissem control before 2012") has no way to read the document's effective date.

What "temporal context" needs to reach:

| Layer | Why temporal context matters |
| --- | --- |
| Recognizer | Vocabulary terms valid at as_of date; deprecated terms in that era are not errors |
| Parser | Grammar rules in effect at as_of |
| Rule dispatch | Rules fire only if they were in effect at as_of; a post-2015 rule should not fire on a 2005 document in archival-validate-for-era mode |
| PageRewrite catalog | Rewrites applicable to the grammar era at as_of |

Proposed minimal wiring path:

  1. Engine: populate as_of from LintOptions::as_of or from document metadata (extraction layer)
  2. Pass as_of through to rules via RuleContext (currently has no temporal field)
  3. MarkingScheme gains fn vocabulary_at(&self, as_of: Option<&str>) -> &dyn Vocabulary<Self> — returns the vocabulary snapshot for the given date (current behavior when None)

The as_of field design (an Option<Arc<str>> ISO 8601 date) is correct for the stub. The wire-up is the missing work.
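
To illustrate what step 2 buys at the rule-dispatch layer, here is a minimal sketch of an era gate. RuleEra and rule_in_effect are hypothetical names, and the sketch assumes both dates are full ISO 8601 calendar dates (YYYY-MM-DD), which compare correctly as plain strings.

```rust
/// Hypothetical per-rule metadata: the date the rule came into effect.
/// `None` means the rule carries no date and is treated as always in effect.
pub struct RuleEra {
    pub effective_from: Option<&'static str>,
}

/// Should this rule be dispatched for a document with the given `as_of`?
/// Assumption: both dates use the same `YYYY-MM-DD` form, so lexicographic
/// string comparison matches chronological order.
pub fn rule_in_effect(rule: &RuleEra, as_of: Option<&str>) -> bool {
    match (rule.effective_from, as_of) {
        // No temporal context: current behavior, every rule fires.
        (_, None) => true,
        // Undated rule: assumed to apply in every era.
        (None, Some(_)) => true,
        // Dated rule: fires only if it was already in effect at `as_of`.
        (Some(from), Some(date)) => from <= date,
    }
}
```

Under these assumptions, a rule with effective_from = "2015-01-01" is skipped for a document with as_of = "2005-06-30" and dispatched as usual when as_of is unset.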


M5 — Three Archival Output Modes with Incompatible Intent Shapes

Gap: When processing historical documents, there are three fundamentally different output intents that the type system does not represent:

| Mode | What the engine should do |
| --- | --- |
| Update | Rewrite markings to the current grammar; emit AppliedFix records |
| Preserve + generate metadata | Do not rewrite the source text; emit what the markings would be in current grammar as metadata only |
| Validate-for-era | Check the document against the grammar rules in effect when it was written; do not apply current-era corrections |

These three modes have incompatible FixIntent shapes and incompatible audit record semantics. Today there is no way to configure any of them. The FixMode::Apply / DryRun binary is insufficient:

  • Apply on an archival document could rewrite 2005-era markings using 2024-era rules → incorrect
  • DryRun simulates but still evaluates current-era rules → same problem
  • Neither produces "what the marking means in current terms" as metadata without rewriting

CAPCO-CONTEXT.md acknowledges this: the archival mode for the NOFORN closure-rule pivot is noted as "planned, not yet wired" (the CHK041/CHK043 spec contracts exist, but nothing is wired).

Proposed ArchivalIntent enum for LintOptions / FixOptions:

/// Output intent for archival (historical) document processing.
/// Only meaningful when `LintOptions::as_of` / `FixOptions::as_of` is set.
#[non_exhaustive]
pub enum ArchivalIntent {
    /// Rewrite to current-era markings. Default `Engine::fix` behavior.
    Update,
    /// Emit diagnostics only; do not rewrite source. Produces a metadata
    /// record of current-era equivalents. Only available via `Engine::lint`.
    PreserveWithMetadata,
    /// Evaluate against the grammar era at `as_of`. Current-era rules
    /// that post-date `as_of` are suppressed. No rewrites applied.
    ValidateForEra,
}
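
As a rough illustration of how the engine might branch on this enum, the sketch below maps each intent to three independent switches. ArchivalPlan and plan_for are hypothetical; the mapping is read directly from the mode table and the doc comments above, not from existing engine code.

```rust
/// Local copy of the proposed enum so the sketch is self-contained.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArchivalIntent {
    Update,
    PreserveWithMetadata,
    ValidateForEra,
}

/// Hypothetical breakdown of what the engine does for a given intent.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ArchivalPlan {
    /// Rewrite the source text and emit applied-fix records.
    pub rewrite_source: bool,
    /// Emit current-era equivalents as metadata only.
    pub emit_equivalents: bool,
    /// Suppress rules that post-date the document's `as_of`.
    pub suppress_post_era_rules: bool,
}

/// Map an archival intent to the engine behavior it implies.
pub fn plan_for(intent: ArchivalIntent) -> ArchivalPlan {
    match intent {
        ArchivalIntent::Update => ArchivalPlan {
            rewrite_source: true,
            emit_equivalents: false,
            suppress_post_era_rules: false,
        },
        ArchivalIntent::PreserveWithMetadata => ArchivalPlan {
            rewrite_source: false,
            emit_equivalents: true,
            suppress_post_era_rules: false,
        },
        ArchivalIntent::ValidateForEra => ArchivalPlan {
            rewrite_source: false,
            emit_equivalents: false,
            suppress_post_era_rules: true,
        },
    }
}
```

Spelling the modes out as separate switches makes the incompatibility visible: no two intents share the same combination, which is why a two-valued FixMode cannot express them.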


M6 — Grammar Era Requires More Than a Date String

Gap: ParseContext::as_of is a bare Option<Arc<str>> date string. But a grammar era is not just a date — it may require:

  • A different CVE vocabulary (e.g., pre-2015 CAPCO had different SCI control tokens)
  • Different PageRewrite rows (the NOFORN-if-no-FD&R closure rule changed between schema versions)
  • Different constraint catalog entries
  • A different Vocabulary<S> snapshot (authority strings, owner metadata, deprecated-as-of dates)

A date string cannot carry this; it must be resolved against a grammar version registry. The ism-schema-version metadata pin in crates/ism/Cargo.toml is the build-time pin for the current version, but there is no runtime concept of "load grammar version V for date D."

Proposed GrammarEra type in marque-scheme:

use std::sync::Arc;

/// Identifies a specific grammar schema version to use for processing.
/// Schemes resolve a date to a known version via `MarkingScheme::era_at(date)`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GrammarEra {
    /// Stable schema version label (e.g., "ISM-v2022-DEC").
    pub label: Arc<str>,
    /// ISO 8601 effective date.
    pub effective: Arc<str>,
}

// On MarkingScheme. The default body keeps the method additive:
// existing schemes compile unchanged and report no era.
fn era_at(&self, _as_of: &str) -> Option<GrammarEra> { None }

This is an additive trait method; CapcoScheme can return None (current behavior) until the historical grammar registry is built. Rules that need to gate on era check RuleContext::grammar_era rather than parsing a raw date string.
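
Once a historical registry exists, the date-to-era resolution could be as simple as the sketch below. EraRegistry is a hypothetical type, not part of marque-scheme, and the lookup assumes effective dates and as_of are all full ISO 8601 calendar dates (YYYY-MM-DD) so that string order matches date order.

```rust
use std::sync::Arc;

/// Local copy of the proposed type so the sketch is self-contained.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GrammarEra {
    pub label: Arc<str>,
    pub effective: Arc<str>,
}

/// Hypothetical registry of known grammar eras, kept sorted by effective date.
pub struct EraRegistry {
    eras: Vec<GrammarEra>,
}

impl EraRegistry {
    /// Build a registry; input order does not matter.
    pub fn new(mut eras: Vec<GrammarEra>) -> Self {
        // Assumption: `YYYY-MM-DD` strings, so lexicographic order is date order.
        eras.sort_by(|a, b| a.effective.cmp(&b.effective));
        Self { eras }
    }

    /// Return the latest era whose effective date is on or before `as_of`,
    /// or `None` if `as_of` predates every registered era.
    pub fn era_at(&self, as_of: &str) -> Option<GrammarEra> {
        self.eras
            .iter()
            .rev()
            .find(|era| &*era.effective <= as_of)
            .cloned()
    }
}
```

Returning None for a date that predates the registry keeps the failure explicit: the caller can refuse to process the document rather than silently falling back to the current grammar.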


Interaction with Related Issues

| Issue | Interaction |
| --- | --- |
| #641 (Grammar coupling — T1-3 Engine not generic) | Temporal engine modes require Engine<S> to actually use the passed S. A historical CapcoScheme::for_era(era) must not be silently discarded by Engine::with_clock. This is a blocking dependency. |
| #643 (InputAdapter protocol) | DeploymentContext (M3) and InputAdapter are complementary: structured input gives higher confidence signals; deployment context shapes how those signals propagate. A StructuredField input in a boundary deployment should apply stricter audit requirements than the same input in an interactive deployment. |
| #176 (ParseContext input-source signal) | Overlaps with M3: the input_kind/input_source field proposed in #176 is a per-token confidence modifier. DeploymentContext is a per-engine deployment-wide default. Both are needed; neither subsumes the other. |
| #206 (Temporal membership / as_of wiring) | M4 and M5 directly depend on this issue. as_of must be wired end-to-end (engine → recognizer → rule context → ArchivalIntent gating) before historical grammar support can land. |

Severity / Priority

| Item | Severity | Blocking |
| --- | --- | --- |
| M1 (bulk severity baseline / audit mode) | High | Deployments that must never mutate source text have no clean config path today |
| M2 (zone/axis scope targeting) | Medium | Addressable by careful per-rule config; ergonomic gap not a correctness gap |
| M3 (deployment context) | Medium | BatchEngine covers parallelism; semantic context defaults are a usability gap |
| M4 (as_of wiring) | High | Archival processing is acknowledged planned work; the wire-up path is clear |
| M5 (archival output modes) | High | Without this, Engine::fix on a historical document is unsafe |
| M6 (grammar era type) | Medium | Prerequisite for deep historical grammar support; low urgency until M4 is wired |

M1 is the most impactful with the least implementation risk — it is one additional config field and one modification to the severity-resolution path that already exists.

