RFC: Operational Mode Taxonomy — bulk severity baseline, zone/axis targeting, deployment context, and temporal/archival processing #645

## Summary

The engine rule refactor landed per-rule severity overrides (`[rules] E001 = "suggest"`) as the granular foundation for mode configuration. This closes the question of *individual rule tuning*. What it does not provide is any of the *structural mode concepts* that sit above the per-rule layer:

- A **bulk severity baseline** (audit-only, validate-only) without enumerating every rule
- **Zone/axis scope targeting** (fix metadata but not body, or portions but not banners/CABs)
- **Deployment context** (interactive vs. batch/ETL vs. network-boundary audit) that shapes recognizer strategy and audit requirements
- **Temporal/grammar-era processing** for historical or archival documents
- **Archival output sub-modes** with incompatible intent shapes (update vs. preserve+metadata vs. validate-for-era)

This issue documents the full mode taxonomy, the current gaps, and proposed extension points.

---

## What the Refactor Covers

The rule-refactor work (PR 006 series) delivered:

| Mechanism | What it provides |
|-----------|-----------------|
| `[rules] E001 = "suggest"` | Per-rule severity override |
| `Severity::Suggest` | Advisory channel: fires, carries fix candidate, never auto-applies |
| `Severity::Off` | Disables rule entirely (FR-008: no diagnostic emitted) |
| `FixMode::Apply` / `DryRun` | Simulate vs. write — same audit stream |
| Confidence threshold | Below threshold → auto-downgraded to `Suggest` |
| `[closure_rules]` severity | Same per-rule mechanism for declarative closure rows |

These are all point-in-time, per-rule signals. They are the right primitive. The missing layer is *composing them into named operational modes* that apply across rules, scopes, and contexts without requiring exhaustive per-rule configuration.

---

## M1 — No Bulk Severity Baseline (Audit / Validate Mode)

**Gap:** There is no `[engine] mode = "audit"` or equivalent that globally downgrades all `Fix`-severity rules to `Suggest` without enumerating them individually. Operators who want an audit-only deployment must:

1. Know which rules are `Fix`-severity by default
2. List every one explicitly in `.marque.toml`
3. Maintain that list as rules are added or promoted

**Concrete operators who need this:**
- A SIEM integration that wants diagnostic output only, never mutations
- A CI "check only" job that should fail on violations but never apply fixes
- A historical audit run where fixes would be incorrect (see M4/M5)

**Proposed extension to `Config`:**

```toml
[engine]
# Bulk severity cap. Any rule whose default severity exceeds this cap
# is silently capped to the cap value. Per-rule overrides in [rules]
# still win (per-rule is higher precedence than the cap).
#
# Values: "suggest" | "info" | "warn" | "error" | "fix" (default)
severity_cap = "suggest"  # audit-only deployment
```

**Type-level surface (`marque-config`):**

```rust
pub struct EngineConfig {
    /// Global severity cap; per-rule overrides in `RuleConfig` still win.
    pub severity_cap: Option<Severity>,
}
```

The engine applies this at construction time when building `fast_path_severities`. Because explicit per-rule overrides win, the cap applies only to rule defaults: `effective = config_override.unwrap_or_else(|| severity_cap.map_or(rule_default, |cap| rule_default.min(cap)))`. This is a one-line addition to the existing severity resolution path; the per-rule mechanism is already correct, and the cap adds one more layer around the rule default.
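
A minimal sketch of that resolution step, assuming the precedence stated in the config comment (explicit per-rule overrides bypass the cap). The `Severity` enum here is a stand-in: its variant order is an assumption based on the value list above, not the real definition in `crates/scheme/src/severity.rs`, and `effective_severity` is a hypothetical helper name.

```rust
/// Stand-in for the real `Severity`. The variant order (lowest to highest)
/// is an assumption inferred from the `severity_cap` value list.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Off,
    Suggest,
    Info,
    Warn,
    Error,
    Fix,
}

/// Hypothetical helper: resolves the severity a rule runs at.
///
/// - An explicit `[rules]` override always wins.
/// - Otherwise the rule default is clamped to the cap, if one is set.
/// - With neither, the rule default is used unchanged.
pub fn effective_severity(
    rule_default: Severity,
    config_override: Option<Severity>,
    severity_cap: Option<Severity>,
) -> Severity {
    match (config_override, severity_cap) {
        (Some(explicit), _) => explicit,
        (None, Some(cap)) => rule_default.min(cap),
        (None, None) => rule_default,
    }
}
```

With `severity_cap = "suggest"`, a `Fix`-default rule resolves to `Suggest`, while a rule the operator explicitly set to `"fix"` stays at `Fix`. A rule that defaults to `Off` stays `Off`, since the cap only lowers severities.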

**Relation to CLI:** The `check` subcommand already behaves as if fixes are disabled because it calls `Engine::lint`, not `Engine::fix`. The gap is at the *config* level, where a deployed server or batch job has no way to express "run in lint-only mode" without code changes.

---

## M2 — No Zone / Axis Scope Targeting

**Gap:** There is no way to express "apply fixes to metadata fields only, not document body" or "fix portion markings, not banners/CABs." The `Zone` enum (`Header`, `Body`, `Footer`, `Cab`) exists in `marque-scheme` and `RuleContext.zone` carries it, but:

1. Rules are not annotated with which zones they target
2. There is no config surface to restrict fix application to a subset of zones
3. Zone is `Option<Zone>` on `RuleContext` and is `None` in most contexts today (the Phase 3 hardcoded `Body` value was removed as "a silent lie"; `None` is now the honest value)

**Concrete operators who need this:**
- "Fix portion markings in the text body, but do not touch the document banner (banner is managed by a separate system)"
- "Fix only the classification metadata fields in the structured XML; do not rewrite the embedded text payload"

**Proposed rule annotation + config filter:**

```rust
// In marque-rules or marque-scheme:
/// Zones this rule is eligible to fire on.
/// If `None`, fires on all zones (current behavior for all rules).
fn target_zones(&self) -> Option<&'static [Zone]> { None }
```

```toml
[engine]
# Restrict fix application to these zones only.
# Diagnostics are still emitted for all zones; only fix promotion is gated.
fix_zones = ["body"]  # do not auto-apply fixes to header/banner/CAB zones
```

This is additive. Existing rules returning `None` from `target_zones` retain current behavior. The engine's fix-promotion path gates on `fix_zones` before calling `AppliedFix::__engine_promote`.
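
The gate itself could be as small as the sketch below. The `Zone` variants come from the description above; `may_promote_fix` is a hypothetical name, and the treatment of an unknown zone (`None`) is an assumption: this version refuses to promote when a filter is configured but the zone is unknown, on the theory that a filter expresses an intent not to touch some zones.

```rust
/// Stand-in for `marque-scheme`'s `Zone` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Zone {
    Header,
    Body,
    Footer,
    Cab,
}

/// Hypothetical gate consulted before a fix is promoted.
///
/// `fix_zones` is the parsed `[engine] fix_zones` list, or `None` when the
/// option is absent. Diagnostics are unaffected; only promotion is gated.
pub fn may_promote_fix(zone: Option<Zone>, fix_zones: Option<&[Zone]>) -> bool {
    match (fix_zones, zone) {
        // No filter configured: current behavior, every fix may promote.
        (None, _) => true,
        // Filter configured and the zone is known: promote only if listed.
        (Some(allowed), Some(z)) => allowed.contains(&z),
        // Filter configured but the zone is unknown: assumed conservative.
        (Some(_), None) => false,
    }
}
```

Because `RuleContext.zone` is `None` in most contexts today, the last arm would suppress most fixes once `fix_zones` is set. Whether that is acceptable, or whether zone population has to land first, is a design decision this sketch does not settle.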

---

## M3 — No Deployment Context

**Gap:** The engine has no concept of the *context in which it is running*. This matters because the right defaults differ significantly by context:

| Context | Preferred recognizer | Confidence threshold | Audit required | Latency budget |
|---------|---------------------|----------------------|----------------|----------------|
| Interactive (live typing) | `StrictRecognizer` | High | No | ≤16ms |
| Batch / ETL pipeline | `StrictOrDecoderRecognizer` | Low-medium | Yes | High throughput |
| Network boundary inspection | `StrictOrDecoderRecognizer` | Medium | Required | Medium |
| CLI `check` (CI) | Strict-first | Medium | Optional | Medium |
| Archival processing | Grammar-era-aware | High | Required | Low |

Today, recognizer strategy is hard-wired at construction time (`Engine::new` installs `StrictOrDecoderRecognizer`; callers use `Engine::with_strict_recognizer()` for strict-only). There is no config-driven way to say "this is a network-boundary deployment; use the decoder but require an audit log."

**Proposed `DeploymentContext` in `marque-config`:**

```toml
[engine]
deployment = "batch"
# Values: "interactive" | "batch" | "boundary" | "archival"
# Each value implies a set of defaults that explicit config overrides.
```

`deployment = "interactive"` implies: strict recognizer, high confidence threshold, no mandatory audit.  
`deployment = "batch"` implies: decoder recognizer, medium threshold, audit log to stderr.  
`deployment = "boundary"` implies: decoder recognizer, high threshold, mandatory audit log.  
`deployment = "archival"` implies: grammar-era-locked recognizer (see M5), mandatory audit, no auto-apply.

This is a *defaults profile* — every individual option (recognizer, threshold, audit) remains independently overridable. `DeploymentContext` just gives operators a named bundle.
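
One way to model "named bundle, individually overridable" is sketched below. All type names other than `DeploymentContext` are hypothetical stand-ins, and the default values follow the four `deployment = ...` lines above. The archival "no auto-apply" default is left out for brevity.

```rust
/// Mirrors the proposed `deployment` config values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeploymentContext {
    Interactive,
    Batch,
    Boundary,
    Archival,
}

/// Hypothetical recognizer selector.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecognizerKind {
    Strict,
    Decoder,
    EraLocked,
}

/// Hypothetical coarse confidence-threshold level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Threshold {
    Medium,
    High,
}

/// Hypothetical audit requirement.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuditPolicy {
    NotRequired,
    Stderr,
    Mandatory,
}

/// Fully resolved settings after defaults and overrides are combined.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Profile {
    pub recognizer: RecognizerKind,
    pub threshold: Threshold,
    pub audit: AuditPolicy,
}

/// Explicit config values; `None` means "use the deployment default".
#[derive(Debug, Clone, Copy, Default)]
pub struct Overrides {
    pub recognizer: Option<RecognizerKind>,
    pub threshold: Option<Threshold>,
    pub audit: Option<AuditPolicy>,
}

impl DeploymentContext {
    /// Defaults implied by each deployment value.
    pub fn defaults(self) -> Profile {
        let (recognizer, threshold, audit) = match self {
            Self::Interactive => (RecognizerKind::Strict, Threshold::High, AuditPolicy::NotRequired),
            Self::Batch => (RecognizerKind::Decoder, Threshold::Medium, AuditPolicy::Stderr),
            Self::Boundary => (RecognizerKind::Decoder, Threshold::High, AuditPolicy::Mandatory),
            // The prose gives no archival threshold; `High` is taken from the context table.
            Self::Archival => (RecognizerKind::EraLocked, Threshold::High, AuditPolicy::Mandatory),
        };
        Profile { recognizer, threshold, audit }
    }
}

/// Applies explicit overrides field by field on top of the profile defaults.
pub fn resolve(ctx: DeploymentContext, overrides: &Overrides) -> Profile {
    let defaults = ctx.defaults();
    Profile {
        recognizer: overrides.recognizer.unwrap_or(defaults.recognizer),
        threshold: overrides.threshold.unwrap_or(defaults.threshold),
        audit: overrides.audit.unwrap_or(defaults.audit),
    }
}
```

Resolving per field keeps the profile a pure default: setting only `recognizer` in config changes the recognizer and leaves the threshold and audit policy at the deployment's values.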

**Relationship to `BatchEngine`:** `BatchEngine` currently expresses parallelism/throughput, not semantic deployment context. These are orthogonal axes; `BatchEngine` + `deployment = "boundary"` should be combinable.

---

## M4 — `ParseContext::as_of` Is Plumbed but Inert

**Gap:** `ParseContext` has:

```rust
/// Reference date for temporal membership queries (Phase 3 plumbing).
pub as_of: Option<Arc<str>>,
```

This field is `None` at every call site in the engine (`engine.rs:1181` hardcodes `as_of: None`). The field has been acknowledged as a stub for the temporal-membership feature (issue #206), but it has zero effect on recognizer behavior, rule dispatch, or the grammar vocabulary loaded.

Additionally, `as_of` only reaches the *recognizer layer*. Even once wired, rules and the parser would have no temporal context — a grammar-era-aware rule (e.g., "NODIS was not a valid dissem control before 2012") has no way to read the document's effective date.

**What "temporal context" needs to reach:**

| Layer | Why temporal context matters |
|-------|------------------------------|
| Recognizer | Vocabulary terms valid at `as_of` date; deprecated terms in that era are not errors |
| Parser | Grammar rules in effect at `as_of` |
| Rule dispatch | Rules fire only if they were in effect at `as_of`; a post-2015 rule should not fire on a 2005 document in archival-validate-for-era mode |
| `PageRewrite` catalog | Rewrites applicable to the grammar era at `as_of` |

**Proposed minimal wiring path:**

1. Engine: populate `as_of` from `LintOptions::as_of` or from document metadata (extraction layer)
2. Pass `as_of` through to rules via `RuleContext` (currently has no temporal field)
3. `MarkingScheme` gains `fn vocabulary_at(&self, as_of: Option<&str>) -> &dyn Vocabulary<Self>` — returns the vocabulary snapshot for the given date (current behavior when `None`)

The `as_of` field design (an `Option<Arc<str>>` ISO 8601 date) is correct for the stub. The wire-up is the missing work.
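
Once `as_of` reaches `RuleContext`, the rule-dispatch gate from the table above might look like the following sketch. `rule_in_effect` and the `effective_from` parameter are hypothetical; the sketch assumes both dates are full ISO 8601 calendar dates (`YYYY-MM-DD`), for which plain string comparison matches chronological order.

```rust
/// Hypothetical dispatch gate: should a rule fire for a document dated `as_of`?
///
/// `effective_from` is the date the rule came into force, if recorded.
/// Both dates are assumed to be `YYYY-MM-DD`, so byte-wise string ordering
/// is also chronological ordering. Other ISO 8601 forms would need parsing.
pub fn rule_in_effect(effective_from: Option<&str>, as_of: Option<&str>) -> bool {
    match (effective_from, as_of) {
        // No temporal context: current behavior, every rule is eligible.
        (_, None) => true,
        // Rule has no recorded start date: treated as always in effect.
        (None, Some(_)) => true,
        // Rule fires only if it was already in force on the document date.
        (Some(start), Some(date)) => start <= date,
    }
}
```

A rule introduced on `2015-01-01` is skipped for a document with `as_of = "2005-06-30"`, and runs as today when `as_of` is `None`.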

---

## M5 — Three Archival Output Modes with Incompatible Intent Shapes

**Gap:** When processing historical documents, there are three fundamentally different *output intents* that the type system does not represent:

| Mode | What the engine should do |
|------|--------------------------|
| **Update** | Rewrite markings to the current grammar; emit `AppliedFix` records |
| **Preserve + generate metadata** | Do not rewrite the source text; emit what the markings *would* be in current grammar as metadata only |
| **Validate-for-era** | Check the document against the grammar rules *in effect when it was written*; do not apply current-era corrections |

These three modes have incompatible `FixIntent` shapes and incompatible audit record semantics. Today there is no way to configure any of them. The `FixMode::Apply` / `DryRun` binary is insufficient:
- `Apply` on an archival document could rewrite 2005-era markings using 2024-era rules → incorrect
- `DryRun` simulates but still evaluates current-era rules → same problem
- Neither produces "what the marking means in current terms" as metadata without rewriting

**CAPCO-CONTEXT.md acknowledges this:** The archival mode for the NOFORN closure-rule pivot is noted as "planned, not yet wired" (CHK041/CHK043 spec contracts exist but nothing is wired).

**Proposed `ArchivalIntent` enum for `LintOptions` / `FixOptions`:**

```rust
/// Output intent for archival (historical) document processing.
/// Only meaningful when `LintOptions::as_of` / `FixOptions::as_of` is set.
#[non_exhaustive]
pub enum ArchivalIntent {
    /// Rewrite to current-era markings. Default `Engine::fix` behavior.
    Update,
    /// Emit diagnostics only; do not rewrite source. Produces a metadata
    /// record of current-era equivalents. Only available via `Engine::lint`.
    PreserveWithMetadata,
    /// Evaluate against the grammar era at `as_of`. Current-era rules
    /// that post-date `as_of` are suppressed. No rewrites applied.
    ValidateForEra,
}
```
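
To make the incompatibility concrete, the sketch below maps each intent to the three decisions the engine would have to make, following the mode table above. `ArchivalPlan`, `RuleEra`, and `plan_for` are hypothetical names, and the enum is redeclared without `#[non_exhaustive]` so the block stands alone.

```rust
/// Redeclared from the proposal so this sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArchivalIntent {
    Update,
    PreserveWithMetadata,
    ValidateForEra,
}

/// Hypothetical: which rule set is evaluated.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuleEra {
    /// Rules in effect today.
    Current,
    /// Rules in effect at the document's `as_of` date.
    AsOf,
}

/// Hypothetical summary of what the engine does for a given intent.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ArchivalPlan {
    /// Whether the source text may be rewritten.
    pub rewrite_source: bool,
    /// Whether current-era equivalents are emitted as metadata only.
    pub emit_current_era_metadata: bool,
    /// Which era's rules are evaluated.
    pub rules: RuleEra,
}

/// Maps each intent to its plan, following the mode table.
pub fn plan_for(intent: ArchivalIntent) -> ArchivalPlan {
    match intent {
        ArchivalIntent::Update => ArchivalPlan {
            rewrite_source: true,
            emit_current_era_metadata: false,
            rules: RuleEra::Current,
        },
        ArchivalIntent::PreserveWithMetadata => ArchivalPlan {
            rewrite_source: false,
            emit_current_era_metadata: true,
            rules: RuleEra::Current,
        },
        ArchivalIntent::ValidateForEra => ArchivalPlan {
            rewrite_source: false,
            emit_current_era_metadata: false,
            rules: RuleEra::AsOf,
        },
    }
}
```

No two intents share the same plan, which is why a two-valued `FixMode` cannot express them.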

---

## M6 — Grammar Era Requires More Than a Date String

**Gap:** `ParseContext::as_of` is a bare `Option<Arc<str>>` date string. But a grammar era is not just a date — it may require:

- A different CVE vocabulary (e.g., pre-2015 CAPCO had different SCI control tokens)
- Different `PageRewrite` rows (the NOFORN-if-no-FD&R closure rule changed between schema versions)
- Different constraint catalog entries
- A different `Vocabulary<S>` snapshot (authority strings, owner metadata, deprecated-as-of dates)

A date string cannot carry this; it must be resolved against a *grammar version registry*. The `ism-schema-version` metadata pin in `crates/ism/Cargo.toml` is the build-time pin for the current version, but there is no runtime concept of "load grammar version V for date D."

**Proposed `GrammarEra` type in `marque-scheme`:**

```rust
/// Identifies a specific grammar schema version to use for processing.
/// Schemes resolve a date to a known version via `MarkingScheme::era_at(date)`.
pub struct GrammarEra {
    /// Stable schema version label (e.g., "ISM-v2022-DEC").
    pub label: Arc<str>,
    /// ISO 8601 effective date.
    pub effective: Arc<str>,
}

// On MarkingScheme:
fn era_at(&self, as_of: &str) -> Option<GrammarEra>;
```

This is an additive trait method; `CapcoScheme` can return `None` (current behavior) until the historical grammar registry is built. Rules that need to gate on era check `RuleContext::grammar_era` rather than parsing a raw date string.
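
A sketch of what a registry-backed `era_at` could do: pick the most recent era whose effective date is on or before `as_of`. `EraRegistry` is a hypothetical type, not part of the proposal, and the lookup assumes `YYYY-MM-DD` dates so that string ordering is chronological.

```rust
use std::sync::Arc;

/// Same shape as the proposed `GrammarEra`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GrammarEra {
    /// Stable schema version label.
    pub label: Arc<str>,
    /// ISO 8601 effective date (assumed `YYYY-MM-DD` in this sketch).
    pub effective: Arc<str>,
}

/// Hypothetical in-memory registry of known grammar eras.
pub struct EraRegistry {
    /// Kept sorted ascending by `effective`.
    eras: Vec<GrammarEra>,
}

impl EraRegistry {
    /// Builds a registry, sorting eras by effective date.
    pub fn new(mut eras: Vec<GrammarEra>) -> Self {
        eras.sort_by(|a, b| a.effective.cmp(&b.effective));
        Self { eras }
    }

    /// Returns the latest era already in effect at `as_of`, or `None` if
    /// `as_of` predates every registered era.
    pub fn era_at(&self, as_of: &str) -> Option<GrammarEra> {
        self.eras
            .iter()
            .rev()
            .find(|era| &*era.effective <= as_of)
            .cloned()
    }
}
```

Returning `None` for dates before the first registered era matches the proposed fallback, where a scheme without a historical registry returns `None` and the engine keeps current behavior.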

---

## Interaction with Related Issues

| Issue | Interaction |
|-------|-------------|
| #641 (Grammar coupling — T1-3 Engine not generic) | Temporal engine modes require `Engine<S>` to actually use the passed `S`. A historical `CapcoScheme::for_era(era)` must not be silently discarded by `Engine::with_clock`. This is a blocking dependency. |
| #643 (InputAdapter protocol) | `DeploymentContext` (M3) and `InputAdapter` are complementary: structured input gives higher confidence signals; deployment context shapes how those signals propagate. A `StructuredField` input in a `boundary` deployment should apply stricter audit requirements than the same input in an `interactive` deployment. |
| #176 (ParseContext input-source signal) | Overlaps with M3: the `input_kind`/`input_source` field proposed in #176 is a per-token confidence modifier. `DeploymentContext` is a per-engine deployment-wide default. Both are needed; neither subsumes the other. |
| #206 (Temporal membership / `as_of` wiring) | M4 and M5 directly depend on this issue. `as_of` must be wired end-to-end (engine → recognizer → rule context → `ArchivalIntent` gating) before historical grammar support can land. |

---

## Severity / Priority

| Item | Severity | Rationale |
|------|----------|---------|
| M1 (bulk severity baseline / audit mode) | High | Deployments that must never mutate source text have no clean config path today |
| M2 (zone/axis scope targeting) | Medium | Addressable by careful per-rule config; an ergonomic gap, not a correctness gap |
| M3 (deployment context) | Medium | `BatchEngine` covers parallelism; semantic context defaults are a usability gap |
| M4 (`as_of` wiring) | High | Archival processing is acknowledged planned work; the wire-up path is clear |
| M5 (archival output modes) | High | Without this, `Engine::fix` on a historical document is unsafe |
| M6 (grammar era type) | Medium | Prerequisite for deep historical grammar support; low urgency until M4 is wired |

M1 is the most impactful with the least implementation risk — it is one additional config field and one modification to the severity-resolution path that already exists.

---

## See Also

- #641 — Grammar coupling taxonomy (T1-3 silent scheme drop blocks M5)
- #643 — InputAdapter protocol (structured input confidence interacts with M3 deployment context)
- #176 — ParseContext input-source signal (per-token confidence; orthogonal to but complementary with M3)
- #206 — Temporal membership / `as_of` (direct prerequisite for M4/M5)
- `crates/capco/CAPCO-CONTEXT.md` — archival mode acknowledgment (CHK041/CHK043 spec contracts)
- `crates/scheme/src/severity.rs` — `Severity` total ordering (the `min(severity_cap)` proposal in M1 uses this)
- `crates/scheme/src/scope.rs` — `DiffRelation::Historical` (partial foundation for M4/M5 diff-scope)
