Summary
Add a lightweight guard (a test and/or a custom ruff/grep check) that protects the
hard treatment-vs-policy vocabulary separation documented as an invariant, flagging
risky variable/parameter naming in the estimator core that could invert results.
Why this matters
The invariants doc states that confusing "treatment" (logged action) with "policy"
(model recommendation) "will invert or completely invalidate evaluation results".
This is a correctness landmine for any contributor or AI agent. A small automated
guard makes the most dangerous semantic invariant visible at edit time, protecting
a deliberately chosen design.
Current evidence
- docs/agent-context/invariants.md "Domain Vocabulary (Hard Separation)" and
AGENTS.md §3 both document the treatment/policy separation.
- The estimator core (core.py) and the importance weight w = π/e rely on this
distinction (_dr_weight_components, dr_value_with_clip).
External context
Not required for this issue.
Proposed implementation
- Encode a check (test or pre-commit hook) that flags suspicious renames/usages
conflating the two concepts in the estimator core (heuristic, advisory).
- Add a focused test documenting the expected directionality of the importance
weight (policy in numerator, logged propensity in denominator).
- Cross-link the guard from the invariants doc.
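One possible shape for the advisory check is sketched below. It assumes that the risky pattern is an identifier mixing both words (for example treatment_policy); the invariants doc may define the rule differently. The names find_mixed_vocabulary and main are hypothetical and do not exist in the repository.

```python
from __future__ import annotations

import ast
import re
from pathlib import Path

# Assumed heuristic (not taken from the invariants doc): an identifier that
# contains both words mixes the two vocabularies and deserves a second look.
_MIXED = re.compile(r"treatment.*policy|policy.*treatment", re.IGNORECASE)


def find_mixed_vocabulary(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, identifier) pairs for names mixing both terms."""
    findings = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):      # variable reads and writes
            name = node.id
        elif isinstance(node, ast.arg):     # function parameters
            name = node.arg
        else:
            continue
        if _MIXED.search(name):
            findings.add((node.lineno, name))
    return sorted(findings)


def main(paths: list[str]) -> int:
    """Print a warning per finding; always return 0 so the check stays advisory."""
    for path in paths:
        source = Path(path).read_text(encoding="utf-8")
        for line, name in find_mixed_vocabulary(source):
            print(f"{path}:{line}: advisory: '{name}' mixes treatment and policy vocabulary")
    return 0
```

If wired into a local pre-commit hook, main could be called with the path to core.py; returning 0 unconditionally means the hook only reports and never blocks a commit.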
AI-agent execution notes
- Inspect first: core.py (_dr_weight_components, dr_value_with_clip),
docs/agent-context/invariants.md, AGENTS.md §3, .pre-commit-config.yaml.
- Keep any naming heuristic advisory (avoid false-positive churn); the
directionality test is the firm guarantee.
- Do not change estimator behaviour.
Acceptance criteria
- A test asserts the importance-weight directionality (policy/propensity).
- An advisory guard (or documented convention check) for the vocabulary exists.
- The invariants doc links to the guard.
Test plan
- Run the directionality test on a controlled example where swapping numerator and
denominator is detectable; run make validate.
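A controlled example for the directionality test might look like the sketch below. The function importance_weight is a hypothetical stand-in written for illustration; the real test would call the weight computation in core.py, whose signature is not shown in this issue. The values are chosen so that π/e and e/π differ clearly.

```python
import math


def importance_weight(policy_prob: float, logged_propensity: float) -> float:
    """Hypothetical stand-in for the estimator's weight: w = pi / e."""
    if logged_propensity <= 0.0:
        raise ValueError("logged propensity must be positive")
    return policy_prob / logged_propensity


def test_importance_weight_directionality() -> None:
    policy_prob = 0.8        # pi: probability the policy assigns to the logged action
    logged_propensity = 0.4  # e: probability the logging process took that action

    weight = importance_weight(policy_prob, logged_propensity)

    # Policy in the numerator, logged propensity in the denominator.
    assert math.isclose(weight, 2.0)
    # The swapped ratio e / pi equals 0.5, so an inversion cannot pass unnoticed.
    assert not math.isclose(weight, logged_propensity / policy_prob)
```

Choosing π ≠ e is what makes the swap detectable; with π = e both ratios equal 1 and the test would pass even if the roles were inverted.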
Documentation plan
- Link from the invariants doc; CHANGELOG entry under ### Testing.
Migration and compatibility notes
Not expected to require migration.
Risks and tradeoffs
A naming heuristic can be noisy; keep it advisory and rely on the directionality
test for the hard guarantee.
Suggested labels
testing, architecture, reliability