josix · May 18, 2026
diff --git a/.gitignore b/.gitignore
@@ -1,5 +1,8 @@
 .claude/*.local.*
 .claude/explain-briefs/
+.claude/codex/
+.mcp.json.bak
+*.bak
 graphify-out/
 explain-out/
 .claude/observability/**

diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,88 @@
+# Codex co-review — repo guide
+
+## Project context
+
+Agent Flow is a multi-agent orchestrator plugin for Claude Code. Agents are markdown files under `agents/`; orchestration commands live in `commands/`; skills under `skills/`. The `/orchestrate` command drives a six-phase pipeline (Explore → Plan → Implement → Review → Verify → Report) where each phase is delegated to a specialist agent (Riko, Senku, Loid, Lawliet, Alphonse).
+
+## Your role: Phase 4 co-reviewer
+
+You run alongside Lawliet in Phase 4 of `/orchestrate` as a co-reviewer. Lawliet handles linter-grounded findings: it runs tsc, mypy, ruff, eslint, and semgrep and reports what those tools catch. Your job is to catch what linters miss:
+
+- Logic flaws and algorithmic errors
+- Intent-vs-implementation mismatches (code does something different from what the task asked)
+- Missing edge cases that tests do not cover
+- Security smells that static analysis does not flag
+- Naming and clarity issues that obscure intent without being formal lint violations
+
+Do NOT duplicate Lawliet's static-analysis work. If tsc or ruff would catch it, do not surface it. Your findings are complementary, not redundant.
+
+## Output contract
+
+Return exactly one verdict line followed by zero or more finding lines.
+
+Verdict line (required, exactly one):
+
+```
+APPROVED
+NEEDS_CHANGES
+BLOCKED
+```
+
+Finding lines (optional, one per line):
+
+```
+<severity>: <file>:<line>: <issue description>
+```
+
+Severity values: `ERROR`, `WARNING`, `INFO`
+
+Rules:
+- Only include a finding if you can cite an exact file and line number.
+- Findings without a file:line citation are advisory — do not include them as findings.
+- If you have advisory observations without a file:line, summarise them briefly after the findings block under the heading "Advisory notes:" but do not let them affect your verdict.
+- Use BLOCKED only when you find a clear bug or security issue with a file:line citation.
+- Use NEEDS_CHANGES for style/correctness issues with a file:line citation.
+- Use APPROVED when no blocking or needs-changes findings exist.
+
+## Severity scale
+
+- `ERROR` → BLOCKED: bug, security issue, broken invariant. Produces wrong behavior or unsafe state.
+- `WARNING` → NEEDS_CHANGES: the code works, but it has a correctness or style issue that should be fixed.
+- `INFO` → APPROVED + advisory: nit or suggestion. Never changes the verdict.
+
+## Repo-specific blocker classes
+
+Use this as a concrete checklist when reviewing agent-flow diffs:
+
+1. **Shell safety**: Shell scripts must use `set -euo pipefail` at the top. Flag any new `.sh` file missing it (applies to standalone `.sh` files only; embedded Bash inside `commands/*.md` is out of scope for this rubric). This is an ERROR.
+
+2. **Heredoc variable expansion**: `commands/*.md` Bash blocks must not use `$VAR` inside `<<'PROMPT'` heredocs — single-quoted heredoc delimiters suppress all variable expansion, so the variable reference will be emitted literally instead of substituted. This is silently broken. Flag any `$VARIABLE` inside a `<<'PROMPT'` block as an ERROR.
+
+3. **YAML validity**: YAML emitted to `.claude/orchestration.local.md` must be syntactically valid. Unclosed keys, bad indentation, or stray characters will break downstream grep-based parsers. This is an ERROR.
+
+4. **No hardcoded paths**: Scripts must not contain `/Users/...` or other machine-specific absolute paths. Use `${HOME}`, `$(git rev-parse --show-toplevel)`, or `${CLAUDE_PLUGIN_ROOT}` instead. This is a WARNING.
+
+5. **No secrets**: No API keys, tokens, passwords, or credentials may appear in committed files. This is an ERROR.
+
+## What NOT to flag (defer to Lawliet)
+
+Do not surface the following — Lawliet already covers them:
+
+- Lint-level style nits caught by `tsc`, `mypy`, `ruff`, `eslint`, or `semgrep`.
+- Type-system errors that a type checker would report.
+- Standard naming convention violations covered by configured linters.
+- Import ordering, whitespace, or formatting issues caught by formatters.
+
+Raising these duplicates Lawliet's work and clutters the review with noise.
+
+## Tie-breaker
+
+When uncertain between BLOCKED and NEEDS_CHANGES:
+
+- Use **BLOCKED** only when the issue will produce wrong behavior or an unsafe state if the code is merged as-is.
+- Use **NEEDS_CHANGES** when the code works but has a correctness or clarity problem that should be fixed before merging.
+
+When uncertain between NEEDS_CHANGES and APPROVED (INFO):
+
+- Use **NEEDS_CHANGES** only when the finding has a file:line citation and represents a real improvement, not a preference.
+- Use **APPROVED** with an advisory note for everything else.
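
Reviewer sketch (not part of the diff): the output contract in `AGENTS.md` can be checked mechanically. The Bash below is a minimal illustration only. The function names `parse_codex_verdict` and `cited_findings` and the path `scripts/example.sh` are hypothetical and do not appear in this PR. Falling back to `ADVISORY` mirrors the "treat as advisory" rule for an unparseable first line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read a Codex reply on stdin and print its verdict.
# Prints ADVISORY when the first non-blank line is not a known verdict.
parse_codex_verdict() {
  local first
  # Take the first non-blank line and trim surrounding whitespace.
  first=$(awk 'NF && !seen { gsub(/^[ \t]+|[ \t]+$/, ""); print; seen = 1 }')
  case "$first" in
    APPROVED|NEEDS_CHANGES|BLOCKED) printf '%s\n' "$first" ;;
    *) printf 'ADVISORY\n' ;;
  esac
}

# Hypothetical helper: keep only finding lines of the form
# <severity>: <file>:<line>: <issue description>
cited_findings() {
  grep -E '^(ERROR|WARNING|INFO): [^:]+:[0-9]+: .+' || true
}

# Example reply (made up for illustration).
reply=$'\nNEEDS_CHANGES\nWARNING: scripts/example.sh:12: retry loop never terminates\nINFO: consider renaming\n'
printf '%s' "$reply" | parse_codex_verdict   # prints NEEDS_CHANGES
printf '%s' "$reply" | cited_findings        # prints only the WARNING line
```

The uncited `INFO` line is dropped by `cited_findings`, which matches the rule that findings without a file:line citation are advisory.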
diff --git a/agents/Lawliet.md b/agents/Lawliet.md
@@ -49,7 +49,7 @@ Lawliet performs **static analysis only**: type checking, linting, security scan
    - Code quality: `sonarqube`, `coderabbit` (if available)
 6. Check against requirements
 7. Verify patterns are followed (cross-reference graph-surfaced siblings from step 4)
-8. Look for edge cases (include callers from step 3's blast-radius output)
+8. Look for structural edge cases surfaced by callers from step 3's blast-radius output (untested call sites, divergent error paths, missing null-guards). Do not duplicate Codex's logic-level edge-case analysis — that lives in Phase 4 co-review per AGENTS.md.
 9. Analyze security issues
 
 **Allowed Bash Commands (Static Analysis Only):**
@@ -94,7 +94,7 @@ node app.js      # Running code is forbidden
 - [Improvement suggestions]
 
 ### Verdict
-[APPROVED | NEEDS_CHANGES | BLOCKED]
+[APPROVED | NEEDS_CHANGES]
 
 ## Self-Reflection Protocol
 

diff --git a/commands/orchestrate.md b/commands/orchestrate.md
@@ -227,7 +227,69 @@ Proceed only when Loid confirms changes are implemented.
 - Check for security issues
 - Verify adherence to patterns
 
-After Lawliet completes:
+After Lawliet completes, record its verdict (`APPROVED` or `NEEDS_CHANGES`).
+
+#### Codex co-review (optional)
+
+Check whether Codex is available:
+
+```bash
+CODEX_AVAILABLE=$(grep -A1 '^codex:' .claude/orchestration.local.md | grep 'available:' | sed 's/.*available: *//')
+```
+
+**When `CODEX_AVAILABLE` is `true`**, run Codex as a co-reviewer via Bash (NOT a subagent dispatch — Codex is an external CLI):
+
+Before dispatching Codex, the orchestrator MUST persist Lawliet's findings to a fixed, well-known path so the Codex dispatch can include them. Lawliet's full markdown response lives in the orchestrator's conversation memory, so use the Write tool to write it verbatim to `.claude/codex/lawliet-findings.tmp.md` before running the dispatch block below. Create the directory first if needed: `mkdir -p .claude/codex`.
+
+Then dispatch Codex via the shared helper:
+
+```bash
+LAWLIET_FINDINGS_FILE=".claude/codex/lawliet-findings.tmp.md"
+CODEX_RESULT=$(bash ${CLAUDE_PLUGIN_ROOT}/scripts/dispatch-codex-review.sh \
+  --state-file .claude/orchestration.local.md \
+  --lawliet-findings "$LAWLIET_FINDINGS_FILE")
+CODEX_RAN=$(echo "$CODEX_RESULT" | grep '^codex_ran:' | sed 's/.*: *//')
+CODEX_VERDICT=$(echo "$CODEX_RESULT" | grep '^codex_verdict:' | sed 's/.*: *//')
+CODEX_RAW_PATH=$(echo "$CODEX_RESULT" | grep '^codex_raw_path:' | sed 's/.*: *//')
+CODEX_RAW=""
+if [[ -n "$CODEX_RAW_PATH" && -f "$CODEX_RAW_PATH" ]]; then
+  CODEX_RAW=$(cat "$CODEX_RAW_PATH")
+  rm -f "$CODEX_RAW_PATH"
+fi
+rm -f "$LAWLIET_FINDINGS_FILE"
+```
+
+The output contract and severity scale are defined in `AGENTS.md` at the repo root, which Codex auto-loads on every invocation.
+
+If the shared helper (`scripts/dispatch-codex-review.sh`) detects that `codex exec` exited non-zero (timeout, auth failure, network), Phase 4 falls back to Lawliet-only — the helper exits 0 but emits `codex_verdict: ADVISORY` so the orchestrator can detect the degraded state. The final verdict is whatever Lawliet emitted.
+
+The helper builds the diff, task description, and Lawliet's findings internally. `$CODEX_RAW` contains Codex's full reply (as written by `--output-last-message`): the first non-blank line is the verdict (`APPROVED` / `NEEDS_CHANGES` / `BLOCKED`); subsequent lines of the form `<severity>: <file>:<line>: <issue>` are findings. Findings without a `file:line` token are advisory only and cannot trigger a NEEDS_CHANGES verdict. If the first non-blank line is not one of `APPROVED`, `NEEDS_CHANGES`, or `BLOCKED`, treat the entire Codex output as advisory and log `warn: Codex verdict unparseable — treating as advisory`.
+
+**Findings without a `file:line` citation are advisory only** — they do not affect the final verdict and Loid is NOT routed back for them.
+
+**Disagreement rule (truth table):**
+
+Note: Lawliet emits only `APPROVED` or `NEEDS_CHANGES`. `BLOCKED` is a Codex-only verdict, used when Codex reports an `ERROR`-severity finding with a `file:line` citation (see the severity scale in `AGENTS.md`).
+
+| Lawliet verdict | Codex verdict | Codex has file:line citation? | Final Phase 4 verdict |
+|-----------------|---------------|-------------------------------|-----------------------|
+| APPROVED | APPROVED | n/a | APPROVED |
+| APPROVED | BLOCKED | yes | NEEDS_CHANGES (surface Codex cite) |
+| APPROVED | BLOCKED | no | APPROVED (advisory only) |
+| APPROVED | NEEDS_CHANGES | yes | NEEDS_CHANGES (surface Codex cite) |
+| APPROVED | NEEDS_CHANGES | no | APPROVED (advisory only) |
+| NEEDS_CHANGES | APPROVED | n/a | NEEDS_CHANGES (Lawliet wins on linter-grounded findings) |
+| NEEDS_CHANGES | BLOCKED or NEEDS_CHANGES | any | NEEDS_CHANGES |
+
+When the final verdict is NEEDS_CHANGES, delegate back to Loid with specific issues from Lawliet and/or Codex (file:line citations required).
+
+**When `CODEX_AVAILABLE` is `false`**, skip the Codex co-review entirely. Phase 4 behaves identically to today (Lawliet-only). Log one info line:
+
+```
+info: Codex co-review skipped (codex.available: false)
+```
+
+After computing the final Phase 4 verdict:
 - If APPROVED: Update state and proceed
 - If NEEDS_CHANGES: Delegate back to Loid with specific issues
 

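Reviewer sketch (not part of the diff): the disagreement truth table in `commands/orchestrate.md` reduces to a small function. This is a minimal illustration under stated assumptions. The name `reconcile_verdict` and its `yes`/`no` citation argument are hypothetical. Treating `ADVISORY` or any unrecognized Codex verdict as non-escalating follows the fallback rule that the final verdict is whatever Lawliet emitted.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reconcile_verdict <lawliet> <codex> <has_cite: yes|no>
# Prints the final Phase 4 verdict (APPROVED or NEEDS_CHANGES).
reconcile_verdict() {
  local lawliet="$1" codex="$2" has_cite="$3"

  # Lawliet wins whenever it asks for changes, whatever Codex says.
  if [[ "$lawliet" == "NEEDS_CHANGES" ]]; then
    echo "NEEDS_CHANGES"
    return
  fi

  # Lawliet approved: Codex escalates only with a file:line citation.
  case "$codex" in
    BLOCKED|NEEDS_CHANGES)
      if [[ "$has_cite" == "yes" ]]; then
        echo "NEEDS_CHANGES"
      else
        echo "APPROVED"   # advisory only
      fi
      ;;
    *)
      echo "APPROVED"     # APPROVED, ADVISORY, or unparseable
      ;;
  esac
}

reconcile_verdict APPROVED BLOCKED yes        # prints NEEDS_CHANGES
reconcile_verdict APPROVED NEEDS_CHANGES no   # prints APPROVED
reconcile_verdict NEEDS_CHANGES APPROVED no   # prints NEEDS_CHANGES
```

Note that `BLOCKED` never appears as a final verdict: a cited Codex `BLOCKED` is downgraded to `NEEDS_CHANGES` so that Loid is routed back with the citation.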
diff --git a/commands/team-orchestrate.md b/commands/team-orchestrate.md
@@ -391,16 +391,66 @@ SendMessage(
 )
 ```
 
+**Step 7.5: Codex co-review (after Lawliet collection, before state writes)**
+
+Lawliet's reply from Step 7 is now in your conversation memory. If Codex is
+available, dispatch it as a sequential co-reviewer before recording the review
+verdict. A Codex finding with a `file:line` citation can turn Lawliet's
+`APPROVED` into `NEEDS_CHANGES` per the truth table in `commands/orchestrate.md` Phase 4.
+
+```bash
+CODEX_AVAILABLE=$(grep -A1 '^codex:' .claude/team-orchestration.local.md | grep 'available:' | sed 's/.*available: *//')
+```
+
+When `CODEX_AVAILABLE` is `true`:
+
+1. Use the Write tool to persist Lawliet's full reply (from Step 7's SendMessage)
+   verbatim to `.claude/codex/lawliet-findings.tmp.md`. Create the directory first:
+   `mkdir -p .claude/codex`.
+
+2. Dispatch Codex via the shared helper:
+
+   ```bash
+   LAWLIET_FINDINGS_FILE=".claude/codex/lawliet-findings.tmp.md"
+   CODEX_RESULT=$(bash ${CLAUDE_PLUGIN_ROOT}/scripts/dispatch-codex-review.sh \
+     --state-file .claude/team-orchestration.local.md \
+     --lawliet-findings "$LAWLIET_FINDINGS_FILE")
+   CODEX_VERDICT=$(echo "$CODEX_RESULT" | grep '^codex_verdict:' | sed 's/.*: *//')
+   CODEX_RAW_PATH=$(echo "$CODEX_RESULT" | grep '^codex_raw_path:' | sed 's/.*: *//')
+   CODEX_RAW=""
+   if [[ -n "$CODEX_RAW_PATH" && -f "$CODEX_RAW_PATH" ]]; then
+     CODEX_RAW=$(cat "$CODEX_RAW_PATH")
+     rm -f "$CODEX_RAW_PATH"
+   fi
+   rm -f "$LAWLIET_FINDINGS_FILE"
+   ```
+
+3. Apply the truth table from `commands/orchestrate.md` Phase 4 to reconcile
+   Lawliet + Codex into `FINAL_REVIEW_VERDICT` (one of APPROVED/NEEDS_CHANGES).
+   The truth table is mode-agnostic.
+
+4. The reconciled verdict feeds Step 8's `--gate-result` decision.
+
+When `CODEX_AVAILABLE` is `false`, skip the co-review and log one info line:
+```
+info: Codex co-review skipped (codex.available: false)
+```
+Set `FINAL_REVIEW_VERDICT` to Lawliet's verdict; Phase 4 behavior is unchanged.
+
 **Step 8: Update Teammate Statuses**
 
 Update state for each teammate result:
 
 ```bash
-# Update review status
+# Update review status (using reconciled verdict from Step 7.5)
+REVIEW_GATE_RESULT="passed"
+if [[ "$FINAL_REVIEW_VERDICT" == "NEEDS_CHANGES" ]]; then
+  REVIEW_GATE_RESULT="failed"
+fi
 bash ${CLAUDE_PLUGIN_ROOT}/scripts/update-team-state.sh \
   --parallel-group review_verification --teammate review \
-  --gate-result passed --agent Lawliet \
-  --message "Code review passed"
+  --gate-result "$REVIEW_GATE_RESULT" --agent Lawliet \
+  --message "Code review: $FINAL_REVIEW_VERDICT"
 
 # Update verification status
 bash ${CLAUDE_PLUGIN_ROOT}/scripts/update-team-state.sh \
@@ -437,13 +487,20 @@ echo "Parallel Group Results: $SUMMARY"
   FAILED_PHASES=$(echo "$MERGE_RESULT" | grep -o '"failed_phases": *\[[^]]*\]')
   ```
 
+  When `FINAL_REVIEW_VERDICT` is `NEEDS_CHANGES`, include BOTH Lawliet's findings
+  AND Codex's `$CODEX_RAW` (captured in Step 7.5) in the delegation prompt.
+  Otherwise Loid receives only Lawliet's findings and may not address the Codex-flagged issues, leaving the gate stuck.
+
   Delegate to Loid with specific issues:
   ```
   Task(agent="Loid", prompt="
   Fix the issues found in parallel review/verification:
 
   Review Issues (if any):
-  [Include Lawliet's feedback]
+  [Include Lawliet's feedback verbatim]
+
+  Codex Co-Review Findings (if Codex contributed to the failed verdict):
+  [Include Codex's $CODEX_RAW verbatim — file:line citations from Step 7.5 are essential here so Loid knows what to fix]
 
   Verification Issues (if any):
   [Include Alphonse's failures]
@@ -464,14 +521,54 @@ echo "Parallel Group Results: $SUMMARY"
 - Check for security issues
 - Verify adherence to patterns
 
-After Lawliet completes:
-- If APPROVED: Update state and proceed
-- If NEEDS_CHANGES: Delegate back to Loid with specific issues
+After Lawliet completes, do NOT record the gate result yet — proceed to
+the **Codex co-review** block below. The final review verdict
+(`FINAL_REVIEW_VERDICT`) is computed AFTER Codex runs (or after Codex is
+skipped due to `codex.available: false`), per the truth table at
+`commands/orchestrate.md` Phase 4.
+
+**Codex co-review (sequential fallback)**
+
+After Lawliet completes, if Codex is available, run Codex co-review before
+recording the review gate result.
 
 ```bash
-bash ${CLAUDE_PLUGIN_ROOT}/scripts/update-team-state.sh \
-  --phase verification --gate-result passed --agent Lawliet \
-  --message "Code review passed"
+CODEX_AVAILABLE=$(grep -A1 '^codex:' .claude/team-orchestration.local.md | grep 'available:' | sed 's/.*available: *//')
+```
+
+When `CODEX_AVAILABLE` is `true`, use the Write tool to persist Lawliet's full
+reply to `.claude/codex/lawliet-findings.tmp.md` (`mkdir -p .claude/codex`
+first), then:
+
+```bash
+LAWLIET_FINDINGS_FILE=".claude/codex/lawliet-findings.tmp.md"
+CODEX_RESULT=$(bash ${CLAUDE_PLUGIN_ROOT}/scripts/dispatch-codex-review.sh \
+  --state-file .claude/team-orchestration.local.md \
+  --lawliet-findings "$LAWLIET_FINDINGS_FILE")
+CODEX_VERDICT=$(echo "$CODEX_RESULT" | grep '^codex_verdict:' | sed 's/.*: *//')
+CODEX_RAW_PATH=$(echo "$CODEX_RESULT" | grep '^codex_raw_path:' | sed 's/.*: *//')
+CODEX_RAW=""
+if [[ -n "$CODEX_RAW_PATH" && -f "$CODEX_RAW_PATH" ]]; then
+  CODEX_RAW=$(cat "$CODEX_RAW_PATH")
+  rm -f "$CODEX_RAW_PATH"
+fi
+rm -f "$LAWLIET_FINDINGS_FILE"
+```
+
+Apply the truth table from `commands/orchestrate.md` Phase 4 to reconcile into
+`FINAL_REVIEW_VERDICT`. If NEEDS_CHANGES, delegate back to Loid with combined
+citations and loop — do NOT proceed to verification yet.
+
+When `CODEX_AVAILABLE` is `false`, log
+`info: Codex co-review skipped (codex.available: false)` and use Lawliet's
+verdict as final.
+
+```bash
+if [[ "$FINAL_REVIEW_VERDICT" == "APPROVED" ]]; then
+  bash ${CLAUDE_PLUGIN_ROOT}/scripts/update-team-state.sh \
+    --phase verification --gate-result passed --agent Lawliet \
+    --message "Code review passed (Codex verdict: ${CODEX_VERDICT:-skipped})"
+fi
 ```
 
 **Phase 5: Verification (Sequential)**