Merged
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,5 +1,8 @@
.claude/*.local.*
.claude/explain-briefs/
.claude/codex/
.mcp.json.bak
*.bak
graphify-out/
explain-out/
.claude/observability/**
88 changes: 88 additions & 0 deletions AGENTS.md
@@ -0,0 +1,88 @@
# Codex co-review — repo guide

## Project context

Agent Flow is a multi-agent orchestrator plugin for Claude Code. Agents are markdown files under `agents/`; orchestration commands live in `commands/`; skills under `skills/`. The `/orchestrate` command drives a six-phase pipeline (Explore → Plan → Implement → Review → Verify → Report) where each phase is delegated to a specialist agent (Riko, Senku, Loid, Lawliet, Alphonse).

## Your role: Phase 4 co-reviewer

You run alongside Lawliet in Phase 4 of `/orchestrate` as a co-reviewer. Lawliet handles linter-grounded findings: it runs tsc, mypy, ruff, eslint, and semgrep and reports what those tools catch. Your job is to catch what linters miss:

- Logic flaws and algorithmic errors
- Intent-vs-implementation mismatches (code does something different from what the task asked)
- Missing edge cases that tests do not cover
- Security smells that static analysis does not flag
- Naming and clarity issues that obscure intent without being formal lint violations

Do NOT duplicate Lawliet's static-analysis work. If tsc or ruff would catch it, do not surface it. Your findings are complementary, not redundant.

## Output contract

Return exactly one verdict line followed by zero or more finding lines, optionally followed by an "Advisory notes:" block as described in the rules below.

Verdict line (required; exactly one of the following):

```
APPROVED
NEEDS_CHANGES
BLOCKED
```

Finding lines (optional, one per line):

```
<severity>: <file>:<line>: <issue description>
```

Severity values: `ERROR`, `WARNING`, `INFO`

Rules:
- Only include a finding if you can cite an exact file and line number.
- Findings without a file:line citation are advisory — do not include them as findings.
- If you have advisory observations without a file:line, summarise them briefly after the findings block under the heading "Advisory notes:" but do not let them affect your verdict.
- Use BLOCKED only when you find a clear bug or security issue with a file:line citation.
- Use NEEDS_CHANGES for style/correctness issues with a file:line citation.
- Use APPROVED when no blocking or needs-changes findings exist.
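To make these rules concrete, here is a minimal Bash sketch that sorts a single reply line into a verdict, a cited finding, or advisory text. The function name `classify_line` is hypothetical, and the regex assumes file paths contain no colons or whitespace.

```shell
#!/usr/bin/env bash
# Sketch only: classify one line of a Codex reply per the output contract above.
# Assumption: cited findings look exactly like "<SEVERITY>: <file>:<line>: <issue>".
set -euo pipefail

classify_line() {
  local line="$1"
  # Kept in a variable so [[ =~ ]] treats it as a regex rather than a literal.
  local finding_re='^(ERROR|WARNING|INFO): [^:[:space:]]+:[0-9]+: .+'
  case "$line" in
    APPROVED|NEEDS_CHANGES|BLOCKED) echo "verdict" ;;
    *)
      if [[ "$line" =~ $finding_re ]]; then
        echo "finding"
      else
        # No exact file:line citation: advisory only, never affects the verdict.
        echo "advisory"
      fi
      ;;
  esac
}

classify_line "APPROVED"                                   # verdict
classify_line "ERROR: scripts/foo.sh:12: missing pipefail" # finding
classify_line "WARNING: consider renaming this helper"     # advisory
```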

## Severity scale

- `ERROR` → BLOCKED: bug, security issue, broken invariant. Produces wrong behavior or unsafe state.
- `WARNING` → NEEDS_CHANGES: correctness or style issue that works but should be improved.
- `INFO` → APPROVED + advisory: nit or suggestion. Never changes the verdict.
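Read together, the scale means the most severe cited finding decides the verdict. A sketch under that reading, with a hypothetical `verdict_for_findings` function that expects only already-cited finding lines on stdin:

```shell
#!/usr/bin/env bash
# Sketch only: map cited findings to a verdict; the most severe finding wins.
set -euo pipefail

verdict_for_findings() {
  local line rank=0   # 0 = APPROVED, 1 = NEEDS_CHANGES, 2 = BLOCKED
  while IFS= read -r line; do
    case "${line%%:*}" in          # severity = text before the first colon
      ERROR) rank=2 ;;
      WARNING) if (( rank < 1 )); then rank=1; fi ;;
      # INFO (or anything else) never changes the verdict.
    esac
  done
  case "$rank" in
    2) echo "BLOCKED" ;;
    1) echo "NEEDS_CHANGES" ;;
    *) echo "APPROVED" ;;
  esac
}

printf '%s\n' \
  "INFO: AGENTS.md:3: nit" \
  "WARNING: commands/orchestrate.md:160: fragile grep" |
  verdict_for_findings   # NEEDS_CHANGES
```

The loop reads all input instead of stopping at the first `ERROR`, so the upstream writer never hits a closed pipe under `pipefail`.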

## Repo-specific blocker classes

Use this as a concrete checklist when reviewing agent-flow diffs:

1. **Shell safety**: Shell scripts must use `set -euo pipefail` at the top. Flag any new `.sh` file missing it (applies to standalone `.sh` files only; embedded Bash inside `commands/*.md` is out of scope for this rubric). This is an ERROR.

2. **Heredoc variable expansion**: `commands/*.md` Bash blocks must not use `$VAR` inside `<<'PROMPT'` heredocs — single-quoted heredoc delimiters suppress all variable expansion, so the variable reference will be emitted literally instead of substituted. This is silently broken. Flag any `$VARIABLE` inside a `<<'PROMPT'` block as an ERROR.

3. **YAML validity**: YAML emitted to `.claude/orchestration.local.md` must be syntactically valid. Unclosed keys, bad indentation, or stray characters will break downstream grep-based parsers. This is an ERROR.

4. **No hardcoded paths**: Scripts must not contain `/Users/...` or other machine-specific absolute paths. Use `${HOME}`, `$(git rev-parse --show-toplevel)`, or `${CLAUDE_PLUGIN_ROOT}` instead. This is a WARNING.

5. **No secrets**: No API keys, tokens, passwords, or credentials may appear in committed files. This is an ERROR.
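Blocker class 2 is easy to miss because nothing errors at runtime. This small demonstration (the variable names are illustrative) runs the same heredoc body with a quoted and an unquoted delimiter:

```shell
#!/usr/bin/env bash
# Demonstration: a quoted heredoc delimiter (<<'PROMPT') disables parameter
# expansion, so $TASK is emitted literally instead of being substituted.
set -euo pipefail

TASK="fix the parser"

quoted=$(cat <<'PROMPT'
Task: $TASK
PROMPT
)

unquoted=$(cat <<PROMPT
Task: $TASK
PROMPT
)

echo "$quoted"    # Task: $TASK   (silently wrong)
echo "$unquoted"  # Task: fix the parser
```

Only the unquoted delimiter substitutes `$TASK`; the quoted form is correct only when a literal `$` is intended.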

## What NOT to flag (defer to Lawliet)

Do not surface the following — Lawliet already covers them:

- Lint-level style nits caught by `tsc`, `mypy`, `ruff`, `eslint`, or `semgrep`.
- Type-system errors that a type checker would report.
- Standard naming convention violations covered by configured linters.
- Import ordering, whitespace, or formatting issues caught by formatters.

Raising these duplicates Lawliet's work and clutters the review with noise.

## Tie-breaker

When uncertain between BLOCKED and NEEDS_CHANGES:

- Use **BLOCKED** only when the issue will produce wrong behavior or an unsafe state if the code is merged as-is.
- Use **NEEDS_CHANGES** when the code works but has a correctness or clarity problem that should be fixed before merging.

When uncertain between NEEDS_CHANGES and APPROVED (INFO):

- Use **NEEDS_CHANGES** only when the finding has a file:line citation and represents a real improvement, not a preference.
- Use **APPROVED** with an advisory note for everything else.
4 changes: 2 additions & 2 deletions agents/Lawliet.md
@@ -49,7 +49,7 @@ Lawliet performs **static analysis only**: type checking, linting, security scan
- Code quality: `sonarqube`, `coderabbit` (if available)
6. Check against requirements
7. Verify patterns are followed (cross-reference graph-surfaced siblings from step 4)
- 8. Look for edge cases (include callers from step 3's blast-radius output)
8. Look for structural edge cases surfaced by callers from step 3's blast-radius output (untested call sites, divergent error paths, missing null-guards). Do not duplicate Codex's logic-level edge-case analysis — that lives in Phase 4 co-review per AGENTS.md.
9. Analyze security issues

**Allowed Bash Commands (Static Analysis Only):**
@@ -94,7 +94,7 @@ node app.js # Running code is forbidden
- [Improvement suggestions]

### Verdict
- [APPROVED | NEEDS_CHANGES | BLOCKED]
[APPROVED | NEEDS_CHANGES]

## Self-Reflection Protocol

64 changes: 63 additions & 1 deletion commands/orchestrate.md
@@ -227,7 +227,69 @@ Proceed only when Loid confirms changes are implemented.
- Check for security issues
- Verify adherence to patterns

- After Lawliet completes:
After Lawliet completes, record its verdict (`APPROVED` or `NEEDS_CHANGES`).

#### Codex co-review (optional)

Check whether Codex is available:

```bash
CODEX_AVAILABLE=$(grep -A1 '^codex:' .claude/orchestration.local.md | grep 'available:' | sed 's/.*available: *//')
```
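The `grep -A1` pipeline only inspects the single line after `codex:`. If another key is listed first, it yields an empty `CODEX_AVAILABLE` (neither `true` nor `false`). A slightly more tolerant sketch, assuming block-style YAML where the codex settings are indented under a top-level `codex:` line; `codex_available` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: block-style YAML with indented keys under "codex:".
set -euo pipefail

codex_available() {
  local state_file="$1"
  # Scan the indented lines of the codex block in any order; default to false.
  awk '
    /^codex:/ { in_block = 1; next }
    in_block && /^[^ \t]/ { in_block = 0 }   # next top-level key ends the block
    in_block && /^[ \t]+available:/ {
      sub(/^[ \t]+available:[ \t]*/, "")
      sub(/[ \t]+$/, "")
      print
      found = 1
      exit
    }
    END { if (!found) print "false" }
  ' "$state_file"
}

state=$(mktemp)
printf 'phase: 4\ncodex:\n  enabled: true\n  available: true\nnext: x\n' > "$state"
codex_available "$state"   # true
rm -f "$state"
```

With this sample file, the original `grep -A1` pipeline would return an empty string, because `enabled:` sits between `codex:` and `available:`.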

**When `CODEX_AVAILABLE` is `true`**, run Codex as a co-reviewer via Bash (NOT a subagent dispatch — Codex is an external CLI):

Before dispatching Codex, the orchestrator MUST persist Lawliet's findings to a fixed, well-known path so the Codex dispatch can include them. Lawliet's full markdown response lives in the orchestrator's conversation memory, so first create the directory (`mkdir -p .claude/codex`), then use the Write tool to write that response verbatim to `.claude/codex/lawliet-findings.tmp.md` before running the dispatch block below.

Then dispatch Codex via the shared helper:

```bash
LAWLIET_FINDINGS_FILE=".claude/codex/lawliet-findings.tmp.md"
CODEX_RESULT=$(bash ${CLAUDE_PLUGIN_ROOT}/scripts/dispatch-codex-review.sh \
--state-file .claude/orchestration.local.md \
--lawliet-findings "$LAWLIET_FINDINGS_FILE")
CODEX_RAN=$(echo "$CODEX_RESULT" | grep '^codex_ran:' | sed 's/.*: *//')
CODEX_VERDICT=$(echo "$CODEX_RESULT" | grep '^codex_verdict:' | sed 's/.*: *//')
CODEX_RAW_PATH=$(echo "$CODEX_RESULT" | grep '^codex_raw_path:' | sed 's/.*: *//')
CODEX_RAW=""
if [[ -n "$CODEX_RAW_PATH" && -f "$CODEX_RAW_PATH" ]]; then
CODEX_RAW=$(cat "$CODEX_RAW_PATH")
rm -f "$CODEX_RAW_PATH"
fi
rm -f "$LAWLIET_FINDINGS_FILE"
```
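The `grep | sed` extraction has two sharp edges: the greedy `.*: *` strips everything up to the last colon in the line, and under `set -euo pipefail` a missing key makes `grep` exit 1 and abort the script. A hedged alternative with a hypothetical `get_field` helper that strips only the key prefix (the sample helper output is made up for illustration):

```shell
#!/usr/bin/env bash
# Sketch only: read the first "key: value" line for KEY from stdin.
# A missing key prints nothing and still returns 0.
set -euo pipefail

get_field() {
  local key="$1" line
  while IFS= read -r line; do
    if [[ "$line" == "$key:"* ]]; then
      line="${line#"$key:"}"              # drop the "key:" prefix only
      line="${line#"${line%%[! ]*}"}"     # drop leading spaces
      printf '%s\n' "$line"
      return 0
    fi
  done
  return 0
}

# Hypothetical helper output; a value containing ": " survives intact.
CODEX_RESULT=$'codex_verdict: NEEDS_CHANGES\ncodex_raw_path: /tmp/codex: run 1.md'
get_field codex_verdict  <<<"$CODEX_RESULT"   # NEEDS_CHANGES
get_field codex_raw_path <<<"$CODEX_RESULT"   # /tmp/codex: run 1.md
```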

The output contract and severity scale are defined in `AGENTS.md` at the repo root, which Codex auto-loads on every invocation.

If the shared helper (`scripts/dispatch-codex-review.sh`) detects that `codex exec` exited non-zero (timeout, auth failure, network), Phase 4 falls back to Lawliet-only — the helper exits 0 but emits `codex_verdict: ADVISORY` so the orchestrator can detect the degraded state. The final verdict is whatever Lawliet emitted.
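The degrade-instead-of-fail behavior can be sketched as a wrapper that always exits 0 and reports failure through its output. This is not the real `scripts/dispatch-codex-review.sh`: `run_co_review` is hypothetical, and the reviewer command is passed in as arguments instead of invoking `codex exec`, whose flags are not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch only: run a reviewer command; on failure, report ADVISORY and exit 0.
set -euo pipefail

run_co_review() {
  local raw_path status=0
  raw_path=$(mktemp)
  "$@" >"$raw_path" 2>/dev/null || status=$?
  if (( status != 0 )); then
    rm -f "$raw_path"
    echo "codex_verdict: ADVISORY"   # degraded: orchestrator falls back to Lawliet
    return 0                          # never fail Phase 4 because the reviewer failed
  fi
  # Success path: hand back the raw reply for parsing. (Verdict parsing, which
  # the real helper also reports, is omitted from this sketch.)
  echo "codex_raw_path: $raw_path"
}

run_co_review false   # prints "codex_verdict: ADVISORY" and returns 0
```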

The helper builds the diff, task description, and Lawliet's findings internally. `$CODEX_RAW` contains Codex's full reply (as written by `--output-last-message`): the first non-blank line is the verdict (`APPROVED` / `NEEDS_CHANGES` / `BLOCKED`); subsequent lines of the form `<severity>: <file>:<line>: <issue>` are findings. Findings without a `file:line` token are advisory only and cannot trigger a NEEDS_CHANGES verdict. If the first non-blank line is not one of `APPROVED`, `NEEDS_CHANGES`, or `BLOCKED`, treat the entire Codex output as advisory and log `warn: Codex verdict unparseable — treating as advisory`.

**Findings without a `file:line` citation are advisory only** — they do not affect the final verdict and Loid is NOT routed back for them.
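The parsing rules above can be sketched as one function. `parse_codex_reply` and its `verdict=... cited_findings=...` output format are hypothetical; the warning text is the one quoted above.

```shell
#!/usr/bin/env bash
# Sketch only: the first non-blank line is the verdict; cited findings match
# "<SEVERITY>: <file>:<line>: <issue>"; all other lines are advisory.
set -euo pipefail

parse_codex_reply() {
  local raw="$1" line verdict="" cited=0
  local finding_re='^(ERROR|WARNING|INFO): [^:[:space:]]+:[0-9]+: .+'
  while IFS= read -r line; do
    if [[ -z "$verdict" ]]; then
      if [[ "$line" =~ ^[[:space:]]*$ ]]; then continue; fi   # skip leading blanks
      verdict="$line"
      continue
    fi
    if [[ "$line" =~ $finding_re ]]; then cited=$((cited + 1)); fi
  done <<<"$raw"
  case "$verdict" in
    APPROVED|NEEDS_CHANGES|BLOCKED) ;;
    *)
      echo "warn: Codex verdict unparseable — treating as advisory" >&2
      verdict="ADVISORY"
      ;;
  esac
  echo "verdict=$verdict cited_findings=$cited"
}

reply=$'\nNEEDS_CHANGES\nWARNING: commands/orchestrate.md:160: grep -A1 misses reordered keys\nAdvisory notes:\n- consider a shared parser'
parse_codex_reply "$reply"   # verdict=NEEDS_CHANGES cited_findings=1
```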

**Disagreement rule (truth table):**

Note: Lawliet emits only `APPROVED` or `NEEDS_CHANGES`. `BLOCKED` is a Codex-only verdict (used when Codex finds an `ERROR`-severity issue with a `file:line` citation).

| Lawliet verdict | Codex verdict | Codex has file:line citation? | Final Phase 4 verdict |
|-----------------|---------------|-------------------------------|-----------------------|
| APPROVED | APPROVED | n/a | APPROVED |
| APPROVED | BLOCKED | yes | NEEDS_CHANGES (surface Codex cite) |
| APPROVED | BLOCKED | no | APPROVED (advisory only) |
| APPROVED | NEEDS_CHANGES | yes | NEEDS_CHANGES (surface Codex cite) |
| APPROVED | NEEDS_CHANGES | no | APPROVED (advisory only) |
| NEEDS_CHANGES | APPROVED | n/a | NEEDS_CHANGES (Lawliet wins on linter-grounded findings) |
| NEEDS_CHANGES | BLOCKED or NEEDS_CHANGES | any | NEEDS_CHANGES |
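The truth table collapses to two rules: a Lawliet `NEEDS_CHANGES` always stands, and otherwise only a cited Codex `BLOCKED` or `NEEDS_CHANGES` flips the result. A sketch with a hypothetical `reconcile_verdicts` function; treating Codex `ADVISORY` (helper failure or unparseable reply) as deferring to Lawliet follows the fallback rule stated earlier.

```shell
#!/usr/bin/env bash
# Sketch only. Args: Lawliet verdict, Codex verdict, and "yes" if Codex
# reported at least one finding with a file:line citation (else "no"/"n/a").
set -euo pipefail

reconcile_verdicts() {
  local lawliet="$1" codex="$2" cited="$3"
  if [[ "$lawliet" == "NEEDS_CHANGES" ]]; then
    echo "NEEDS_CHANGES"   # Lawliet's linter-grounded findings always stand
    return 0
  fi
  case "$codex:$cited" in
    BLOCKED:yes|NEEDS_CHANGES:yes) echo "NEEDS_CHANGES" ;;  # surface Codex cite
    *) echo "APPROVED" ;;  # Codex APPROVED, uncited, or ADVISORY
  esac
}

reconcile_verdicts APPROVED BLOCKED yes   # NEEDS_CHANGES
```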

When the final verdict is NEEDS_CHANGES, delegate back to Loid with specific issues from Lawliet and/or Codex (file:line citations required).

**When `CODEX_AVAILABLE` is `false`**, skip the Codex co-review entirely. Phase 4 then behaves exactly as it did before Codex co-review was added (Lawliet-only). Log one info line:

```
info: Codex co-review skipped (codex.available: false)
```

After computing the final Phase 4 verdict:
- If APPROVED: Update state and proceed
- If NEEDS_CHANGES: Delegate back to Loid with specific issues

117 changes: 107 additions & 10 deletions commands/team-orchestrate.md
@@ -391,16 +391,66 @@ SendMessage(
)
```

**Step 7.5: Codex co-review (after Lawliet collection, before state writes)**

Lawliet's reply from Step 7 is now in your conversation memory. If Codex is
available, dispatch it as a sequential co-reviewer before recording the review
verdict — Codex's findings may flip Lawliet's verdict per the truth table in
`commands/orchestrate.md` Phase 4.

```bash
CODEX_AVAILABLE=$(grep -A1 '^codex:' .claude/team-orchestration.local.md | grep 'available:' | sed 's/.*available: *//')
```

When `CODEX_AVAILABLE` is `true`:

1. Use the Write tool to persist Lawliet's full reply (from Step 7's SendMessage)
verbatim to `.claude/codex/lawliet-findings.tmp.md`. Create the directory first:
`mkdir -p .claude/codex`.

2. Dispatch Codex via the shared helper:

```bash
LAWLIET_FINDINGS_FILE=".claude/codex/lawliet-findings.tmp.md"
CODEX_RESULT=$(bash ${CLAUDE_PLUGIN_ROOT}/scripts/dispatch-codex-review.sh \
--state-file .claude/team-orchestration.local.md \
--lawliet-findings "$LAWLIET_FINDINGS_FILE")
CODEX_VERDICT=$(echo "$CODEX_RESULT" | grep '^codex_verdict:' | sed 's/.*: *//')
CODEX_RAW_PATH=$(echo "$CODEX_RESULT" | grep '^codex_raw_path:' | sed 's/.*: *//')
CODEX_RAW=""
if [[ -n "$CODEX_RAW_PATH" && -f "$CODEX_RAW_PATH" ]]; then
CODEX_RAW=$(cat "$CODEX_RAW_PATH")
rm -f "$CODEX_RAW_PATH"
fi
rm -f "$LAWLIET_FINDINGS_FILE"
```

3. Apply the truth table from `commands/orchestrate.md` Phase 4 to reconcile
Lawliet + Codex into `FINAL_REVIEW_VERDICT` (one of APPROVED/NEEDS_CHANGES).
The truth table is mode-agnostic.

4. The reconciled verdict feeds Step 8's `--gate-result` decision.

When `CODEX_AVAILABLE` is `false`:
```
info: Codex co-review skipped (codex.available: false)
```
`FINAL_REVIEW_VERDICT` = Lawliet's verdict; Phase 4 behavior is unchanged.
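One design note on Step 8: its gate mapping treats anything other than `NEEDS_CHANGES` as `passed`, so an empty or misspelled `FINAL_REVIEW_VERDICT` would pass the review gate. A fail-closed variant, sketched with a hypothetical `review_gate_result` helper:

```shell
#!/usr/bin/env bash
# Sketch only: map the reconciled verdict to a gate result, failing closed.
set -euo pipefail

review_gate_result() {
  local verdict="${1:-}"
  case "$verdict" in
    APPROVED) echo "passed" ;;
    NEEDS_CHANGES) echo "failed" ;;
    *)
      echo "warn: FINAL_REVIEW_VERDICT is '${verdict}', expected APPROVED or NEEDS_CHANGES" >&2
      echo "failed"   # an unknown or empty verdict never passes the gate
      ;;
  esac
}

FINAL_REVIEW_VERDICT="APPROVED"   # sample value for illustration
REVIEW_GATE_RESULT=$(review_gate_result "${FINAL_REVIEW_VERDICT:-}")
echo "$REVIEW_GATE_RESULT"        # passed
```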

**Step 8: Update Teammate Statuses**

Update state for each teammate result:

```bash
- # Update review status
# Update review status (using reconciled verdict from Step 7.5)
REVIEW_GATE_RESULT="passed"
if [[ "$FINAL_REVIEW_VERDICT" == "NEEDS_CHANGES" ]]; then
REVIEW_GATE_RESULT="failed"
fi
bash ${CLAUDE_PLUGIN_ROOT}/scripts/update-team-state.sh \
--parallel-group review_verification --teammate review \
- --gate-result passed --agent Lawliet \
- --message "Code review passed"
--gate-result "$REVIEW_GATE_RESULT" --agent Lawliet \
--message "Code review: $FINAL_REVIEW_VERDICT"

# Update verification status
bash ${CLAUDE_PLUGIN_ROOT}/scripts/update-team-state.sh \
@@ -437,13 +487,20 @@ echo "Parallel Group Results: $SUMMARY"
FAILED_PHASES=$(echo "$MERGE_RESULT" | grep -o '"failed_phases": *\[[^]]*\]')
```

When `FINAL_REVIEW_VERDICT` is `NEEDS_CHANGES`, include BOTH Lawliet's findings
AND Codex's `$CODEX_RAW` (captured in Step 7.5) in the delegation prompt.
Otherwise Loid receives only Lawliet's findings and may not address the Codex-flagged issues, leaving the gate stuck.

Delegate to Loid with specific issues:
```
Task(agent="Loid", prompt="
Fix the issues found in parallel review/verification:

Review Issues (if any):
- [Include Lawliet's feedback]
[Include Lawliet's feedback verbatim]

Codex Co-Review Findings (if Codex contributed to the failed verdict):
[Include Codex's $CODEX_RAW verbatim — file:line citations from Step 7.5 are essential here so Loid knows what to fix]

Verification Issues (if any):
[Include Alphonse's failures]
@@ -464,14 +521,54 @@ echo "Parallel Group Results: $SUMMARY"
- Check for security issues
- Verify adherence to patterns

- After Lawliet completes:
- - If APPROVED: Update state and proceed
- - If NEEDS_CHANGES: Delegate back to Loid with specific issues
After Lawliet completes, do NOT record the gate result yet — proceed to
the **Codex co-review** block below. The final review verdict
(`FINAL_REVIEW_VERDICT`) is computed AFTER Codex runs (or after Codex is
skipped due to `codex.available: false`), per the truth table at
`commands/orchestrate.md` Phase 4.

**Codex co-review (sequential fallback)**

After Lawliet completes, if Codex is available, run Codex co-review before
recording the review gate result.

```bash
- bash ${CLAUDE_PLUGIN_ROOT}/scripts/update-team-state.sh \
- --phase verification --gate-result passed --agent Lawliet \
- --message "Code review passed"
CODEX_AVAILABLE=$(grep -A1 '^codex:' .claude/team-orchestration.local.md | grep 'available:' | sed 's/.*available: *//')
```

When `CODEX_AVAILABLE` is `true`, use the Write tool to persist Lawliet's full
reply to `.claude/codex/lawliet-findings.tmp.md` (`mkdir -p .claude/codex`
first), then:

```bash
LAWLIET_FINDINGS_FILE=".claude/codex/lawliet-findings.tmp.md"
CODEX_RESULT=$(bash ${CLAUDE_PLUGIN_ROOT}/scripts/dispatch-codex-review.sh \
--state-file .claude/team-orchestration.local.md \
--lawliet-findings "$LAWLIET_FINDINGS_FILE")
CODEX_VERDICT=$(echo "$CODEX_RESULT" | grep '^codex_verdict:' | sed 's/.*: *//')
CODEX_RAW_PATH=$(echo "$CODEX_RESULT" | grep '^codex_raw_path:' | sed 's/.*: *//')
CODEX_RAW=""
if [[ -n "$CODEX_RAW_PATH" && -f "$CODEX_RAW_PATH" ]]; then
CODEX_RAW=$(cat "$CODEX_RAW_PATH")
rm -f "$CODEX_RAW_PATH"
fi
rm -f "$LAWLIET_FINDINGS_FILE"
```

Apply the truth table from `commands/orchestrate.md` Phase 4 to reconcile into
`FINAL_REVIEW_VERDICT`. If NEEDS_CHANGES, delegate back to Loid with combined
citations and loop — do NOT proceed to verification yet.

When `CODEX_AVAILABLE` is `false`, log
`info: Codex co-review skipped (codex.available: false)` and use Lawliet's
verdict as final.

```bash
if [[ "$FINAL_REVIEW_VERDICT" == "APPROVED" ]]; then
bash ${CLAUDE_PLUGIN_ROOT}/scripts/update-team-state.sh \
--phase verification --gate-result passed --agent Lawliet \
--message "Code review passed (Codex verdict: $CODEX_VERDICT)"
fi
```
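The "delegate back to Loid and loop" instruction can be sketched as a bounded loop. Everything here is illustrative: `review_loop`, the stub commands, and the `MAX_ROUNDS` cap are assumptions, since the command file does not specify a retry limit.

```shell
#!/usr/bin/env bash
# Sketch only. review_cmd prints the reconciled verdict; fix_cmd stands in for
# delegating the combined Lawliet + Codex citations back to Loid.
set -euo pipefail

MAX_ROUNDS=3   # hypothetical safety cap, not defined by the command file

review_loop() {
  local review_cmd="$1" fix_cmd="$2" round verdict
  for (( round = 1; round <= MAX_ROUNDS; round++ )); do
    verdict=$("$review_cmd")
    if [[ "$verdict" == "APPROVED" ]]; then
      echo "approved after $round round(s)"
      return 0
    fi
    "$fix_cmd"   # any verdict other than APPROVED goes back for fixes
  done
  echo "still not approved after $MAX_ROUNDS rounds" >&2
  return 1
}

# Stubs: the first review fails, the fix flips a flag, the second review passes.
fixed=0
stub_review() { if (( fixed )); then echo APPROVED; else echo NEEDS_CHANGES; fi; }
stub_fix() { fixed=1; }
review_loop stub_review stub_fix   # approved after 2 round(s)
```

Only after the loop exits with `APPROVED` does the gate write run and verification begin.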

**Phase 5: Verification (Sequential)**