
feat(autofix): add incident repair mode for CLI command-level fixes#863

Closed
jackwener wants to merge 3 commits into main from
feat/autofix-incident-repair

@jackwener
Owner

Summary

  • Add incident repair mode to AutoFix (fix.ts --mode incident --spec <name>) for detecting and fixing CLI command failures caused by site changes
  • New eval-cli.ts runner with failure taxonomy (regression / precondition / infrastructure / skipped) — metric only counts regressions
  • New command-specs.json with 3 initial specs: weibo-hot-smoke (read_only), xiaohongshu-search-smoke (read_only), twitter-reply-fill-smoke (fill_only)
  • Safety profile: read_only / fill_only / publish classification per spec, fill_only injects OPENCLI_DRY_RUN=1
  • Twitter reply now supports OPENCLI_DRY_RUN — fills composer without clicking submit, returns { status: "dry_run", filled: true, submitted: false }
  • Full design doc at designs/autofix-incident-repair.md
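
As a rough illustration of the taxonomy and the regression-only metric described above, here is a minimal sketch. Only `failed_regression` and `failed_precondition` are names that appear in this PR; `passed`, `failed_infrastructure`, `skipped`, the `SpecResult` shape, and the sample outcomes are assumptions made for the example and may not match eval-cli.ts.

```typescript
// Hypothetical outcome labels. Only failed_regression and failed_precondition
// are named in this PR; the others are assumed for illustration.
type Outcome =
  | "passed"
  | "failed_regression"
  | "failed_precondition"
  | "failed_infrastructure"
  | "skipped";

// Hypothetical per-spec result record.
interface SpecResult {
  spec: string;
  outcome: Outcome;
}

// The metric counts regressions only, so auth or infra failures never
// look like something the repair engine should try to fix.
function countRegressions(results: SpecResult[]): number {
  return results.filter((r) => r.outcome === "failed_regression").length;
}

// Made-up outcomes for the three initial specs, purely for demonstration.
const sample: SpecResult[] = [
  { spec: "weibo-hot-smoke", outcome: "passed" },
  { spec: "xiaohongshu-search-smoke", outcome: "failed_precondition" },
  { spec: "twitter-reply-fill-smoke", outcome: "failed_regression" },
];
```

With this sample, `countRegressions(sample)` counts only the twitter spec; the precondition failure is reported but does not move the metric.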

Closes discussion from #855 review thread about AutoFix not being able to detect CLI command-level failures.

Design

See designs/autofix-incident-repair.md for the full design document, co-authored with @codex-mini0.

Key decisions:

  • Two modes, one entry point: fix.ts handles both repo repair (existing) and incident repair (new) via --mode flag
  • Incident mode guard is stricter: npm run build && npm test (must not break repo health)
  • Failure taxonomy: prevents engine from trying to "fix" auth issues or infra problems
  • Safety enforcement: publish specs require explicit --allow-side-effects
  • OPENCLI_DRY_RUN is internal-only — not a public CLI flag yet
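
The safety decisions above could be enforced along these lines. This is a sketch under assumptions: the `safety` field name, the `SpecSafety` shape, and the `prepareEnv` helper are hypothetical; only the three profile names, the `--allow-side-effects` flag, and `OPENCLI_DRY_RUN=1` come from this PR.

```typescript
type SafetyProfile = "read_only" | "fill_only" | "publish";

// Hypothetical subset of a command spec; the real field names may differ.
interface SpecSafety {
  name: string;
  safety: SafetyProfile;
}

type Env = Record<string, string | undefined>;

// Returns the environment a spec's command should run with,
// or throws if the spec has side effects that were not explicitly allowed.
function prepareEnv(spec: SpecSafety, argv: string[], baseEnv: Env): Env {
  if (spec.safety === "publish" && !argv.includes("--allow-side-effects")) {
    throw new Error(
      `spec "${spec.name}" is classified as publish; pass --allow-side-effects to run it`,
    );
  }
  if (spec.safety === "fill_only") {
    // fill_only specs fill forms but must never submit them.
    return { ...baseEnv, OPENCLI_DRY_RUN: "1" };
  }
  return { ...baseEnv };
}
```

Returning a fresh object rather than mutating `process.env` keeps the dry-run variable scoped to the one command that needs it.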

Test plan

  • Build passes (npm run build)
  • Twitter reply unit tests pass (7/7)
  • @codex-mini0 design review against agreed Phase 1 constraints
  • Manual: npx tsx autoresearch/eval-cli.ts runs all 3 specs
  • Manual: npx tsx autoresearch/eval-cli.ts --spec weibo-hot-smoke runs single spec

AutoFix currently only detects repo-level issues (build/test/browse-DOM).
This adds a new "incident mode" that can detect and repair CLI command
failures caused by site changes, selector updates, etc.

Changes:
- fix.ts: --mode repo|incident --spec <name> parameter
- eval-cli.ts: new command-level runner with failure taxonomy
- command-specs.json: 3 initial specs (weibo hot, xiaohongshu search, twitter reply)
- config.ts: CommandIncidentSpec type definitions
- twitter/reply.ts: OPENCLI_DRY_RUN support for safe fill-only testing
- designs/autofix-incident-repair.md: full design document
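
For orientation, the `--mode repo|incident --spec <name>` parameter might be parsed roughly as follows. This is a hand-rolled sketch, not the code in fix.ts: the `parseFixArgs` name, the `repo` default, and the rule that incident mode requires `--spec` are assumptions.

```typescript
type Mode = "repo" | "incident";

interface FixArgs {
  mode: Mode;
  spec: string | undefined;
}

function parseFixArgs(argv: string[]): FixArgs {
  let mode: Mode = "repo"; // assumed default: the pre-existing repo repair behavior
  let spec: string | undefined;

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--mode") {
      const value = argv[++i];
      if (value === "repo" || value === "incident") {
        mode = value;
      } else {
        throw new Error(`--mode expects "repo" or "incident", got ${String(value)}`);
      }
    } else if (arg === "--spec") {
      spec = argv[++i];
      if (spec === undefined) throw new Error("--spec expects a spec name");
    }
  }

  // Assumption: incident repair always targets one named spec.
  if (mode === "incident" && spec === undefined) {
    throw new Error("--mode incident requires --spec <name>");
  }
  return { mode, spec };
}
```

For example, `parseFixArgs(["--mode", "incident", "--spec", "weibo-hot-smoke"])` yields incident mode targeting the weibo spec, while an empty argument list falls back to repo mode.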
…ipeline

1. Rebase on main picks up #860/#862 (all replies use dedicated composer),
   so dry-run no longer regresses the text-reply fix.

2. Switch incident mode metric from pass_count (higher is better) to
   regression_count (lower is better). eval-cli.ts now outputs REGRESSIONS=N
   alongside SCORE=X/Y. fix.ts incident mode greps for REGRESSIONS=
   so infra/precondition failures don't pollute the metric.

3. Add pre-flight check in incident mode: detects infra/precondition
   failures and bails early with a clear message instead of entering
   the engine loop.

4. Pass prompt via stdin (not shell-escaped string) in incident mode
   modify callback, matching the convention from main.
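
The metric switch in item 2 can be sketched as below. The `REGRESSIONS=N` and `SCORE=X/Y` markers are taken from the commit message; the `parseEvalOutput` and `isImprovement` helpers, the `EvalSummary` shape, and the exact placement of the markers in the output are assumptions for illustration.

```typescript
// Hypothetical parsed form of the eval-cli.ts summary lines.
interface EvalSummary {
  regressions: number;
  passed: number;
  total: number;
}

// Extracts REGRESSIONS=N and SCORE=X/Y from runner output.
// Returns null when either marker is missing.
function parseEvalOutput(stdout: string): EvalSummary | null {
  const reg = /REGRESSIONS=(\d+)/.exec(stdout);
  const score = /SCORE=(\d+)\/(\d+)/.exec(stdout);
  if (!reg || !score) return null;
  return {
    regressions: Number(reg[1]),
    passed: Number(score[1]),
    total: Number(score[2]),
  };
}

// Lower regression count is better. The pass count is deliberately ignored,
// so infra or precondition failures cannot make a candidate fix look worse.
function isImprovement(before: EvalSummary, after: EvalSummary): boolean {
  return after.regressions < before.regressions;
}
```

Comparing on `regressions` alone means a run where a login expires mid-session lowers `passed` without registering as a step backwards.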
@jackwener jackwener force-pushed the feat/autofix-incident-repair branch from 569d1f5 to d508226 Compare April 7, 2026 11:39
1. Fix inverted success condition in incident mode: 0 regressions
   now correctly prints "all regressions fixed" instead of "still failing".

2. Add post-hoc auth failure classification: when a spec declares
   prerequisites.auth, command output is scanned for auth-related
   patterns ("are you logged in", "unauthorized", etc.) and classified
   as failed_precondition instead of failed_regression.

3. Pre-flight bridge check: auth-required specs verify browser bridge
   is reachable before executing, bail as failed_precondition if not.
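
The post-hoc auth classification in item 2 might look something like this. It is a sketch, not the actual eval-cli.ts logic: only the two quoted patterns are known from the commit message (the rest of the list is elided there), and the boolean `prerequisites.auth` type, the case-insensitive matching, and the `classifyFailure` name are assumptions.

```typescript
// Hypothetical subset of a spec; prerequisites.auth is named in the commit
// message, but its boolean type is an assumption.
interface AuthAwareSpec {
  prerequisites?: { auth?: boolean };
}

// Only these two patterns are quoted in the commit message.
// No global flag, so RegExp.test stays stateless across calls.
const AUTH_PATTERNS: RegExp[] = [/are you logged in/i, /unauthorized/i];

type FailureClass = "failed_precondition" | "failed_regression";

// Classifies the output of a command that has already failed.
function classifyFailure(spec: AuthAwareSpec, output: string): FailureClass {
  const needsAuth = spec.prerequisites?.auth === true;
  if (needsAuth && AUTH_PATTERNS.some((p) => p.test(output))) {
    return "failed_precondition";
  }
  return "failed_regression";
}
```

Gating the scan on `prerequisites.auth` keeps a spec that needs no login from having a genuine regression reclassified just because its output happens to mention authorization.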