Summary
Three issues with the current review-fix loop:
- PR descriptions are not updated after fix iterations. The only instruction is a soft "if significantly changed…" in fix_prompt() step 8; the agent frequently ignores it, there is no github_api wrapper, and the orchestrator never verifies.
- Autopilot blindly follows Copilot Code Review (CCR) suggestions. The prompt says "use your judgment" but provides no verification framework, and the fix summary only has fixed/skipped with no evidence trail.
- CCR can create circular loops, e.g. CCR suggests making a field optional → autopilot does it → CCR says "why is this optional?" → autopilot reverts → repeat until max_iterations. No detection mechanism exists.
Phase 1: PR Description Auto-Update (dedicated orchestrator step)
Problem
The only PR description update instruction is a soft "if your fixes significantly changed…" in fix_prompt() step 8. The agent frequently ignores it. There is no github_api wrapper and no orchestrator verification.
Changes
- Add update_pr_description() and get_pr_description() to github_api.py: wrappers around gh pr edit and gh pr view --json body
- Add update_description_prompt() to prompts.py: instructs the agent to rewrite the PR body based on the current diff, task description, and existing body while preserving template structure
- Add UPDATE_DESCRIPTION state to Orchestrator in orchestrator.py, inserted between RESOLVE_COMMENTS and REQUEST_REVIEW:
  FIX → VERIFY_PUSH → RESOLVE_COMMENTS → UPDATE_DESCRIPTION → REQUEST_REVIEW
- Implement _do_update_description(): fetches the current body and diff stat, runs the agent with the new prompt, calls update_pr_description()
- Remove step 8 from fix_prompt(), since the orchestrator now guarantees the update
- Tests for new API functions and orchestrator state transitions
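A minimal sketch of what the two github_api.py wrappers could look like, assuming they shell out to the gh CLI. The pr_number parameter, the injectable run argument (added here so tests can avoid calling gh), and the empty-body guard are illustrative assumptions, not part of the plan above.

```python
import json
import subprocess
from typing import Callable, Sequence

# A runner takes gh arguments and returns the command's stdout as text.
# Assumption: injecting it keeps the wrappers testable without a real gh binary.
Runner = Callable[[Sequence[str]], str]


def _run_gh(args: Sequence[str]) -> str:
    """Run the gh CLI and return stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        ["gh", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def get_pr_description(pr_number: int, run: Runner = _run_gh) -> str:
    """Return the current PR body via `gh pr view --json body`."""
    payload = json.loads(run(["pr", "view", str(pr_number), "--json", "body"]))
    return payload.get("body") or ""


def update_pr_description(pr_number: int, body: str, run: Runner = _run_gh) -> None:
    """Replace the PR body via `gh pr edit --body`."""
    if not body.strip():
        # Assumption: refusing an empty body is safer than wiping the description.
        raise ValueError("refusing to set an empty PR description")
    run(["pr", "edit", str(pr_number), "--body", body])
```

With this shape, _do_update_description() could read the body, hand it to the agent together with the diff stat, and write the agent's result back through update_pr_description(), while the tests pass a fake runner that records the arguments.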
Phase 2: Strengthen CCR Verification (3-tier decision model)
Problem
The agent applies most CCR suggestions without verifying claims. No evidence trail for decisions.
Changes
- Rewrite fix_prompt() instructions with a mandatory 3-tier verification framework:
  - Tier 1 (AGREE & FIX): CCR is clearly correct (real bug, missing null check, test failure). Fix it.
  - Tier 2 (DISAGREE with evidence): CCR is wrong and autopilot has concrete proof (API contract, existing test, specification). Reply with evidence, mark "status": "dismissed", resolve the thread.
  - Tier 3 (UNCERTAIN): Cannot prove either way. Reply explaining the uncertainty, mark "status": "uncertain", do NOT resolve the thread; leave it for a human.
- Before making any decision, the agent must:
  - Read surrounding code context (not just the flagged line)
  - Check if tests cover the scenario CCR mentions
  - If CCR suggests adding/removing something, verify the claim against actual usage
- Update _do_resolve_comments() in orchestrator.py:
"fixed" → reply "Addressed in SHA — message" + resolve (existing)
"skipped" → reply "Skipped — reason" + resolve (existing, unchanged)
"dismissed" → reply "Dismissed — evidence/reason" + resolve (NEW)
"uncertain" → reply "Needs human review — reason" + DO NOT resolve (NEW)
- Require an "evidence" field in the fix summary JSON for dismissed/uncertain entries
- Tests: verify uncertain does NOT resolve, and dismissed resolves with evidence in the reply body
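The status handling described above could be sketched as a small lookup table plus a validation step. This is an illustration under assumptions: the ThreadAction type, the plan_thread_action() name, and the "message" and "reason" keys are hypothetical; only "status", "evidence", and the four status values come from this plan. Building the final reply string is left to the orchestrator.

```python
from dataclasses import dataclass
from typing import Mapping

# status -> (reply label, resolve the thread?, evidence required?)
STATUS_RULES = {
    "fixed": ("Addressed", True, False),
    "skipped": ("Skipped", True, False),
    "dismissed": ("Dismissed", True, True),
    "uncertain": ("Needs human review", False, True),
}


@dataclass(frozen=True)
class ThreadAction:
    label: str     # leading text of the reply
    detail: str    # reason or evidence to include in the reply
    resolve: bool  # False leaves the thread open for a human


def plan_thread_action(entry: Mapping[str, str]) -> ThreadAction:
    """Validate one fix summary entry and decide how to handle its thread."""
    status = entry.get("status", "")
    if status not in STATUS_RULES:
        raise ValueError(f"unknown fix summary status: {status!r}")
    label, resolve, needs_evidence = STATUS_RULES[status]
    evidence = (entry.get("evidence") or "").strip()
    if needs_evidence and not evidence:
        raise ValueError(f"status {status!r} requires a non-empty 'evidence' field")
    # Hypothetical fallback keys for statuses that carry no evidence.
    detail = evidence or (entry.get("message") or entry.get("reason") or "").strip()
    return ThreadAction(label=label, detail=detail, resolve=resolve)
```

Keeping the rules in one table means a new status only needs one new row, and the "uncertain does not resolve" guarantee is checked in a single place.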
Phase 3: Circular Loop Detection
Problem
CCR can create contradictory review cycles. No mechanism exists to detect or break these loops.
Changes
- Track per-comment history across iterations: in _do_parse_review(), compare current unresolved comments against previous fix summaries by matching on (file_path, body_text_substring). Store bounce_count per comment in comment-history.json
- Add _detect_bouncing_comments() helper: if a comment was "fixed" and then reappeared 2 times, flag it as a circular loop
- Add bouncing_comments context to fix_prompt() and render a warning: "⚠️ This comment has bounced N times. CCR keeps reversing your fix. Do NOT fix it again. Mark as uncertain for human review."
- Wire through _do_fix(): calls detection, passes the result to the prompt
- Tests — simulate bouncing comment history, verify flagging after 2 bounces, verify prompt includes warning
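One possible shape for the bounce tracking, as a sketch only. The key format, the 80-character normalized prefix standing in for the body_text_substring match, the "path" and "body" comment keys, and the helper names are assumptions made for illustration. Reading and writing comment-history.json is omitted; the history here is a plain dict.

```python
from typing import Dict, Iterable, List, Mapping, Set

BOUNCE_THRESHOLD = 2  # from Decisions: 2 bounces triggers escalation
KEY_PREFIX_LEN = 80   # assumption: a normalized prefix approximates the substring match

WARNING_TEMPLATE = (
    "⚠️ This comment has bounced {count} times. CCR keeps reversing your fix. "
    "Do NOT fix it again. Mark as uncertain for human review."
)


def comment_key(file_path: str, body: str) -> str:
    """Build a stable key from the file path and the normalized comment text."""
    normalized = " ".join(body.lower().split())
    return f"{file_path}::{normalized[:KEY_PREFIX_LEN]}"


def record_bounces(
    history: Dict[str, dict],
    current_comments: Iterable[Mapping[str, str]],
    previously_fixed_keys: Set[str],
) -> Dict[str, dict]:
    """Increment bounce_count for each open comment that was marked fixed before."""
    for comment in current_comments:
        key = comment_key(comment["path"], comment["body"])
        if key in previously_fixed_keys:
            entry = history.setdefault(
                key, {"path": comment["path"], "bounce_count": 0}
            )
            entry["bounce_count"] += 1
    return history


def detect_bouncing_comments(
    history: Mapping[str, dict], threshold: int = BOUNCE_THRESHOLD
) -> List[dict]:
    """Return history entries whose bounce_count has reached the threshold."""
    return [
        {"key": key, **entry}
        for key, entry in sorted(history.items())
        if entry["bounce_count"] >= threshold
    ]


def render_bounce_warning(bounce_count: int) -> str:
    """Render the warning text that fix_prompt() would include for a flagged comment."""
    return WARNING_TEMPLATE.format(count=bounce_count)
```

A prefix key is cheap and deterministic, but it will miss a comment that CCR rephrases heavily between iterations, so a fuzzier match may be needed in practice.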
Files to Modify
src/autopilot_loop/prompts.py — rewrite fix_prompt(), add update_description_prompt()
src/autopilot_loop/github_api.py — add update_pr_description(), get_pr_description()
src/autopilot_loop/orchestrator.py — add UPDATE_DESCRIPTION state, modify _do_resolve_comments(), add _detect_bouncing_comments(), modify _do_fix() and _do_parse_review()
tests/test_github_api.py — new tests for API functions
tests/test_orchestrator.py — new tests for states, statuses, loop detection
Decisions
- PR description update is a dedicated orchestrator step (guaranteed, not agent discretion)
- CCR disagreement with evidence: reply with evidence + resolve thread
- Circular loop threshold: 2 bounces triggers escalation to human
- Two new fix summary statuses: dismissed (resolved with evidence) and uncertain (left unresolved for human)
- PR description rewrite uses a copilot agent invocation (not a Python template)
🤖 autopilot-loop