Strengthen CCR handling, auto-update PR descriptions, detect circular review loops #69

@chanakyav

Summary

Three issues with the current review-fix loop:

  1. PR descriptions are not updated after fix iterations — the only instruction is a soft "if significantly changed…" in fix_prompt() step 8. The agent frequently ignores it, there is no github_api wrapper, and the orchestrator never verifies.
  2. Autopilot blindly follows Copilot Code Review (CCR) suggestions — the prompt says "use your judgment" but provides no verification framework. The fix summary only has fixed/skipped with no evidence trail.
  3. CCR can create circular loops — e.g. CCR suggests making a field optional → autopilot does it → CCR says "why is this optional?" → autopilot reverts → repeat until max_iterations. No detection mechanism exists.

Phase 1: PR Description Auto-Update (dedicated orchestrator step)

Problem

The only PR description update instruction is a soft "if your fixes significantly changed…" in fix_prompt() step 8, which the agent frequently ignores. github_api.py has no wrapper for reading or editing the PR body, and the orchestrator never verifies that the description was updated.

Changes

  1. Add update_pr_description() and get_pr_description() to github_api.py — wrappers around gh pr edit and gh pr view --json body
  2. Add update_description_prompt() to prompts.py — instructs agent to rewrite the PR body based on current diff, task description, and existing body while preserving template structure
  3. Add UPDATE_DESCRIPTION state to Orchestrator in orchestrator.py — inserted between RESOLVE_COMMENTS and REQUEST_REVIEW:
    FIX → VERIFY_PUSH → RESOLVE_COMMENTS → UPDATE_DESCRIPTION → REQUEST_REVIEW
  4. Implement _do_update_description() — fetches current body, diff stat, runs agent with the new prompt, calls update_pr_description()
  5. Remove step 8 from fix_prompt() — the orchestrator now guarantees this
  6. Tests for new API functions and orchestrator state transitions
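
A minimal sketch of changes 1 and 3, under stated assumptions. The function names update_pr_description() and get_pr_description() and the state names come from this issue; the `run` parameter, the `Runner` type, `_run_gh`, the `State` enum, and the `NEXT_STATE` table are hypothetical and exist only to keep the example testable without calling the real gh CLI. The actual orchestrator likely has more states and error handling than shown here.

```python
import json
import subprocess
from enum import Enum
from typing import Callable, List

# Hypothetical runner type: takes gh arguments, returns stdout as text.
Runner = Callable[[List[str]], str]


def _run_gh(args: List[str]) -> str:
    """Run the gh CLI and return stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        ["gh", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def get_pr_description(pr_number: int, run: Runner = _run_gh) -> str:
    """Return the current PR body using `gh pr view --json body`."""
    out = run(["pr", "view", str(pr_number), "--json", "body"])
    return json.loads(out).get("body") or ""


def update_pr_description(pr_number: int, body: str, run: Runner = _run_gh) -> None:
    """Replace the PR body using `gh pr edit --body`."""
    run(["pr", "edit", str(pr_number), "--body", body])


class State(Enum):
    FIX = "FIX"
    VERIFY_PUSH = "VERIFY_PUSH"
    RESOLVE_COMMENTS = "RESOLVE_COMMENTS"
    UPDATE_DESCRIPTION = "UPDATE_DESCRIPTION"
    REQUEST_REVIEW = "REQUEST_REVIEW"


# Happy-path ordering from the issue; failure transitions are not modeled.
NEXT_STATE = {
    State.FIX: State.VERIFY_PUSH,
    State.VERIFY_PUSH: State.RESOLVE_COMMENTS,
    State.RESOLVE_COMMENTS: State.UPDATE_DESCRIPTION,
    State.UPDATE_DESCRIPTION: State.REQUEST_REVIEW,
}
```

Injecting the runner is one way to cover the "tests for new API functions" item: a test can pass a fake that records the arguments and returns canned JSON instead of shelling out.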

Phase 2: Strengthen CCR Verification (3-tier decision model)

Problem

The agent applies most CCR suggestions without verifying their claims. The fix summary records only fixed/skipped, so there is no evidence trail for its decisions.

Changes

  1. Rewrite fix_prompt() instructions with a mandatory 3-tier verification framework:

    • Tier 1 — AGREE & FIX: CCR is clearly correct (real bug, missing null check, test failure). Fix it.
    • Tier 2 — DISAGREE with evidence: CCR is wrong and autopilot has concrete proof (API contract, existing test, specification). Reply with evidence, mark "status": "dismissed", resolve the thread.
    • Tier 3 — UNCERTAIN: Cannot prove either way. Reply explaining uncertainty, mark "status": "uncertain", do NOT resolve the thread — leave for a human.
  2. Before making any decision, the agent must:

    • Read surrounding code context (not just the flagged line)
    • Check if tests cover the scenario CCR mentions
    • If CCR suggests adding/removing something, verify the claim against actual usage
  3. Update _do_resolve_comments() in orchestrator.py:

    • "fixed" → reply "Addressed in SHA — message" + resolve (existing)
    • "skipped" → reply "Skipped — reason" + resolve (existing, unchanged)
    • "dismissed" → reply "Dismissed — evidence/reason" + resolve (NEW)
    • "uncertain" → reply "Needs human review — reason" + DO NOT resolve (NEW)
  4. Require "evidence" field in fix summary JSON for dismissed/uncertain entries

  5. Tests — verify uncertain does NOT resolve, dismissed resolves with evidence in reply body
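
The status handling in change 3 could be sketched as a pure function that maps one fix summary entry to a reply body and a resolve decision. This is an illustration, not the existing _do_resolve_comments() code: the function name `plan_comment_action` and the entry keys `sha`, `message`, and `reason` are assumptions, while `status` and `evidence` are the fields named in this issue.

```python
from typing import Dict, Tuple

SEP = " \u2014 "  # dash separator used in the reply formats listed above
RESOLVING_STATUSES = {"fixed", "skipped", "dismissed"}
EVIDENCE_REQUIRED = {"dismissed", "uncertain"}


def plan_comment_action(entry: Dict[str, str]) -> Tuple[str, bool]:
    """Return (reply_body, should_resolve) for one fix summary entry.

    Key names other than "status" and "evidence" are hypothetical.
    """
    status = entry.get("status", "")
    # Change 4: dismissed/uncertain entries must carry evidence.
    if status in EVIDENCE_REQUIRED and not entry.get("evidence"):
        raise ValueError(f"status {status!r} requires a non-empty 'evidence' field")

    if status == "fixed":
        reply = f"Addressed in {entry['sha']}{SEP}{entry['message']}"
    elif status == "skipped":
        reply = f"Skipped{SEP}{entry['reason']}"
    elif status == "dismissed":
        reply = f"Dismissed{SEP}{entry['evidence']}"
    elif status == "uncertain":
        reply = f"Needs human review{SEP}{entry['reason']}"
    else:
        raise ValueError(f"unknown status: {status!r}")

    # Only "uncertain" leaves the thread open for a human.
    return reply, status in RESOLVING_STATUSES
```

Keeping the decision separate from the GitHub calls makes the two new test cases straightforward: assert that an uncertain entry returns `False` for resolve, and that a dismissed entry's reply contains its evidence.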


Phase 3: Circular Loop Detection

Problem

CCR can create contradictory review cycles. No mechanism exists to detect or break these loops.

Changes

  1. Track per-comment history across iterations — in _do_parse_review(), compare current unresolved comments against previous fix summaries by matching on (file_path, body_text_substring). Store bounce_count per comment in comment-history.json
  2. Add _detect_bouncing_comments() helper — if a comment was "fixed" then reappeared 2 times, flag it as a circular loop
  3. Add bouncing_comments context to fix_prompt() — render a warning: "⚠️ This comment has bounced N times. CCR keeps reversing your fix. Do NOT fix it again. Mark as uncertain for human review."
  4. Wire through _do_fix() — calls detection, passes result to prompt
  5. Tests — simulate bouncing comment history, verify flagging after 2 bounces, verify prompt includes warning
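
One possible shape for the bounce tracking described above, as a hedged sketch. The threshold of 2 and the warning text come from this issue; the helper names, the comment dict keys (`path`, `body`), the 80-character match prefix, and the in-memory `history` dict (standing in for comment-history.json) are assumptions made for illustration.

```python
from typing import Dict, List

BOUNCE_THRESHOLD = 2    # from the issue: 2 bounces triggers escalation
MATCH_PREFIX_LEN = 80   # assumed length of the body substring used for matching


def comment_key(comment: Dict[str, str]) -> str:
    """Build a match key from (file_path, body_text_substring)."""
    prefix = comment["body"][:MATCH_PREFIX_LEN].strip().lower()
    return f"{comment['path']}::{prefix}"


def update_bounce_counts(
    history: Dict[str, dict],
    unresolved: List[Dict[str, str]],
    previously_fixed: List[Dict[str, str]],
) -> Dict[str, dict]:
    """Increment bounce_count for unresolved comments that were already 'fixed'."""
    fixed_keys = {comment_key(c) for c in previously_fixed}
    for comment in unresolved:
        key = comment_key(comment)
        if key in fixed_keys:
            record = history.setdefault(
                key, {"path": comment["path"], "bounce_count": 0}
            )
            record["bounce_count"] += 1
    return history


def detect_bouncing_comments(
    history: Dict[str, dict], threshold: int = BOUNCE_THRESHOLD
) -> List[dict]:
    """Return history records whose bounce_count reached the threshold."""
    return [r for r in history.values() if r["bounce_count"] >= threshold]


def render_bounce_warning(bounce_count: int) -> str:
    """Render the warning text that fix_prompt() would include."""
    return (
        f"⚠️ This comment has bounced {bounce_count} times. "
        "CCR keeps reversing your fix. Do NOT fix it again. "
        "Mark as uncertain for human review."
    )
```

Exact-prefix matching is deliberately simple and will miss a comment that CCR rewords between iterations; a fuzzier match may be needed in practice.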

Files to Modify

  • src/autopilot_loop/prompts.py — rewrite fix_prompt(), add update_description_prompt()
  • src/autopilot_loop/github_api.py — add update_pr_description(), get_pr_description()
  • src/autopilot_loop/orchestrator.py — add UPDATE_DESCRIPTION state, modify _do_resolve_comments(), add _detect_bouncing_comments(), modify _do_fix() and _do_parse_review()
  • tests/test_github_api.py — new tests for API functions
  • tests/test_orchestrator.py — new tests for states, statuses, loop detection

Decisions

  • PR description update is a dedicated orchestrator step (guaranteed, not agent discretion)
  • CCR disagreement with evidence: reply with evidence + resolve thread
  • Circular loop threshold: 2 bounces triggers escalation to human
  • Two new fix summary statuses: dismissed (resolved with evidence) and uncertain (left unresolved for human)
  • PR description rewrite uses a copilot agent invocation (not a Python template)

🤖 autopilot-loop
