Strengthen CCR handling, auto-update PR descriptions, detect circular review loops #69

## Summary

Three issues with the current review-fix loop:

1. **PR descriptions are not updated after fix iterations** — the only instruction is a soft "if significantly changed…" in `fix_prompt()` step 8. The agent frequently ignores it, there is no `github_api` wrapper, and the orchestrator never verifies.
2. **Autopilot blindly follows Copilot Code Review (CCR) suggestions** — the prompt says "use your judgment" but provides no verification framework. The fix summary only has `fixed`/`skipped` with no evidence trail.
3. **CCR can create circular loops** — e.g. CCR suggests making a field optional → autopilot does it → CCR says "why is this optional?" → autopilot reverts → repeat until `max_iterations`. No detection mechanism exists.

---

## Phase 1: PR Description Auto-Update (dedicated orchestrator step)

### Problem
The only PR description update instruction is a soft "if your fixes significantly changed…" in `fix_prompt()` step 8. The agent frequently ignores it. There is no `github_api` wrapper and no orchestrator verification.

### Changes

1. **Add `update_pr_description()` and `get_pr_description()` to `github_api.py`** — wrappers around `gh pr edit` and `gh pr view --json body`
2. **Add `update_description_prompt()` to `prompts.py`** — instructs the agent to rewrite the PR body from the current diff, the task description, and the existing body, while preserving the body's template structure
3. **Add `UPDATE_DESCRIPTION` state to `Orchestrator`** in `orchestrator.py` — inserted between `RESOLVE_COMMENTS` and `REQUEST_REVIEW`:
   FIX → VERIFY_PUSH → RESOLVE_COMMENTS → **UPDATE_DESCRIPTION** → REQUEST_REVIEW
4. **Implement `_do_update_description()`** — fetches the current body and diff stat, runs the agent with the new prompt, and calls `update_pr_description()` with the result
5. **Remove step 8 from `fix_prompt()`** — the orchestrator now guarantees this
6. **Tests** for new API functions and orchestrator state transitions
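A minimal sketch of the two `github_api.py` wrappers and the new state ordering, assuming the wrappers shell out to the `gh` CLI through `subprocess` and take a PR number. The injectable `runner` parameter, the `State` enum, and the `NEXT_STATE` table are hypothetical names for illustration, not the existing `Orchestrator` code. Passing the body with `--body-file -` makes `gh` read it from standard input, which sidesteps shell-quoting problems with long Markdown bodies.

```python
import subprocess
from enum import Enum
from typing import Callable

# Hypothetical: any callable with subprocess.run's signature, injectable for tests.
Runner = Callable[..., subprocess.CompletedProcess]


def get_pr_description(pr_number: int, runner: Runner = subprocess.run) -> str:
    """Return the current PR body via `gh pr view --json body`."""
    result = runner(
        ["gh", "pr", "view", str(pr_number), "--json", "body", "--jq", ".body"],
        capture_output=True, text=True, check=True,
    )
    # --jq prints the raw string followed by a newline; drop that newline.
    return result.stdout.rstrip("\n")


def update_pr_description(pr_number: int, body: str,
                          runner: Runner = subprocess.run) -> None:
    """Replace the PR body via `gh pr edit`, sending the new body on stdin."""
    runner(
        ["gh", "pr", "edit", str(pr_number), "--body-file", "-"],
        input=body, capture_output=True, text=True, check=True,
    )


class State(Enum):
    FIX = "fix"
    VERIFY_PUSH = "verify_push"
    RESOLVE_COMMENTS = "resolve_comments"
    UPDATE_DESCRIPTION = "update_description"  # new dedicated step
    REQUEST_REVIEW = "request_review"


# Linear order of one fix iteration: UPDATE_DESCRIPTION runs after threads
# are resolved and before the next review is requested, on every iteration.
NEXT_STATE = {
    State.FIX: State.VERIFY_PUSH,
    State.VERIFY_PUSH: State.RESOLVE_COMMENTS,
    State.RESOLVE_COMMENTS: State.UPDATE_DESCRIPTION,
    State.UPDATE_DESCRIPTION: State.REQUEST_REVIEW,
}
```

In tests, `runner` can be a stub that records its arguments and returns `subprocess.CompletedProcess(args, 0, stdout=...)`, so neither a network call nor a `gh` login is needed.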

---

## Phase 2: Strengthen CCR Verification (3-tier decision model)

### Problem
The agent applies most CCR suggestions without verifying their claims, and its decisions leave no evidence trail.

### Changes

7. **Rewrite `fix_prompt()` instructions** with a mandatory 3-tier verification framework:
   - **Tier 1 — AGREE & FIX**: CCR is clearly correct (real bug, missing null check, test failure). Fix it.
   - **Tier 2 — DISAGREE with evidence**: CCR is wrong and autopilot has concrete proof (API contract, existing test, specification). Reply with evidence, mark `"status": "dismissed"`, resolve the thread.
   - **Tier 3 — UNCERTAIN**: Cannot prove either way. Reply explaining uncertainty, mark `"status": "uncertain"`, do NOT resolve the thread — leave for a human.

8. **Before making any decision, the agent must**:
   - Read surrounding code context (not just the flagged line)
   - Check if tests cover the scenario CCR mentions
   - If CCR suggests adding/removing something, verify the claim against actual usage

9. **Update `_do_resolve_comments()` in `orchestrator.py`**:
   - `"fixed"` → reply "Addressed in SHA — message" + resolve (existing)
   - `"skipped"` → reply "Skipped — reason" + resolve (existing, unchanged)
   - `"dismissed"` → reply "Dismissed — evidence/reason" + resolve (NEW)
   - `"uncertain"` → reply "Needs human review — reason" + **DO NOT resolve** (NEW)

10. **Require `"evidence"` field** in fix summary JSON for `dismissed`/`uncertain` entries
11. **Tests** — verify that `uncertain` does NOT resolve the thread and that `dismissed` resolves it with the evidence in the reply body
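The status handling in item 9 and the evidence requirement in item 10 can be sketched as a pure function that maps one fix-summary entry to a reply body and a resolve decision. The entry keys (`status`, `sha`, `message`, `reason`, `evidence`) and the names `plan_reply` and `ReplyPlan` are assumptions about the fix summary JSON, not its confirmed schema. The real `_do_resolve_comments()` would then post `body` and resolve the thread only when `resolve` is true.

```python
from typing import Any, NamedTuple


class ReplyPlan(NamedTuple):
    body: str      # text to post as a reply on the review thread
    resolve: bool  # whether to resolve the thread afterwards


def _detail(entry: dict[str, Any]) -> str:
    """Combine an optional reason with the required evidence."""
    reason = entry.get("reason", "")
    evidence = entry["evidence"]
    return f"{reason}\n\nEvidence: {evidence}" if reason else evidence


def plan_reply(entry: dict[str, Any]) -> ReplyPlan:
    """Map one fix-summary entry (hypothetical schema) to a ReplyPlan."""
    status = entry["status"]
    if status in ("dismissed", "uncertain") and not entry.get("evidence"):
        # Item 10: these two statuses must carry an evidence trail.
        raise ValueError(f"{status!r} entry is missing required 'evidence'")

    if status == "fixed":
        return ReplyPlan(f"Addressed in {entry['sha']} — {entry['message']}", True)
    if status == "skipped":
        return ReplyPlan(f"Skipped — {entry['reason']}", True)
    if status == "dismissed":
        return ReplyPlan(f"Dismissed — {_detail(entry)}", True)
    if status == "uncertain":
        # Left open on purpose so a human makes the call.
        return ReplyPlan(f"Needs human review — {_detail(entry)}", False)
    raise ValueError(f"unknown status {status!r}")
```

Keeping the decision separate from the GitHub calls lets the item 11 tests assert on `ReplyPlan` values directly, without mocking the API.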

---

## Phase 3: Circular Loop Detection

### Problem
CCR can create contradictory review cycles. No mechanism exists to detect or break these loops.

### Changes

12. **Track per-comment history across iterations** — in `_do_parse_review()`, compare current unresolved comments against previous fix summaries by matching on `(file_path, body_text_substring)`. Store `bounce_count` per comment in `comment-history.json`
13. **Add `_detect_bouncing_comments()` helper** — if a comment was marked `fixed` and has since reappeared 2 times, flag it as a circular loop
14. **Add `bouncing_comments` context to `fix_prompt()`** — render a warning: "⚠️ This comment has bounced N times. CCR keeps reversing your fix. Do NOT fix it again. Mark as uncertain for human review."
15. **Wire through `_do_fix()`** — calls `_detect_bouncing_comments()` and passes the result to `fix_prompt()`
16. **Tests** — simulate bouncing comment history, verify flagging after 2 bounces, verify prompt includes warning
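A minimal sketch of the bookkeeping in items 12 and 13, matching comments on `(file_path, body_text_substring)`. It assumes the substring is the first 80 characters of the whitespace-normalized, lowercased body, and that the history is a plain dict serialized to `comment-history.json`. `comment_key`, `record_iteration`, `detect_bouncing_comments` (a free-function stand-in for the planned `_detect_bouncing_comments()` method), and `bounce_warning` are hypothetical names. A bounce is counted when a comment is unresolved in this iteration and the previous fix summary marked it `fixed`.

```python
import json
from pathlib import Path

BOUNCE_THRESHOLD = 2  # Decisions section: 2 bounces escalates to a human
SNIPPET_LEN = 80      # assumption: length of the body substring used as the match key


def comment_key(file_path: str, body: str) -> str:
    """Stable key from the file path and a normalized prefix of the body."""
    snippet = " ".join(body.split())[:SNIPPET_LEN].lower()
    return f"{file_path}::{snippet}"


def record_iteration(history: dict[str, dict], unresolved: list[dict],
                     last_fix_summary: list[dict]) -> dict[str, dict]:
    """Fold one iteration into the history and update bounce counts.

    history: key -> {"last_status": str | None, "bounce_count": int}
    unresolved: CCR comments still open now, each {"path", "body"}
    last_fix_summary: previous fix entries, each {"path", "body", "status"}
    """
    for entry in last_fix_summary:
        rec = history.setdefault(comment_key(entry["path"], entry["body"]),
                                 {"last_status": None, "bounce_count": 0})
        rec["last_status"] = entry["status"]
    for comment in unresolved:
        rec = history.get(comment_key(comment["path"], comment["body"]))
        if rec is not None and rec["last_status"] == "fixed":
            rec["bounce_count"] += 1
            rec["last_status"] = None  # do not count again until re-fixed
    return history


def detect_bouncing_comments(history: dict[str, dict]) -> dict[str, int]:
    """Return key -> bounce count for comments at or above the threshold."""
    return {key: rec["bounce_count"] for key, rec in history.items()
            if rec["bounce_count"] >= BOUNCE_THRESHOLD}


def bounce_warning(count: int) -> str:
    """Warning text that fix_prompt() would render for a flagged comment."""
    return (f"⚠️ This comment has bounced {count} times. CCR keeps reversing "
            "your fix. Do NOT fix it again. Mark as uncertain for human review.")


def load_history(path: Path) -> dict[str, dict]:
    return json.loads(path.read_text()) if path.exists() else {}


def save_history(history: dict[str, dict], path: Path) -> None:
    path.write_text(json.dumps(history, indent=2))
```

Because only entries whose last status was `fixed` can bounce, a comment deliberately left open as `uncertain` is never counted against itself on the next iteration.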

---

## Files to Modify

- `src/autopilot_loop/prompts.py` — rewrite `fix_prompt()`, add `update_description_prompt()`
- `src/autopilot_loop/github_api.py` — add `update_pr_description()`, `get_pr_description()`
- `src/autopilot_loop/orchestrator.py` — add `UPDATE_DESCRIPTION` state, modify `_do_resolve_comments()`, add `_detect_bouncing_comments()`, modify `_do_fix()` and `_do_parse_review()`
- `tests/test_github_api.py` — new tests for API functions
- `tests/test_orchestrator.py` — new tests for states, statuses, loop detection

## Decisions

- PR description update is a **dedicated orchestrator step** (guaranteed, not agent discretion)
- CCR disagreement with evidence: **reply with evidence + resolve thread**
- Circular loop threshold: **2 bounces** triggers escalation to human
- Two new fix summary statuses: `dismissed` (resolved with evidence) and `uncertain` (left unresolved for human)
- PR description rewrite uses a **copilot agent invocation** (not a Python template)

---
🤖 **autopilot-loop**

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strengthen CCR handling, auto-update PR descriptions, detect circular review loops #69

Summary

Phase 1: PR Description Auto-Update (dedicated orchestrator step)

Problem

Changes

Phase 2: Strengthen CCR Verification (3-tier decision model)

Problem

Changes

Phase 3: Circular Loop Detection

Problem

Changes

Files to Modify

Decisions

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Strengthen CCR handling, auto-update PR descriptions, detect circular review loops #69

Description

Summary

Phase 1: PR Description Auto-Update (dedicated orchestrator step)

Problem

Changes

Phase 2: Strengthen CCR Verification (3-tier decision model)

Problem

Changes

Phase 3: Circular Loop Detection

Problem

Changes

Files to Modify

Decisions

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions