@PureWeen PureWeen commented Jan 31, 2026

Note

Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!

Summary

This PR significantly enhances the PR agent workflow with improvements across two development phases:

Phase 1: Original Enhancements (Commits 1-8)

  1. Multi-model try-fix exploration - Uses 5 different AI models to explore alternative solutions
  2. Environment blocker handling - Strict rules to stop and ask user when environment issues occur
  3. Review automation script - PowerShell script that invokes Copilot CLI directly for PR reviews
  4. Branch safety rules - Prevents agent from switching branches during reviews
  5. Template formatting rules - Exact template adherence for downstream script compatibility

Phase 2: Consolidation & Simplification (Commit 9)

After multi-model review (5 models: Claude Sonnet 4.5, Claude Opus 4.5, GPT-5.2, GPT-5.2-Codex, Gemini 3 Pro), the following improvements were made:

  1. Shared rules extraction - Created SHARED-RULES.md to eliminate duplication across files
  2. Simplified git policy - Agent never runs git commands; always assumes correct branch
  3. State file handling - Changed "commit" to "save" (state files are gitignored)
  4. Reduced verbosity - Compressed cross-pollination section, converted PLAN-TEMPLATE to checklist

Commits

1. 80d7e412c2 - Update pr agent with multi-model try-fix workflow

Why: The original PR agent only used a single model for exploring fixes. Different AI models have different strengths and may find solutions others miss.

Changes:

  • Added Phase 4 multi-model workflow using 5 models:
    • claude-sonnet-4.5, claude-opus-4.5, gpt-5.2, gpt-5.2-codex, gemini-3-pro-preview
  • Cross-pollination loop: Share results between models to spark new ideas
  • Continue until all models confirm "no more approaches to explore"

2. 69cc6af403 - Address Copilot review suggestions

Why: Initial PR review feedback suggested improvements.

Changes:

  • Minor formatting and clarity improvements to pr.md

3. fe55c3fd21 - Add rules for template formatting and skill script usage

Why: Downstream scripts depend on exact regex patterns in state files. Agents were "improving" templates by adding attributes like open which broke parsing.

Changes:

  • Added "Follow Templates EXACTLY" rule - no adding attributes, no improving formats
  • Added "Use Skills' Scripts" rule - run provided PowerShell scripts, don't bypass with manual commands
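
To illustrate why exact template adherence matters, here is a minimal sketch of how a strict regex can silently miss a section once an attribute is added. The pattern, the `<details>`/`<summary>` layout, and the function name are assumptions made for illustration; the actual patterns used by the downstream scripts are not shown in this PR.

```python
import re

# Hypothetical pattern: assumes a downstream script looks for the literal
# "<details>" tag followed by a <summary> title. Not taken from the repo.
SECTION_PATTERN = re.compile(r"<details>\s*<summary>(?P<title>[^<]+)</summary>")


def find_section_titles(state_text: str) -> list[str]:
    """Return the summary title of every section that matches the exact template."""
    return [m.group("title").strip() for m in SECTION_PATTERN.finditer(state_text)]


# The template as written, and the same template after an agent "improves" it.
exact = "<details>\n<summary>Gate</summary>\nPASSED\n</details>"
improved = "<details open>\n<summary>Gate</summary>\nPASSED\n</details>"

print(find_section_titles(exact))     # section is found
print(find_section_titles(improved))  # section is missed because of the extra attribute
```

The parse does not fail loudly in the second case; it simply returns nothing, which is the kind of quiet breakage the rule is meant to prevent.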

4. debbee608e - Add 'Stop on Environment Blockers' rule to PR agent

Why: Agent was continuing through phases when environment issues (missing Appium, WinAppDriver errors) prevented completion, leading to incomplete reviews.

Changes:

  • Added explicit blocker handling section to pr.md
  • Common blockers: Appium drivers, WinAppDriver, Xcode, emulators, port conflicts
  • Must STOP, report the blocker, and ask user how to proceed
  • Never mark a phase as blocked and then continue to the next phase

5. ad29f6a796 - Add PR review plan template and Review-PR.ps1 script

Why: Need a reusable template for consistent PR reviews and a script to automate invocation.

Changes:

  • Created .github/agents/pr/PLAN-TEMPLATE.md - Reusable 5-phase review plan
  • Created .github/scripts/Review-PR.ps1 - Script to prepare environment and invoke Copilot CLI

6. 886ea2aa8e - Improve blocker handling and fix Review-PR.ps1 for Copilot CLI

Why: During PR #27300 review, agent spent 10+ tool calls troubleshooting WinAppDriver instead of stopping after first failure.

Changes:

  • Added strict retry limits table:

    | Blocker Type | Max Retries | Action |
    |---|---|---|
    | Server errors (500, timeout) | 0 | STOP immediately |
    | Missing tools | 1 install | STOP and ask |
    | Port conflicts | 1 kill | STOP and ask |
    | WinAppDriver errors | 0 | STOP immediately |
  • Added "What I tried" section to blocker report template
  • New prohibitions: Never spend more than 2-3 tool calls on same blocker
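
The retry limits can be read as a small lookup table. The sketch below encodes them that way; the dictionary keys, the `RetryRule` type, and the `next_step` helper are hypothetical names chosen for illustration, since the agent follows these rules as written instructions rather than code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryRule:
    max_retries: int  # how many fix attempts are allowed before stopping
    action: str       # what to do once the limit is reached


# Values mirror the retry limits table; key names are illustrative assumptions.
RETRY_RULES = {
    "server_error": RetryRule(0, "STOP immediately"),
    "missing_tool": RetryRule(1, "STOP and ask"),
    "port_conflict": RetryRule(1, "STOP and ask"),
    "winappdriver_error": RetryRule(0, "STOP immediately"),
}


def next_step(blocker_type: str, retries_so_far: int) -> str:
    """Return 'retry once' while under the limit, otherwise the stop action."""
    rule = RETRY_RULES[blocker_type]
    if retries_so_far < rule.max_retries:
        return "retry once"
    return rule.action


print(next_step("missing_tool", 0))        # one install attempt is allowed
print(next_step("missing_tool", 1))        # limit reached: stop and ask
print(next_step("winappdriver_error", 0))  # no retries at all
```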

7. d67da75e85 - Update Review-PR.ps1 to invoke Copilot CLI directly

Why: The script was first written on the assumption that Copilot CLI was interactive-only. It actually supports -i <prompt> and -p <prompt> for programmatic invocation.

Changes:

  • Script now invokes copilot --agent pr -i "<prompt>" directly
  • Validates both gh CLI and copilot CLI are installed
  • New -NoInteractive switch for -p mode (exits after completion)
  • Dry run mode shows exactly what would be invoked
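
As a rough sketch of the mode selection described above, the following builds the argument list for either mode and previews it in a dry run. Only `copilot`, `--agent pr`, `-i`, and `-p` come from this PR; the function names and the prompt text are hypothetical, and the real Review-PR.ps1 is PowerShell and passes additional options not modeled here.

```python
import shlex
import subprocess


def build_copilot_command(prompt: str, no_interactive: bool = False) -> list[str]:
    """Pick -p (exit when done) or -i (stay open) and return the argv list."""
    mode_flag = "-p" if no_interactive else "-i"
    return ["copilot", "--agent", "pr", mode_flag, prompt]


def invoke(command: list[str], dry_run: bool = True) -> None:
    """Print the exact command in dry-run mode, otherwise run it."""
    if dry_run:
        print("Would invoke:", shlex.join(command))
        return
    subprocess.run(command, check=True)


# Hypothetical prompt text, used only for the preview.
invoke(build_copilot_command("Review PR #33687", no_interactive=True), dry_run=True)
```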

8. ed74c574a5 - Add 'Do NOT Switch Branches' rule to pr agent

Why: During PR review testing, the pr agent ran git checkout, git stash, and other branch-switching commands, causing loss of local changes and confusion about which code was being reviewed.

Changes:

  • Added explicit "Do NOT Switch Branches" rule to both pr.md and PLAN-TEMPLATE.md
  • Forbidden commands: git checkout, git switch, gh pr checkout, git stash
  • Agent must work on current branch as-is, using git diff or gh pr diff to see PR changes
  • Fixed variable expansion in Review-PR.ps1 prompt (double backticks for here-strings)
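
A minimal sketch of how the forbidden-command list could be checked mechanically, assuming commands arrive as plain shell strings. The helper name and the prefix-matching approach are illustrative assumptions; the PR enforces this rule through agent instructions.

```python
import shlex

# Command prefixes taken from the "Do NOT Switch Branches" rule.
FORBIDDEN_PREFIXES = (
    ("git", "checkout"),
    ("git", "switch"),
    ("git", "stash"),
    ("gh", "pr", "checkout"),
)


def is_branch_switching(command: str) -> bool:
    """Return True if the command starts with any forbidden prefix."""
    tokens = tuple(shlex.split(command))
    return any(tokens[: len(prefix)] == prefix for prefix in FORBIDDEN_PREFIXES)


print(is_branch_switching("git checkout main"))  # forbidden
print(is_branch_switching("gh pr diff 33687"))   # allowed: read-only view of the PR
```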

9. 632bfb7155 - Extract shared rules, simplify git policy, reduce duplication

Why: Multi-model review (5 AI models) identified significant duplication (~200 lines) across files, conflicting "commit" terminology, and overly verbose sections. The git checkout prohibition also conflicted with workflow steps that mentioned git checkout.

Changes:

  • Created SHARED-RULES.md (167 lines) - Single source of truth for:

    • Phase Completion Protocol
    • Follow Templates EXACTLY
    • No Direct Git Commands (absolute - agent never runs git)
    • Use Skills' Scripts
    • Stop on Environment Blockers (with retry limits)
    • Multi-Model Configuration (5 models list)
    • Platform Selection guidance
  • Simplified git policy - Agent is ALWAYS on correct branch, never runs git commands:

    • Removed git fetch/checkout from Phase 1 Pre-Flight
    • Phase 5: User handles commit/push/PR creation
    • Changed all "State file committed" → "State file saved" (gitignored files)
  • Compressed content:

    • Cross-pollination ASCII box: 51 → 20 lines
    • PLAN-TEMPLATE.md: Full docs → Pure checklist (226 → 112 lines)
    • pr.md: 662 → 535 lines (-19%)
    • post-gate.md: 403 → 302 lines (-25%)
    • Total reduction: 1291 → 1116 lines (-14%); the new total appears to include the 167-line SHARED-RULES.md, since 535 + 302 + 112 + 167 = 1116
  • Eliminated duplication:

    • Blocker handling was in 3 files → now in SHARED-RULES.md only
    • Phase Completion Protocol was in 2 files → now in SHARED-RULES.md only
    • Model list was in 3 files → now in SHARED-RULES.md only
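
The line counts above can be checked with a few lines of arithmetic. This sketch recomputes each percentage from the reported numbers; counting SHARED-RULES.md in the "after" total is an inference from the figures adding up, not something the PR states explicitly.

```python
# Line counts as reported in this PR description.
before = {"pr.md": 662, "post-gate.md": 403, "PLAN-TEMPLATE.md": 226}
after = {"pr.md": 535, "post-gate.md": 302, "PLAN-TEMPLATE.md": 112}
shared_rules_lines = 167  # new file, assumed to be part of the "after" total


def percent_change(old: int, new: int) -> int:
    """Rounded percentage change from old to new (negative means a reduction)."""
    return round((new - old) / old * 100)


for name in before:
    print(f"{name}: {before[name]} -> {after[name]} ({percent_change(before[name], after[name])}%)")

total_before = sum(before.values())
total_after = sum(after.values()) + shared_rules_lines
print(f"Total: {total_before} -> {total_after} ({percent_change(total_before, total_after)}%)")
```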

Files Changed

| File | Purpose |
|---|---|
| .github/agents/pr.md | Main PR agent instructions (Phases 1-3) |
| .github/agents/pr/post-gate.md | Phase 4-5 instructions (multi-model try-fix) |
| .github/agents/pr/PLAN-TEMPLATE.md | NEW - Reusable 5-phase review checklist |
| .github/agents/pr/SHARED-RULES.md | NEW - Extracted shared rules (single source of truth) |
| .github/scripts/Review-PR.ps1 | NEW - Script to invoke Copilot CLI for PR review |

Usage

# Interactive mode (default) - stays open for follow-up
pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687

# Non-interactive mode - exits when done
pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -NoInteractive

# Specific platform
pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -Platform ios

# Skip merge if already on branch
pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -SkipMerge

# Dry run to preview
pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -DryRun

Multi-Model Validation

The final changes (commit 9) were validated by 5 AI models:

| Model | Verdict | Key Feedback |
|---|---|---|
| Claude Sonnet 4.5 | ✅ READY TO MERGE | "All git command instructions successfully removed" |
| Claude Opus 4.5 | ✅ READY TO MERGE | "Excellent refactoring, no conflicting guidance" |
| GPT-5.2 | ✅ READY TO MERGE | "Progressive disclosure maintained" |
| GPT-5.2-Codex | ✅ READY TO MERGE | "No instructions to run git commands remain" |
| Gemini 3 Pro | ✅ READY TO MERGE | "Agent is instructed to STOP and ask user for commits" |

Testing

Tested by reviewing PR #27300 (ScrollView ScrollToAsync fix):

  • Pre-Flight phase completed successfully
  • Tests phase verified test files exist
  • Gate phase encountered WinAppDriver blocker → agent correctly stopped and asked
  • Blocker handling rules validated through real-world usage
  • Branch safety verified: Agent stayed on branch instead of switching

Phase 4 now uses exhaustive multi-model exploration:

Round 1: Run try-fix with 5 models sequentially:
- claude-sonnet-4.5
- claude-opus-4.5
- gpt-5.2
- gpt-5.2-codex
- gemini-3-pro-preview

Round 2+: Cross-pollination loop:
- Share all results with all 5 models
- Ask for NEW ideas based on learnings
- Run try-fix for each new idea
- Repeat until all models confirm 'no new ideas'

Key constraints:
- SEQUENTIAL ONLY (same files/device)
- Exhaustion = all 5 models confirm no new ideas
- Never skip models in Round 1
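
The two-stage loop above can be sketched as a small coordinator. This is an illustrative outline only: `run_try_fix` and `ask_for_new_idea` are hypothetical callables standing in for the actual model invocations, and the `max_rounds` bound is an assumption modeled on the round limit mentioned in a later commit of this PR.

```python
from typing import Any, Callable, Optional

MODELS = [
    "claude-sonnet-4.5",
    "claude-opus-4.5",
    "gpt-5.2",
    "gpt-5.2-codex",
    "gemini-3-pro-preview",
]


def explore(
    run_try_fix: Callable[[str, Optional[str]], Any],
    ask_for_new_idea: Callable[[str, list], Optional[str]],
    max_rounds: int = 3,  # assumed safety bound on cross-pollination rounds
) -> list:
    """Run Round 1 with every model, then cross-pollinate until no new ideas."""
    results = []

    # Round 1: every model runs try-fix once, strictly one after another.
    for model in MODELS:
        results.append(run_try_fix(model, None))

    # Round 2+: share all results, collect new ideas, try each one in sequence.
    for _ in range(max_rounds):
        ideas = []
        for model in MODELS:
            idea = ask_for_new_idea(model, results)
            if idea is not None:
                ideas.append((model, idea))
        if not ideas:
            break  # exhaustion: all 5 models confirmed no new ideas
        for model, idea in ideas:
            results.append(run_try_fix(model, idea))

    return results
```

Keeping everything in plain sequential loops reflects the "SEQUENTIAL ONLY" constraint: the attempts share the same files and device, so nothing here runs in parallel.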
Copilot AI review requested due to automatic review settings January 31, 2026 14:00
@PureWeen added the area-ai-agents label (Copilot CLI agents, agent skills, AI-assisted development) on Jan 31, 2026

Copilot AI left a comment

Pull request overview

Updates the PR agent’s Phase 4 (Fix exploration) documentation to describe a multi-model, cross-pollinated try-fix workflow intended to expand fix diversity and ensure exhaustive alternative exploration.

Changes:

  • Replaces the Phase 4 Step 1 try-fix loop description with a 2-stage (Round 1 + cross-pollination) multi-model process.
  • Updates Phase 4 completion criteria and common mistakes to reflect the new multi-model sequencing requirements.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| .github/agents/pr.md | Updates Phase 4 overview to require a multi-model try-fix workflow and references the detailed post-gate instructions. |
| .github/agents/pr/post-gate.md | Rewrites Phase 4 Step 1 into a multi-model Round 1 + cross-pollination loop and updates completion checklist/mistakes accordingly. |

Comment on lines 93 to 97
│ 1. Compile summary of ALL try-fix attempts so far: │
│ - Approach tried │
│ - Pass/Fail result │
│ - Key learnings (why it worked or failed) │
│ │
Copilot AI Jan 31, 2026

Cross-pollination Step 1 requires compiling a summary of all try-fix attempts “so far” each loop iteration. This can quickly exceed Copilot CLI prompt limits (30,000 chars per .github/instructions/agents.instructions.md:27) and make the loop impractical. Add guidance to keep the shared summary bounded (e.g., fixed-size bullets per attempt) and/or add a maximum number of cross-pollination rounds/attempts.
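
One way to keep the shared summary bounded, as suggested here, is to cap each attempt at a fixed-size bullet and drop the oldest attempts once a total budget is reached. The sketch below assumes attempts are dictionaries with `approach`, `passed`, and `learning` keys; those field names and the 300-character per-attempt cap are hypothetical, while the 30,000-character budget is the limit cited in this comment.

```python
def summarize_attempts(attempts: list[dict], budget: int = 30_000, per_attempt: int = 300) -> str:
    """Build a bounded summary: fixed-size bullets, newest attempts kept first."""
    bullets = []
    used = 0
    for attempt in reversed(attempts):  # walk from newest to oldest
        result = "PASS" if attempt["passed"] else "FAIL"
        bullet = f"- [{result}] {attempt['approach']}: {attempt['learning']}"
        if len(bullet) > per_attempt:
            bullet = bullet[: per_attempt - 3] + "..."
        cost = len(bullet) + 1  # +1 for the newline separator
        if used + cost > budget:
            break  # older attempts are dropped once the budget is spent
        bullets.append(bullet)
        used += cost
    return "\n".join(reversed(bullets))  # restore chronological order


attempts = [
    {"approach": f"approach {i}", "passed": False, "learning": "x" * 500}
    for i in range(200)
]
summary = summarize_attempts(attempts, budget=1000, per_attempt=100)
print(len(summary) <= 1000)
```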

Copilot uses AI. Check for mistakes.
- `try-fix` returns `exhausted=true` (no more ideas)
- 5 try-fix attempts have been made
- User requests to stop
**Exhaustion criteria**: The loop exits when ALL 5 models confirm they have no new ideas to try.
Copilot AI Jan 31, 2026

The new exhaustion rule (“ALL 5 models confirm no new ideas”) conflicts with how try-fix defines and updates the state file’s Exhausted field (it’s set per-invocation based on that run’s judgment; see .github/skills/try-fix/SKILL.md Step 9-10). Either align Phase 4 to the existing per-invocation exhaustion semantics, or introduce a separate multi-model exhaustion tracker so the state file doesn’t get contradictory updates.

Suggested change
**Exhaustion criteria**: The loop exits when ALL 5 models confirm they have no new ideas to try.
**Coordination loop stop condition**: Exit when a full round completes and NO model proposes any new fix ideas. This is separate from the per-invocation `Exhausted` flag that each `try-fix` run sets in the state file.

- [ ] Each row has: approach, test result, files changed, notes
- [ ] "Exhausted" field set (Yes/No)
- [ ] "Exhausted" field set to Yes (all models confirmed no new ideas)
- [ ] "Selected Fix" populated with reasoning
Copilot AI Jan 31, 2026

The Phase 4 completion checklist no longer calls out Root Cause Analysis being filled in, but other repo guidance/templates expect root-cause documentation (e.g., pr-finalize template includes a “### Root Cause” section in .github/skills/pr-finalize/SKILL.md:199+). Consider adding an explicit checklist item indicating where the root cause must be documented (Fix section vs Report section) to avoid leaving it out.

Suggested change
- [ ] "Selected Fix" populated with reasoning
- [ ] "Selected Fix" populated with reasoning
- [ ] Root cause analysis documented for the selected fix (to be surfaced in 📋 Report phase “### Root Cause” section)

Comment on lines 61 to 65
**Even when a PR already has a fix**, Phase 4 requires running the `try-fix` skill with **5 different AI models** to:
1. **Maximize fix diversity** - Each model brings different perspectives
2. **Cross-pollinate ideas** - Share results between models to spark new ideas
3. **Ensure exhaustive exploration** - Only stop when ALL models confirm "no new ideas"

Copilot AI Jan 31, 2026

Phase 4 is described as requiring 5 different AI models. However, repo agent guidelines for Copilot CLI explicitly state model is not supported (.github/instructions/agents.instructions.md:18-19), so this requirement can’t be followed as written. Please adjust Phase 4 guidance to work with Copilot CLI constraints (e.g., single-model workflow, or an optional/advanced section that describes a CLI-supported model-selection mechanism if available).

Comment on lines 63 to 67
| 1 | `claude-sonnet-4.5` | `task` agent with `model: "claude-sonnet-4.5"` |
| 2 | `claude-opus-4.5` | `task` agent with `model: "claude-opus-4.5"` |
| 3 | `gpt-5.2` | `task` agent with `model: "gpt-5.2"` |
| 4 | `gpt-5.2-codex` | `task` agent with `model: "gpt-5.2-codex"` |
| 5 | `gemini-3-pro-preview` | `task` agent with `model: "gemini-3-pro-preview"` |
Copilot AI Jan 31, 2026

The table instructs using model: "..." when invoking the task agent, but repo agent guidelines explicitly state model is VS Code-only and not supported in Copilot CLI (see .github/instructions/agents.instructions.md:18-19). This makes the Phase 4 instructions non-actionable in the intended environment. Remove model: from the invocation guidance, or document a Copilot-CLI-supported way to select models (if one exists) without relying on VS Code frontmatter fields.

Comment on lines 72 to 73
- Platform: [android/ios]
- TestFilter: "IssueXXXXX"
Copilot AI Jan 31, 2026

The suggested try-fix invocation prompt only provides Platform/TestFilter/state_file, but the try-fix skill documents required inputs as problem, test_command, target_files, and platform (see .github/skills/try-fix/references/example-invocation.md). Update the invocation template to include those required fields (and derive test_command from Platform/TestFilter if you want to keep TestFilter as a convenience).

Suggested change
- Platform: [android/ios]
- TestFilter: "IssueXXXXX"
- problem: Short description of the bug and expected behavior for PR #XXXXX (based on the linked issue and PR discussion)
- platform: [android/ios]
- test_filter: "IssueXXXXX" # Convenience filter name used to scope tests
- test_command: [INSERT test command here using the test_filter above, e.g., `dotnet test <TestProject>.csproj --filter "IssueXXXXX"`]
- target_files:
  - [EDIT ME] src/<area>/<likely-affected-file-1>.cs
  - [EDIT ME] src/<area>/<likely-affected-file-2>.cs

Comment on lines 76 to 77
Generate ONE independent fix idea and test it empirically.
Do NOT look at the PR's fix - generate ideas independently.
Copilot AI Jan 31, 2026

The prompt says “Do NOT look at the PR's fix”, but the try-fix skill requires reviewing existing PR changes first to ensure the attempted approach is different (see .github/skills/try-fix/SKILL.md “Core Principles”, item 3, and Step 1). Consider rephrasing to: generate an initial idea independently, then review the PR’s fix only to avoid duplicating it.

- Clarify model parameter is passed to task tool (not agent frontmatter)
- Add complete try-fix invocation fields: problem, test_command, target_files
- Fix 'Do NOT look at PR fix' to 'Review PR fix to ensure approach is DIFFERENT'
- Add summary size limit guidance for cross-pollination (30k char limit)
- Add MAX ROUNDS limit (3) to prevent infinite loops
- Separate coordination loop stop condition from per-invocation Exhausted flag
- Add root cause analysis checklist item in Phase 4 completion

@kubaflo added the s/ai-reproduction-failed label (AI verified that tests pass without the fix, meaning they don't catch the bug) on Jan 31, 2026

- Rule: Follow Templates EXACTLY - don't add attributes or modify format
- Rule: Use Skills' Scripts - Don't Bypass with manual commands
- These rules prevent issues with downstream script regex patterns

@kubaflo removed the s/ai-reproduction-failed label (AI verified that tests pass without the fix, meaning they don't catch the bug) on Jan 31, 2026

When environment setup is missing (Appium, devices, drivers, etc.),
the agent must STOP and ask the user before continuing. This prevents
marking phases as BLOCKED and continuing without actual verification.

- PLAN-TEMPLATE.md: Reusable 5-phase review plan with all critical rules
- Review-PR.ps1: Script to invoke Copilot CLI with the PR agent

Usage:
  pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687
  pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -Platform ios

@PureWeen changed the title from "Update pr agent with multi-model try-fix workflow" to "Enhance PR agent with multi-model workflow, environment blocker handling, and review script" on Jan 31, 2026

Changes:

1. Fix Review-PR.ps1 for interactive Copilot CLI
   - Copilot CLI is interactive (launched with 'copilot' command), not scriptable
   - Updated script to prepare environment and output context
   - Provides clear instructions for user to invoke Copilot CLI manually

2. Strengthen environment blocker handling with strict retry limits
   - Added explicit retry limits table:
     - Server errors (500, timeout): 0 retries - STOP immediately
     - Missing tools: 1 install attempt then STOP
     - Port conflicts: 1 kill attempt then STOP
     - WinAppDriver errors: 0 retries - STOP immediately
   - Added 'What I tried' section to blocker report template
   - New prohibitions:
     - Never keep troubleshooting after retry limit exceeded
     - Never spend more than 2-3 tool calls on same blocker
     - Never install things without asking between each
   - Updated PLAN-TEMPLATE.md with Rule 0 (highest priority)

These changes ensure the agent stops promptly on environment issues
rather than burning context on extended troubleshooting attempts.

Now that Copilot CLI supports command-line arguments:
- -i, --interactive <prompt>: Start interactive with prompt
- -p, --prompt <text>: Non-interactive mode
- --agent <agent>: Specify custom agent
- --allow-all: Enable all permissions

The script now:
- Validates prerequisites (gh CLI AND copilot CLI)
- Checks out PR branch (unless -SkipCheckout)
- Creates state directory
- Invokes Copilot CLI with the pr agent and a structured prompt

New parameters:
- -NoInteractive: Run in non-interactive mode (exits after completion)
- Default is interactive mode (-i) which keeps session open

Usage:
  # Interactive mode (default)
  pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687

  # Non-interactive mode
  pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -NoInteractive

  # Dry run to see what would be invoked
  pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -DryRun

@PureWeen changed the title from "Enhance PR agent with multi-model workflow, environment blocker handling, and review script" to "Enhance PR agent with multi-model workflow, environment blocker handling, and review automation" on Jan 31, 2026

PureWeen and others added 8 commits January 31, 2026 15:45
Problem: During PR review, the pr agent was checking out branches
even when invoked with -SkipCheckout, causing:
- Loss of uncommitted changes (stashed)
- Confusion about which code is being reviewed
- Potential loss of work in progress

Solution: Added explicit rule prohibiting branch switching:
- Never run: git checkout, git switch, gh pr checkout, git stash
- Work on current branch as-is
- Use git diff or gh pr diff to see PR changes

Also fixed variable expansion in Review-PR.ps1 prompt (double backticks).

- Make Platform parameter optional - agent determines appropriate platform
- Add platform selection guidance to pr.md, post-gate.md, PLAN-TEMPLATE.md
- Enable streaming output (--stream on) for real-time feedback
- Add logging to CustomAgentLogsTmp/PRState/{PR}/copilot-logs/
- Save session markdown in non-interactive mode (--share)
- Block branch-switching commands (--deny-tool for git checkout/switch/stash)
- Key learning: Don't test on platforms not affected by the bug

- Add explicit warning that explore/glob is NOT valid for cross-pollination
- Require Cross-Pollination table with responses from ALL 5 models
- Add invocation template showing how to call each model
- Add validation that cross-pollination table must exist before marking complete
- Add new common mistakes for using shortcuts instead of model invocations

Cross-pollination exhaustion is determined by pr agent after
invoking ALL 5 models. try-fix should NOT set any global
exhausted field - it only reports its own attempt result.

- Merge PR into current branch instead of checkout
- Fail if on protected branches (main, release/*, net*.0)
- Handle fork PRs with separate fetch logic
- Replace SkipCheckout with SkipMerge parameter
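
A minimal sketch of the protected-branch check, assuming the patterns behave like shell-style globs. The patterns come from the commit message above; the helper name and the use of glob matching are assumptions, and the actual PowerShell script may match branch names differently.

```python
from fnmatch import fnmatchcase

# Patterns listed in the commit message; glob semantics are an assumption.
PROTECTED_PATTERNS = ("main", "release/*", "net*.0")


def is_protected(branch: str) -> bool:
    """Return True if the branch name matches any protected pattern."""
    return any(fnmatchcase(branch, pattern) for pattern in PROTECTED_PATTERNS)


for name in ("main", "release/10.0.1xx", "net10.0", "update-pr-agent-multi-model"):
    status = "refuse to merge here" if is_protected(name) else "ok"
    print(f"{name}: {status}")
```
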
- Create SHARED-RULES.md with extracted shared content (blocker handling,
  git rules, model list, platform selection)
- Simplify git rule: agent never runs git commands, always on correct branch
- Change 'commit state file' to 'save state file' (state files are gitignored)
- Phase 4: 'Apply changes' instead of 'Commit changes' - user handles git
- Compress cross-pollination section (51→20 lines)
- Convert PLAN-TEMPLATE.md to pure checklist (226→112 lines)
- Add instruction for user to handle commit/push when alternative fix applied

Line count reduction:
- pr.md: 662→535 (-19%)
- post-gate.md: 403→302 (-25%)
- PLAN-TEMPLATE.md: 226→112 (-50%)
- Total: 1291→1116 (-14%)

Validated by 5 AI models (Claude Sonnet 4.5, Claude Opus 4.5, GPT-5.2,
GPT-5.2-Codex, Gemini 3 Pro) - all confirmed no conflicting guidance.

@PureWeen changed the title from "Enhance PR agent with multi-model workflow, environment blocker handling, and review automation" to "Enhance PR agent: multi-model workflow, blocker handling, shared rules extraction" on Feb 2, 2026
@PureWeen merged commit 1db8161 into main on Feb 3, 2026
2 of 3 checks passed
@PureWeen deleted the update-pr-agent-multi-model branch on February 3, 2026 15:30