Enhance PR agent: multi-model workflow, blocker handling, shared rules extraction #33813
Phase 4 now uses exhaustive multi-model exploration:
Round 1: Run try-fix with 5 models sequentially:
- claude-sonnet-4.5
- claude-opus-4.5
- gpt-5.2
- gpt-5.2-codex
- gemini-3-pro-preview
Round 2+: Cross-pollination loop:
- Share all results with all 5 models
- Ask for NEW ideas based on learnings
- Run try-fix for each new idea
- Repeat until all models confirm 'no new ideas'
Key constraints:
- SEQUENTIAL ONLY (same files/device)
- Exhaustion = all 5 models confirm no new ideas
- Never skip models in Round 1
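The two-round loop above can be sketched in Python. This is an illustration only: `explore`, `run_try_fix`, and `propose_idea` are hypothetical stand-ins for invoking the `try-fix` skill and asking a model for new ideas, and `max_rounds=3` borrows the cap that a later commit in this PR adds (assumed here to count cross-pollination rounds only, not Round 1).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Round 1 order from the commit message above.
MODELS = [
    "claude-sonnet-4.5",
    "claude-opus-4.5",
    "gpt-5.2",
    "gpt-5.2-codex",
    "gemini-3-pro-preview",
]


@dataclass
class Attempt:
    model: str
    approach: str
    passed: bool


def explore(
    run_try_fix: Callable[[str, Optional[str]], Attempt],
    propose_idea: Callable[[str, List[Attempt]], Optional[str]],
    max_rounds: int = 3,
) -> List[Attempt]:
    """Hypothetical coordinator: every call runs sequentially, never in parallel."""
    attempts: List[Attempt] = []

    # Round 1: each model gets one try-fix run; no model is skipped.
    for model in MODELS:
        attempts.append(run_try_fix(model, None))

    # Round 2+: share all results and ask every model for a NEW idea.
    for _ in range(max_rounds):
        ideas: List[Tuple[str, str]] = []
        for model in MODELS:
            idea = propose_idea(model, list(attempts))
            if idea is not None:
                ideas.append((model, idea))
        if not ideas:
            # Every model answered "no new ideas": exploration is exhausted.
            break
        for model, idea in ideas:
            attempts.append(run_try_fix(model, idea))

    return attempts
```

In this shape, exhaustion is decided by the coordinator from all five models' answers rather than by any single `try-fix` run, which is the same split a later commit makes when it stops `try-fix` from setting a global exhausted field.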
Pull request overview
Updates the PR agent’s Phase 4 (Fix exploration) documentation to describe a multi-model, cross-pollinated try-fix workflow intended to expand fix diversity and ensure exhaustive alternative exploration.
Changes:
- Replaces the Phase 4 Step 1 `try-fix` loop description with a 2-stage (Round 1 + cross-pollination) multi-model process.
- Updates Phase 4 completion criteria and common mistakes to reflect the new multi-model sequencing requirements.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| .github/agents/pr.md | Updates Phase 4 overview to require a multi-model try-fix workflow and references the detailed post-gate instructions. |
| .github/agents/pr/post-gate.md | Rewrites Phase 4 Step 1 into a multi-model Round 1 + cross-pollination loop and updates completion checklist/mistakes accordingly. |
.github/agents/pr/post-gate.md
│ 1. Compile summary of ALL try-fix attempts so far: │
│ - Approach tried │
│ - Pass/Fail result │
│ - Key learnings (why it worked or failed) │
│ │
Copilot
AI
Jan 31, 2026
Cross-pollination Step 1 requires compiling a summary of all try-fix attempts “so far” each loop iteration. This can quickly exceed Copilot CLI prompt limits (30,000 chars per .github/instructions/agents.instructions.md:27) and make the loop impractical. Add guidance to keep the shared summary bounded (e.g., fixed-size bullets per attempt) and/or add a maximum number of cross-pollination rounds/attempts.
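One way to keep the shared summary bounded, as the comment asks, is to cap each attempt at a fixed length and keep only the most recent attempts that fit a character budget. In this Python sketch, `summarize_attempts` is hypothetical, and the 300-character per-attempt cap and 10,000-character budget are assumptions chosen to leave headroom under the 30,000-character limit the comment cites.

```python
from typing import List, NamedTuple


class Attempt(NamedTuple):
    approach: str
    passed: bool
    learning: str


def summarize_attempts(
    attempts: List[Attempt], per_attempt_chars: int = 300, budget: int = 10_000
) -> str:
    """Build a cross-pollination summary that stays within `budget` characters."""
    lines = []
    for i, a in enumerate(attempts, 1):
        status = "PASS" if a.passed else "FAIL"
        line = f"{i}. [{status}] {a.approach} - {a.learning}"
        if len(line) > per_attempt_chars:
            line = line[: per_attempt_chars - 3] + "..."  # fixed-size bullet
        lines.append(line)

    # Walk backwards so the newest attempts survive; reserve room for a note.
    kept: List[str] = []
    used = 0
    for line in reversed(lines):
        if used + len(line) + 1 > budget - 60:
            break
        kept.append(line)
        used += len(line) + 1
    kept.reverse()

    dropped = len(lines) - len(kept)
    if dropped:
        kept.insert(0, f"({dropped} earlier attempts omitted)")
    return "\n".join(kept)
```

Keeping the newest attempts assumes that recent learnings matter most for the next round; a hard cap on rounds (which a later commit adds) is the complementary safeguard.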
.github/agents/pr/post-gate.md
- `try-fix` returns `exhausted=true` (no more ideas)
- 5 try-fix attempts have been made
- User requests to stop
**Exhaustion criteria**: The loop exits when ALL 5 models confirm they have no new ideas to try.
Copilot
AI
Jan 31, 2026
The new exhaustion rule (“ALL 5 models confirm no new ideas”) conflicts with how try-fix defines and updates the state file’s Exhausted field (it’s set per-invocation based on that run’s judgment; see .github/skills/try-fix/SKILL.md Step 9-10). Either align Phase 4 to the existing per-invocation exhaustion semantics, or introduce a separate multi-model exhaustion tracker so the state file doesn’t get contradictory updates.
Suggested change:
Replace: **Exhaustion criteria**: The loop exits when ALL 5 models confirm they have no new ideas to try.
With: **Coordination loop stop condition**: Exit when a full round completes and NO model proposes any new fix ideas. This is separate from the per-invocation `Exhausted` flag that each `try-fix` run sets in the state file.
- [ ] Each row has: approach, test result, files changed, notes
- [ ] "Exhausted" field set (Yes/No)
- [ ] "Exhausted" field set to Yes (all models confirmed no new ideas)
- [ ] "Selected Fix" populated with reasoning
Copilot
AI
Jan 31, 2026
The Phase 4 completion checklist no longer calls out Root Cause Analysis being filled in, but other repo guidance/templates expect root-cause documentation (e.g., pr-finalize template includes a “### Root Cause” section in .github/skills/pr-finalize/SKILL.md:199+). Consider adding an explicit checklist item indicating where the root cause must be documented (Fix section vs Report section) to avoid leaving it out.
Suggested change:
Replace:
- [ ] "Selected Fix" populated with reasoning
With:
- [ ] "Selected Fix" populated with reasoning
- [ ] Root cause analysis documented for the selected fix (to be surfaced in 📋 Report phase “### Root Cause” section)
.github/agents/pr.md
**Even when a PR already has a fix**, Phase 4 requires running the `try-fix` skill with **5 different AI models** to:
1. **Maximize fix diversity** - Each model brings different perspectives
2. **Cross-pollinate ideas** - Share results between models to spark new ideas
3. **Ensure exhaustive exploration** - Only stop when ALL models confirm "no new ideas"
Copilot
AI
Jan 31, 2026
Phase 4 is described as requiring 5 different AI models. However, the repo's agent guidelines for Copilot CLI explicitly state that the `model` field is not supported (.github/instructions/agents.instructions.md:18-19), so this requirement can’t be followed as written. Please adjust the Phase 4 guidance to work within Copilot CLI constraints (e.g., a single-model workflow, or an optional/advanced section that describes a CLI-supported model-selection mechanism if one is available).
.github/agents/pr/post-gate.md
| # | Model | Invocation |
|---|---|---|
| 1 | `claude-sonnet-4.5` | `task` agent with `model: "claude-sonnet-4.5"` |
| 2 | `claude-opus-4.5` | `task` agent with `model: "claude-opus-4.5"` |
| 3 | `gpt-5.2` | `task` agent with `model: "gpt-5.2"` |
| 4 | `gpt-5.2-codex` | `task` agent with `model: "gpt-5.2-codex"` |
| 5 | `gemini-3-pro-preview` | `task` agent with `model: "gemini-3-pro-preview"` |
Copilot
AI
Jan 31, 2026
The table instructs using `model: "..."` when invoking the `task` agent, but the repo's agent guidelines explicitly state that `model` is VS Code-only and not supported in Copilot CLI (see .github/instructions/agents.instructions.md:18-19). This makes the Phase 4 instructions non-actionable in the intended environment. Remove `model:` from the invocation guidance, or document a Copilot-CLI-supported way to select models (if one exists) that doesn't rely on VS Code frontmatter fields.
.github/agents/pr/post-gate.md
- Platform: [android/ios]
- TestFilter: "IssueXXXXX"
Copilot
AI
Jan 31, 2026
The suggested try-fix invocation prompt only provides Platform/TestFilter/state_file, but the try-fix skill documents required inputs as problem, test_command, target_files, and platform (see .github/skills/try-fix/references/example-invocation.md). Update the invocation template to include those required fields (and derive test_command from Platform/TestFilter if you want to keep TestFilter as a convenience).
Suggested change:
Replace:
- Platform: [android/ios]
- TestFilter: "IssueXXXXX"
With:
- problem: Short description of the bug and expected behavior for PR #XXXXX (based on the linked issue and PR discussion)
- platform: [android/ios]
- test_filter: "IssueXXXXX" # Convenience filter name used to scope tests
- test_command: [INSERT test command here using the test_filter above, e.g., `dotnet test <TestProject>.csproj --filter "IssueXXXXX"`]
- target_files:
  - [EDIT ME] src/<area>/<likely-affected-file-1>.cs
  - [EDIT ME] src/<area>/<likely-affected-file-2>.cs
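The suggested fields can be assembled into a prompt mechanically. The Python sketch below is hypothetical (`build_try_fix_prompt` and its `test_project` placeholder are not part of the repo); it derives `test_command` from `test_filter` using the `dotnet test ... --filter` form shown in the suggestion, and refuses to build a prompt when a required field is empty.

```python
from typing import List


def build_try_fix_prompt(
    problem: str,
    platform: str,
    test_filter: str,
    target_files: List[str],
    test_project: str = "<TestProject>.csproj",  # placeholder, as in the template
) -> str:
    """Render the try-fix invocation fields suggested in the review comment."""
    required = {"problem": problem, "platform": platform, "test_filter": test_filter}
    missing = [name for name, value in required.items() if not value.strip()]
    if not target_files:
        missing.append("target_files")
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")

    # Derive test_command from the convenience filter.
    test_command = f'dotnet test {test_project} --filter "{test_filter}"'
    lines = [
        f"- problem: {problem}",
        f"- platform: {platform}",
        f'- test_filter: "{test_filter}"',
        f"- test_command: {test_command}",
        "- target_files:",
    ]
    lines += [f"  - {path}" for path in target_files]
    return "\n".join(lines)
```

Failing fast on empty fields keeps a half-filled template (for example, one still containing no target files) from reaching `try-fix`.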
.github/agents/pr/post-gate.md
Generate ONE independent fix idea and test it empirically.
Do NOT look at the PR's fix - generate ideas independently.
Copilot
AI
Jan 31, 2026
The prompt says “Do NOT look at the PR's fix”, but the try-fix skill requires reviewing existing PR changes first to ensure the attempted approach is different (see .github/skills/try-fix/SKILL.md “Core Principles”, item 3, and Step 1). Consider rephrasing to: generate an initial idea independently, then review the PR’s fix only to avoid duplicating it.
- Clarify model parameter is passed to task tool (not agent frontmatter)
- Add complete try-fix invocation fields: problem, test_command, target_files
- Fix 'Do NOT look at PR fix' to 'Review PR fix to ensure approach is DIFFERENT'
- Add summary size limit guidance for cross-pollination (30k char limit)
- Add MAX ROUNDS limit (3) to prevent infinite loops
- Separate coordination loop stop condition from per-invocation Exhausted flag
- Add root cause analysis checklist item in Phase 4 completion
- Rule: Follow Templates EXACTLY - don't add attributes or modify format
- Rule: Use Skills' Scripts - Don't Bypass with manual commands
- These rules prevent issues with downstream script regex patterns
When environment setup is missing (Appium, devices, drivers, etc.), the agent must STOP and ask the user before continuing. This prevents marking phases as BLOCKED and continuing without actual verification.
- PLAN-TEMPLATE.md: Reusable 5-phase review plan with all critical rules
- Review-PR.ps1: Script to invoke Copilot CLI with the PR agent
Usage:
- pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687
- pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -Platform ios
Changes:
1. Fix Review-PR.ps1 for interactive Copilot CLI
- Copilot CLI is interactive (launched with 'copilot' command), not scriptable
- Updated script to prepare environment and output context
- Provides clear instructions for user to invoke Copilot CLI manually
2. Strengthen environment blocker handling with strict retry limits
- Added explicit retry limits table:
- Server errors (500, timeout): 0 retries - STOP immediately
- Missing tools: 1 install attempt then STOP
- Port conflicts: 1 kill attempt then STOP
- WinAppDriver errors: 0 retries - STOP immediately
- Added 'What I tried' section to blocker report template
- New prohibitions:
- Never keep troubleshooting after retry limit exceeded
- Never spend more than 2-3 tool calls on same blocker
- Never install things without asking between each
- Updated PLAN-TEMPLATE.md with Rule 0 (highest priority)
These changes ensure the agent stops promptly on environment issues
rather than burning context on extended troubleshooting attempts.
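The retry-limit table reads as a small lookup. In this Python sketch the blocker category names, `next_action`, `blocker_report`, and the fallback of zero retries for unlisted blockers are assumptions; only the limits themselves and the "What I tried" section come from the commit message.

```python
from typing import List

# Retry limits from the commit message; keys are illustrative category names.
RETRY_LIMITS = {
    "server_error": 0,        # HTTP 500 or timeout: STOP immediately
    "missing_tool": 1,        # one install attempt, then STOP
    "port_conflict": 1,       # one kill attempt, then STOP
    "winappdriver_error": 0,  # STOP immediately
}


def next_action(blocker: str, attempts_so_far: int) -> str:
    """Return 'retry' while under the limit, otherwise stop and ask the user."""
    limit = RETRY_LIMITS.get(blocker, 0)  # assumption: unknown blockers never retry
    return "retry" if attempts_so_far < limit else "stop_and_ask_user"


def blocker_report(blocker: str, tried: List[str]) -> str:
    """Minimal blocker report that includes the 'What I tried' section."""
    steps = [f"- {step}" for step in tried] or ["- (nothing yet)"]
    return "\n".join([f"BLOCKED: {blocker}", "What I tried:", *steps])
```

Defaulting unknown blockers to zero retries errs toward handing control back to the user, which matches the intent of stopping promptly instead of troubleshooting open-endedly.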
Now that Copilot CLI supports command-line arguments:
- -i, --interactive <prompt>: Start interactive with prompt
- -p, --prompt <text>: Non-interactive mode
- --agent <agent>: Specify custom agent
- --allow-all: Enable all permissions
The script now:
- Validates prerequisites (gh CLI AND copilot CLI)
- Checks out PR branch (unless -SkipCheckout)
- Creates state directory
- Invokes Copilot CLI with the pr agent and a structured prompt
New parameters:
- -NoInteractive: Run in non-interactive mode (exits after completion)
- Default is interactive mode (-i) which keeps session open
Usage:
- Interactive mode (default): pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687
- Non-interactive mode: pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -NoInteractive
- Dry run to see what would be invoked: pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -DryRun
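The argument handling described above can be mirrored in a short Python sketch. It only builds the argument list and checks prerequisites; `build_copilot_args` and `missing_prerequisites` are hypothetical helpers, not code from `Review-PR.ps1`, and the flags are limited to the ones listed in the commit message. Whether the real script passes `--allow-all` is not stated, so it is opt-in here.

```python
import shlex
import shutil
from typing import List


def build_copilot_args(
    prompt: str, interactive: bool = True, agent: str = "pr", allow_all: bool = False
) -> List[str]:
    """Assemble a Copilot CLI invocation using the flags listed above."""
    args = ["copilot", "--agent", agent]
    # -i keeps the session open; -p runs non-interactively and exits.
    args += ["-i", prompt] if interactive else ["-p", prompt]
    if allow_all:
        args.append("--allow-all")
    return args


def missing_prerequisites() -> List[str]:
    """Names of the required CLIs (gh, copilot) that are not on PATH."""
    return [tool for tool in ("gh", "copilot") if shutil.which(tool) is None]


if __name__ == "__main__":
    # Dry-run style output: print the command instead of executing it.
    print(shlex.join(build_copilot_args("Review PR #33687", interactive=False)))
```

Building an argument list (rather than one command string) sidesteps quoting problems in the prompt, which is the same class of issue as the variable-expansion fix mentioned in the next commit.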
Problem: During PR review, the pr agent was checking out branches even when invoked with -SkipCheckout, causing:
- Loss of uncommitted changes (stashed)
- Confusion about which code is being reviewed
- Potential loss of work in progress
Solution: Added explicit rule prohibiting branch switching:
- Never run: git checkout, git switch, gh pr checkout, git stash
- Work on current branch as-is
- Use git diff or gh pr diff to see PR changes
Also fixed variable expansion in Review-PR.ps1 prompt (double backticks).
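The prohibition amounts to a deny-list check on the start of each command. This Python sketch (`is_blocked` is hypothetical) illustrates the rule only; it does not reproduce how Copilot CLI's `--deny-tool` option, added in a later commit, matches commands.

```python
import shlex
from typing import Tuple

# Commands the rule forbids, expressed as token prefixes.
BLOCKED_PREFIXES: Tuple[Tuple[str, ...], ...] = (
    ("git", "checkout"),
    ("git", "switch"),
    ("git", "stash"),
    ("gh", "pr", "checkout"),
)


def is_blocked(command: str) -> bool:
    """True if the command starts with a prohibited branch-switching prefix."""
    tokens = tuple(shlex.split(command))
    return any(tokens[: len(prefix)] == prefix for prefix in BLOCKED_PREFIXES)
```

A prefix check is easy to sidestep (for example `git -C <dir> checkout`), so it works best alongside tool-level blocking rather than instead of it.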
- Make Platform parameter optional - agent determines appropriate platform
- Add platform selection guidance to pr.md, post-gate.md, PLAN-TEMPLATE.md
- Enable streaming output (--stream on) for real-time feedback
- Add logging to CustomAgentLogsTmp/PRState/{PR}/copilot-logs/
- Save session markdown in non-interactive mode (--share)
- Block branch-switching commands (--deny-tool for git checkout/switch/stash)
- Key learning: Don't test on platforms not affected by the bug
- Add explicit warning that explore/glob is NOT valid for cross-pollination
- Require Cross-Pollination table with responses from ALL 5 models
- Add invocation template showing how to call each model
- Add validation that cross-pollination table must exist before marking complete
- Add new common mistakes for using shortcuts instead of model invocations
Cross-pollination exhaustion is determined by pr agent after invoking ALL 5 models. try-fix should NOT set any global exhausted field - it only reports its own attempt result.
- Merge PR into current branch instead of checkout
- Fail if on protected branches (main, release/*, net*.0)
- Handle fork PRs with separate fetch logic
- Replace SkipCheckout with SkipMerge parameter
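The protected-branch guard can be expressed with shell-style wildcards. In this Python sketch `is_protected` is hypothetical and the patterns are taken verbatim from the list above; note that `fnmatch`'s `*` also matches `/`, so `release/*` covers nested release branch names too.

```python
from fnmatch import fnmatchcase

# Branch patterns the script refuses to merge into (from the commit message).
PROTECTED_PATTERNS = ("main", "release/*", "net*.0")


def is_protected(branch: str) -> bool:
    """True if merging a PR into `branch` should be refused."""
    return any(fnmatchcase(branch, pattern) for pattern in PROTECTED_PATTERNS)
```

With these patterns, `net10.0` is refused while a topic branch such as `net10.0-fixes` is allowed, because `net*.0` must end in `.0`.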
- Create SHARED-RULES.md with extracted shared content (blocker handling, git rules, model list, platform selection)
- Simplify git rule: agent never runs git commands, always on correct branch
- Change 'commit state file' to 'save state file' (state files are gitignored)
- Phase 4: 'Apply changes' instead of 'Commit changes' - user handles git
- Compress cross-pollination section (51→20 lines)
- Convert PLAN-TEMPLATE.md to pure checklist (226→112 lines)
- Add instruction for user to handle commit/push when alternative fix applied
Line count reduction:
- pr.md: 662→535 (-19%)
- post-gate.md: 403→302 (-25%)
- PLAN-TEMPLATE.md: 226→112 (-50%)
- Total: 1291→1116 (-14%), including the new 167-line SHARED-RULES.md
Validated by 5 AI models (Claude Sonnet 4.5, Claude Opus 4.5, GPT-5.2, GPT-5.2-Codex, Gemini 3 Pro) - all confirmed no conflicting guidance.
Note
Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!
Summary
This PR significantly enhances the PR agent workflow with improvements across two development phases:
Phase 1: Original Enhancements (Commits 1-8)
Phase 2: Consolidation & Simplification (Commit 9)
After multi-model review (5 models: Claude Sonnet 4.5, Claude Opus 4.5, GPT-5.2, GPT-5.2-Codex, Gemini 3 Pro), the following improvements were made:
Created `SHARED-RULES.md` to eliminate duplication across files
Commits
1. `80d7e412c2` - Update pr agent with multi-model try-fix workflow
Why: The original PR agent only used a single model for exploring fixes. Different AI models have different strengths and may find solutions others miss.
Changes:
`claude-sonnet-4.5`, `claude-opus-4.5`, `gpt-5.2`, `gpt-5.2-codex`, `gemini-3-pro-preview`
2. `69cc6af403` - Address Copilot review suggestions
Why: Initial PR review feedback suggested improvements.
Changes:
3. `fe55c3fd21` - Add rules for template formatting and skill script usage
Why: Downstream scripts depend on exact regex patterns in state files. Agents were "improving" templates by adding attributes like `open`, which broke parsing.
Changes:
4. `debbee608e` - Add 'Stop on Environment Blockers' rule to PR agent
Why: Agent was continuing through phases when environment issues (missing Appium, WinAppDriver errors) prevented completion, leading to incomplete reviews.
Changes:
5. `ad29f6a796` - Add PR review plan template and Review-PR.ps1 script
Why: Need a reusable template for consistent PR reviews and a script to automate invocation.
Changes:
- `.github/agents/pr/PLAN-TEMPLATE.md` - Reusable 5-phase review plan
- `.github/scripts/Review-PR.ps1` - Script to prepare environment and invoke Copilot CLI
6. `886ea2aa8e` - Improve blocker handling and fix Review-PR.ps1 for Copilot CLI
Why: During PR #27300 review, agent spent 10+ tool calls troubleshooting WinAppDriver instead of stopping after first failure.
Changes:
7. `d67da75e85` - Update Review-PR.ps1 to invoke Copilot CLI directly
Why: Initially thought Copilot CLI was interactive-only. Discovered it supports `-i <prompt>` and `-p <prompt>` for programmatic invocation.
Changes:
- Invokes `copilot --agent pr -i "<prompt>"` directly
- Validates that `gh` CLI and `copilot` CLI are installed
- Adds `-NoInteractive` switch for `-p` mode (exits after completion)
8. `ed74c574a5` - Add 'Do NOT Switch Branches' rule to pr agent
Why: During PR review testing, the pr agent ran `git checkout`, `git stash`, and other branch-switching commands, causing loss of local changes and confusion about which code was being reviewed.
Changes:
- Never run `git checkout`, `git switch`, `gh pr checkout`, `git stash`
- Use `git diff` or `gh pr diff` to see PR changes
9. `632bfb7155` - Extract shared rules, simplify git policy, reduce duplication
Why: Multi-model review (5 AI models) identified significant duplication (~200 lines) across files, conflicting "commit" terminology, and overly verbose sections. The git checkout prohibition also conflicted with workflow steps that mentioned git checkout.
Changes:
Created `SHARED-RULES.md` (167 lines) - Single source of truth for:
Simplified git policy - Agent is ALWAYS on correct branch, never runs git commands:
Compressed content:
Eliminated duplication:
Files Changed
- `.github/agents/pr.md`
- `.github/agents/pr/post-gate.md`
- `.github/agents/pr/PLAN-TEMPLATE.md`
- `.github/agents/pr/SHARED-RULES.md`
- `.github/scripts/Review-PR.ps1`
Usage
Multi-Model Validation
The final changes (commit 9) were validated by 5 AI models:
Testing
Tested by reviewing PR #27300 (ScrollView ScrollToAsync fix):