Skip to content

Look into timeout issues possibly caused by infinite loops#54

Merged
derekmisler merged 1 commit intodocker:mainfrom
derekmisler:look-into-timeout-issues-possibly-caused-by-infini
Feb 26, 2026
Merged

Look into timeout issues possibly caused by infinite loops#54
derekmisler merged 1 commit intodocker:mainfrom
derekmisler:look-into-timeout-issues-possibly-caused-by-infini

Conversation

@derekmisler
Copy link
Contributor

@derekmisler derekmisler commented Feb 25, 2026

Summary

This PR addresses timeout failures in the review-pr action by eliminating two root causes: an infinite loop risk where the agent could repeatedly retry verifier delegations, and slow/failing diff fetches inside the agent container (especially on forked PRs). The verifier is also refactored from single-finding to batch mode, reducing the number of LLM delegation calls per review.

Changes

  • review-pr/action.yml: Pre-fetches the PR diff via gh pr diff before the agent starts and writes it to pr.diff, avoiding 403 errors on forked PRs inside the agent container. Also doubles the action timeout from 20 min (1200s) to 40 min (2400s) to accommodate large diffs and retries.
  • review-pr/agents/pr-review.yaml (orchestrator): Adds an explicit ANTI-LOOP RULE — the agent must delegate to the verifier exactly once with all high/medium findings batched together. If the verifier returns an empty or malformed response, all findings are treated as DISMISSED rather than retried. Also updates diff-fetching instructions to check for the pre-fetched pr.diff file first.
  • review-pr/agents/pr-review.yaml (verifier agent): Refactored from single-finding to batch mode. The structured output schema changes from a single verdict object to a verdicts array, with additionalProperties: false enforced at both the array item and root levels.
  • review-pr/agents/evals/batched-verifier-security-{1,2,3}.json: Three new eval fixtures (triplicate runs for reliability) that assert the agent uses a single batch delegation to the verifier and correctly identifies a missing redirect_uri validation as a high-severity security finding.

How to Test

  • Trigger the review-pr action on a PR from a fork and confirm the diff is pre-fetched successfully (look for the ✅ Pre-fetched PR diff log line) and the agent reads from pr.diff instead of calling gh pr diff.
  • Run the new eval fixtures (batched-verifier-security-1/2/3.json) and verify all relevance criteria pass — particularly that the verifier is called exactly once with a batched payload and returns a verdicts array.
  • Confirm that a review run on a large PR no longer times out within the previous 20-minute window, and that a malformed verifier response results in an APPROVE rather than a retry loop.

Closes: https://github.com/docker/gordon/issues/179

Signed-off-by: Derek Misler <derek.misler@docker.com>
@derekmisler
Copy link
Contributor Author

/describe

@docker-agent
Copy link

docker-agent bot commented Feb 25, 2026

✅ PR description has been generated and updated!

@derekmisler derekmisler marked this pull request as ready for review February 25, 2026 20:53
@derekmisler derekmisler requested a review from a team as a code owner February 25, 2026 20:53
Copy link

@docker-agent docker-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Review Complete

Summary

No issues found. This PR looks good!

The changes correctly implement:

  • Pre-fetching of PR diffs to avoid 403 errors on forked PRs
  • Timeout extension from 20 to 40 minutes
  • Batch verification mode to prevent infinite retry loops
  • Proper structured output schema with verdicts array
  • Three eval fixtures to validate the new behavior

All changed code is correct — no logic errors, resource leaks, or security issues detected.

@derekmisler derekmisler merged commit 4bd3cbd into docker:main Feb 26, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants