Add retry support for infrastructure errors in vf-eval by rasdani · Pull Request #750 · PrimeIntellect-ai/verifiers

rasdani · 2026-01-20T15:35:43Z

Description

Adds --max-retries CLI flag to retry rollouts when vf.InfraError occurs.
I've opted to retry only on vf.InfraError for now, since we don't yet guarantee that every vf.Error is safe to retry.

Changes:

async_utils.py: Add maybe_retry() utility using tenacity with exponential backoff + jitter
types.py: Add max_retries to EvalConfig
eval.py: Add --max-retries CLI argument (default: 0)
eval_utils.py / environment.py: Thread parameter through to generate()
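
To illustrate what "exponential backoff + jitter" means for the wait between attempts, here is a minimal stdlib-only sketch. It does not use tenacity and is not the code from this PR; the function name `backoff_delay` and the parameter values (`initial`, `max_delay`, `jitter`) are assumptions chosen for illustration.

```python
import random


def backoff_delay(
    attempt: int,
    initial: float = 1.0,
    max_delay: float = 60.0,
    jitter: float = 1.0,
) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    The base delay doubles with each attempt, a uniform random jitter in
    [0, jitter] is added, and the result is capped at max_delay.
    All defaults here are hypothetical, not taken from the PR.
    """
    base = initial * 2 ** (attempt - 1)
    return min(base + random.uniform(0.0, jitter), max_delay)


# Print one possible schedule for the first four retries.
for attempt in range(1, 5):
    print(attempt, round(backoff_delay(attempt), 2))
```

The jitter spreads out retries from many concurrent rollouts so they do not all hit a recovering service at the same instant.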

Behavior:

Only retries vf.InfraError (transient infra failures)
Does NOT retry ToolError, OverlongPromptError, etc.
Logs retries at WARNING level
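
The behavior above can be sketched roughly as follows. This is a stdlib-only illustration under stated assumptions, not the tenacity-based maybe_retry() from this PR: the exception classes are hypothetical stand-ins for vf.Error, vf.InfraError, and ToolError, and `call_with_retries` is a made-up name.

```python
import asyncio
import logging

logger = logging.getLogger("retry_sketch")


class Error(Exception):
    """Hypothetical stand-in for vf.Error."""


class InfraError(Error):
    """Hypothetical stand-in for vf.InfraError (transient, retried)."""


class ToolError(Error):
    """Hypothetical stand-in for an error that is never retried."""


async def call_with_retries(func, max_retries: int = 0, delay: float = 0.0):
    """Await func(); retry only on InfraError, at most max_retries times."""
    retries = 0
    while True:
        try:
            return await func()
        except InfraError as exc:
            if retries >= max_retries:
                raise  # retry budget exhausted: surface the error
            retries += 1
            logger.warning(
                "InfraError (%s), retry %d/%d", exc, retries, max_retries
            )
            await asyncio.sleep(delay)
        # Any other exception (e.g. ToolError) is not caught and propagates.
```

With max_retries=0, the default, the wrapper makes exactly one call and re-raises any error, so existing behavior is unchanged unless the flag is set.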

Usage:
vf-eval my-env --max-retries 3
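
For readers unfamiliar with how such a flag is typically wired up, here is a minimal argparse sketch. It is not the parser from verifiers/scripts/eval.py, which defines many more options; the positional name `env_id` and the help text are assumptions. Only the flag name and its default of 0 come from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with only the pieces relevant to this PR."""
    parser = argparse.ArgumentParser(prog="vf-eval")
    # Hypothetical positional argument for the environment name.
    parser.add_argument("env_id")
    parser.add_argument(
        "--max-retries",
        type=int,
        default=0,  # default stated in the PR: retries are off
        help="Max retries per rollout on infrastructure errors",
    )
    return parser


args = build_parser().parse_args(["my-env", "--max-retries", "3"])
print(args.env_id, args.max_retries)
```

argparse converts the dashes in `--max-retries` to underscores, so the value is read as `args.max_retries`.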

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass when running uv run pytest locally.
New tests have been added to cover the changes

Checklist

My code follows the style guidelines of this project as outlined in AGENTS.md
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Additional Notes

Note

Adds configurable retries for transient infrastructure failures.

Introduces maybe_retry() in utils/async_utils.py (tenacity-based exponential backoff + jitter) and applies it to Environment.generate() by wrapping run_rollout/run_group
Threads max_retries through Environment.generate/evaluate/evaluate_sync, EvalConfig (verifiers/types.py), and run_evaluation
Extends CLI with --max-retries in verifiers/scripts/eval.py; default 0
Updates docs (docs/evaluation.md, docs/reference.md) to document the flag and config fields
Adds tests for retry behavior and CLI defaults (tests/test_environment.py, tests/test_eval_cli.py)

Written by Cursor Bugbot for commit adc524a.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

verifiers/types.py

mikasenghaas

nice! this looks clean, 2 minor comments

verifiers/utils/async_utils.py

rasdani added 3 commits January 20, 2026 14:54

Add retry support for infrastructure errors in vf-eval

324cde2

update docs

db4686d

update tests

7dca89e

This comment was marked as outdated.

fix ty type errors

8fb786c

cursor bot reviewed Jan 20, 2026

verifiers/types.py (resolved)

rasdani added 4 commits January 20, 2026 15:56

fix

dfdbcd7

add test case

dc4b194

update docs

3f227eb

fix ty type errors

1cd8233

rasdani requested review from mikasenghaas and willccbb January 20, 2026 16:05

mikasenghaas reviewed Jan 20, 2026

verifiers/utils/async_utils.py (resolved)

verifiers/utils/async_utils.py (outdated, resolved)

move logging into maybe_retry

adc524a

rasdani requested a review from mikasenghaas January 20, 2026 20:34

mikasenghaas approved these changes Jan 20, 2026

willccbb merged commit beb02ec into main Jan 20, 2026
6 checks passed

willccbb merged 9 commits into main from daniel/vf-eval-retry

3 participants
