Add retry support for infrastructure errors in vf-eval #750

Merged
willccbb merged 9 commits into main from daniel/vf-eval-retry
Jan 20, 2026
Contributor

@rasdani rasdani commented Jan 20, 2026

Description

Adds --max-retries CLI flag to retry rollouts when vf.InfraError occurs.
I've opted to retry only on vf.InfraError for now, since we don't yet guarantee that every vf.Error is retryable.

Changes:

  • async_utils.py: Add maybe_retry() utility using tenacity with exponential backoff + jitter
  • types.py: Add max_retries to EvalConfig
  • eval.py: Add --max-retries CLI argument (default: 0)
  • eval_utils.py / environment.py: Thread parameter through to generate()

Behavior:

  • Only retries vf.InfraError (transient infra failures)
  • Does NOT retry ToolError, OverlongPromptError, etc.
  • Logs retries at WARNING level
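
A rough sketch of the retry behavior listed above, for readers who want to see the shape of it. The PR builds maybe_retry() on tenacity; this version swaps in plain asyncio so it has no dependencies. The exception classes are local stand-ins for vf.Error, vf.InfraError and ToolError, and the signature and the initial/max_wait parameters are assumptions, not the merged API.

```python
import asyncio
import logging
import random

logger = logging.getLogger(__name__)


class Error(Exception):
    """Local stand-in for vf.Error (hypothetical definition)."""


class InfraError(Error):
    """Local stand-in for vf.InfraError: transient, treated as retryable."""


class ToolError(Error):
    """Local stand-in for an error that must not be retried."""


def maybe_retry(func, max_retries=0, initial=1.0, max_wait=60.0):
    """Wrap an async callable so that only InfraError triggers a retry."""
    if max_retries <= 0:
        return func  # retries disabled: hand back the original callable

    async def wrapper(*args, **kwargs):
        for attempt in range(max_retries + 1):
            try:
                return await func(*args, **kwargs)
            except InfraError as exc:
                if attempt == max_retries:
                    raise  # retry budget exhausted, surface the error
                # Exponential backoff capped at max_wait, plus random jitter.
                delay = min(initial * 2**attempt, max_wait)
                delay += random.uniform(0, initial)
                logger.warning(
                    "Retry %d/%d after %r, sleeping %.2fs",
                    attempt + 1, max_retries, exc, delay,
                )
                await asyncio.sleep(delay)

    return wrapper
```

Any exception other than InfraError propagates on the first attempt, which matches the "does NOT retry ToolError" bullet. With max_retries=0 the callable is returned unwrapped, so the default path adds no overhead.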

Usage:
vf-eval my-env --max-retries 3
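
For illustration, the flag could be parsed and handed to the config roughly as follows. This is a minimal sketch assuming an argparse-based CLI; the trimmed EvalConfig dataclass, the env_id field, and the parse_args/build_config helpers are hypothetical and stand in for the real code in eval.py and types.py.

```python
import argparse
from dataclasses import dataclass


@dataclass
class EvalConfig:
    """Hypothetical, trimmed stand-in for the real EvalConfig."""
    env_id: str
    max_retries: int = 0  # 0 keeps the previous no-retry behavior


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="vf-eval")
    parser.add_argument("env_id", help="environment to evaluate")
    parser.add_argument(
        "--max-retries",
        type=int,
        default=0,
        help="retries per rollout on infrastructure errors (default: 0)",
    )
    return parser.parse_args(argv)


def build_config(argv=None):
    # argparse exposes --max-retries as args.max_retries.
    args = parse_args(argv)
    return EvalConfig(env_id=args.env_id, max_retries=args.max_retries)
```

Defaulting to 0 means existing invocations behave exactly as before unless the flag is passed.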

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Adds configurable retries for transient infrastructure failures.

  • Introduces maybe_retry() in utils/async_utils.py (tenacity-based exponential backoff + jitter) and applies it to Environment.generate() by wrapping run_rollout/run_group
  • Threads max_retries through Environment.generate/evaluate/evaluate_sync, EvalConfig (verifiers/types.py), and run_evaluation
  • Extends CLI with --max-retries in verifiers/scripts/eval.py; default 0
  • Updates docs (docs/evaluation.md, docs/reference.md) to document the flag and config fields
  • Adds tests for retry behavior and CLI defaults (tests/test_environment.py, tests/test_eval_cli.py)

Written by Cursor Bugbot for commit adc524a. This will update automatically on new commits.

cursor[bot]

This comment was marked as outdated.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Member

@mikasenghaas mikasenghaas left a comment

nice! this looks clean, 2 minor comments

@rasdani rasdani requested a review from mikasenghaas January 20, 2026 20:34
@willccbb willccbb merged commit beb02ec into main Jan 20, 2026
6 checks passed
3 participants