feat(retry): connect retry configuration to maybe_retry calls by gspeter-max · Pull Request #936 · PrimeIntellect-ai/verifiers

gspeter-max · 2026-02-19T07:21:17Z

Summary

Connects retry configuration timing parameters (retry_base_delay, retry_max_backoff, retry_jitter) to the actual maybe_retry calls throughout the codebase.

Problem

Part 1 (already merged) added retry configuration fields to ClientConfig and updated maybe_retry to support configurable timing parameters. However, these fields were NOT connected to the actual maybe_retry calls, so hardcoded defaults were always being used instead of the configured values.

Solution

This PR completes the implementation by:

EvalConfig - Added retry timing fields with defaults matching maybe_retry:
- retry_base_delay: float = 1.0 (initial delay)
- retry_max_backoff: float = 60.0 (maximum wait time)
- retry_jitter: bool = True (jitter control)
CLI Arguments - Added three new CLI flags:
- --retry-base-delay
- --retry-max-backoff
- --retry-jitter
Complete Data Flow - Connected retry parameters through:
- EvalConfig → run_evaluation() → evaluate() → generate() → run_rollout()/run_group() → maybe_retry()
- Server mode: EnvClient → Request types → EnvServer handlers

Changes

verifiers/types.py - Added retry timing fields to EvalConfig
verifiers/scripts/eval.py - Added CLI arguments and updated build_eval_config()
verifiers/envs/environment.py - Updated run_rollout, run_group, generate, evaluate methods
verifiers/envs/env_group.py - Updated wrapper methods to pass retry parameters
verifiers/workers/client/env_client.py - Updated client interface
verifiers/workers/types.py - Added retry fields to request types
verifiers/workers/server/env_server.py - Updated server handlers
verifiers/utils/eval_utils.py - Updated evaluation orchestration
tests/test_rate_limit_retry.py - Added integration tests

Test Results

All tests passing:

✅ tests/test_rate_limit_retry.py - 7/7 passed
✅ tests/test_environment.py - 30/30 passed
✅ tests/test_env_group.py - 16/16 passed

Backward Compatibility

✅ Fully backward compatible - all new parameters have defaults matching previous hardcoded behavior

Example Usage

# CLI usage
prime eval gsm8k \
  --max-retries 3 \
  --retry-base-delay 2.0 \
  --retry-max-backoff 30.0 \
  --retry-jitter false

# TOML config
retry_base_delay = 2.0
retry_max_backoff = 30.0
retry_jitter = false

References

Closes #577

Note

Medium Risk
Touches core evaluation execution paths and server RPC request schemas, so mis-threaded defaults or mismatched client/server versions could change retry behavior under load; changes are otherwise additive with sensible defaults.

Overview
Adds configurable retry backoff plumbing end-to-end. Environment.run_rollout/run_group now pass retry_base_delay, retry_max_backoff, and retry_jitter into maybe_retry, and these parameters are threaded through EnvGroup, generate/evaluate, and the worker client/server request types so server mode honors the same settings.

Expands what gets retried and how rate limits are surfaced. Introduces verifiers.errors.RateLimitError, maps OpenAI/Anthropic/judge-model 429s to it, and updates maybe_retry to retry RateLimitError by default with optional jitter disabling. CLI support is added via --retry-base-delay/--retry-max-backoff/--retry-jitter, and new tests cover rate-limit retry behavior and config defaults.

^{Written by Cursor Bugbot for commit 4177c39. This will update automatically on new commits. Configure here.}

…f for rate limits Root Cause: - HTTP 429 rate limit errors from API models caused failures instead of automatic retries - The maybe_retry function only retried InfraError and InvalidModelResponseError - No configurable retry parameters (base delay, max backoff, jitter) existed for rate limit handling Changes: 1. Added RateLimitError class to verifiers/errors.py - New error type for unified rate limit handling across providers (OpenAI, Anthropic) 2. Added retry configuration to ClientConfig and EndpointClientConfig in verifiers/types.py - retry_base_delay: float = 1.0 (Initial delay in seconds) - retry_max_backoff: float = 60.0 (Maximum backoff in seconds) - retry_jitter: bool = True (Add random jitter to avoid thundering herd) 3. Modified maybe_retry function in verifiers/utils/async_utils.py - Added jitter parameter to function signature - Added RateLimitError to default error_types tuple - Updated docstring to document new parameters - Made jitter configurable with conditional tenacity wait strategy 4. Updated client implementations to catch and convert rate limit errors - openai_chat_completions_client.py: Catches OpenAI RateLimitError, converts to vf.RateLimitError - anthropic_messages_client.py: Checks for HTTP 429 status code, converts to vf.RateLimitError - openai_completions_client.py: Inherits rate limit handling from shared decorator 5. Updated judge_rubric.py - Replaced RuntimeError with VFRateLimitError for judge model rate limits - Allows callers to benefit from retry mechanism 6. Created comprehensive tests in tests/test_rate_limit_retry.py - Tests rate limit error retry behavior - Tests retry exhaustion returns error in state - Tests max_retries=0 disables retry - Tests jitter configuration - Tests multiple error types in retry Impact: - Rate limit errors (HTTP 429) now trigger automatic retries with exponential backoff - Configurable retry parameters provide granular control over retry behavior - Fully backward compatible - defaults match existing behavior - Jitter helps prevent thundering herd problem when multiple requests hit rate limits simultaneously Refs: PrimeIntellect-ai#577

…try behavior Root Cause: - Test was returning error objects directly instead of wrapped in dicts - maybe_retry expects errors in state["error"] dict format Changes: - Fixed test_multiple_error_types_in_retry to return proper dict format - Changed from returning bare error objects to {"error": error} format Impact: - Test now correctly matches how maybe_retry works with state dictionaries - All 5 tests in test_rate_limit_retry.py pass Refs: PrimeIntellect-ai#577

Root Cause: - Part 1 added retry config fields (retry_base_delay, retry_max_backoff, retry_jitter) to ClientConfig - Part 1 updated maybe_retry to support configurable timing parameters - These fields were NOT connected to the actual maybe_retry calls in run_rollout and run_group - Hardcoded defaults were being used instead of the configured values Changes: 1. EvalConfig - Added retry timing fields (retry_base_delay, retry_max_backoff, retry_jitter) - Defaults match maybe_retry: 1.0, 60.0, True 2. CLI - Added --retry-base-delay, --retry-max-backoff, --retry-jitter arguments 3. build_eval_config - Updated to construct EvalConfig with retry timing fields 4. Environment.run_rollout - Added retry timing parameters and passed to maybe_retry 5. Environment.run_group - Added retry timing parameters and passed to maybe_retry 6. Environment.generate - Added retry timing parameters and passed to run_rollout/run_group 7. Environment.evaluate - Added retry timing parameters and passed to generate 8. EnvClient.run_rollout - Added retry timing parameters and passed to request 9. EnvClient.run_group - Added retry timing parameters and passed to request 10. RunRolloutRequest - Added retry timing fields 11. RunGroupRequest - Added retry timing fields 12. EnvServer.handle_run_rollout - Passed retry timing to env.run_rollout 13. EnvServer.handle_run_group - Passed retry timing to env.run_group 14. eval_utils.run_evaluation - Passed config retry timing to vf_env.evaluate 15. Tests - Added tests for EvalConfig retry timing parameters Impact: - Users can now configure retry timing via CLI or TOML config - Custom retry base delay, max backoff, and jitter settings are properly propagated - Backward compatible - defaults match previous hardcoded behavior Refs: PrimeIntellect-ai#577 Files Modified: - verifiers/types.py - verifiers/scripts/eval.py - verifiers/envs/environment.py - verifiers/workers/client/env_client.py - verifiers/workers/types.py - verifiers/workers/server/env_server.py - verifiers/utils/eval_utils.py - tests/test_rate_limit_retry.py

CLAassistant · 2026-02-19T07:21:25Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

cursor

Cursor Bugbot has reviewed your changes and found 6 potential issues.

^{Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.}

cursor · 2026-02-19T07:32:14Z

verifiers/scripts/eval.py

            max_retries=raw.get("max_retries", 0),
+            retry_base_delay=raw.get("retry_base_delay", 1.0),
+            retry_max_backoff=raw.get("retry_max_backoff", 60.0),
+            retry_jitter=raw.get("retry_jitter", True),


TOML config rejects new retry fields as invalid

High Severity

build_eval_config reads retry_base_delay, retry_max_backoff, and retry_jitter from the raw dict via raw.get(...), and the PR description shows TOML config as a supported use case. However, load_toml_config in verifiers/utils/eval_utils.py has a valid_fields allowlist that does NOT include these three fields. Any TOML config specifying them will raise a ValueError at validation time, making the TOML config path for retry parameters completely broken.

Additional Locations (1)

verifiers/utils/eval_utils.py#L638-L641

cursor · 2026-02-19T07:32:14Z

verifiers/clients/anthropic_messages_client.py

        except BadRequestError as e:
+            # Check for HTTP 429 rate limit
+            if hasattr(e, "response") and hasattr(e.response, "status_code") and e.response.status_code == 429:
+                raise VFRateLimitError(f"Anthropic rate limit: {e}") from e


Anthropic rate limit check is unreachable dead code

High Severity

The Anthropic SDK raises anthropic.RateLimitError (not BadRequestError) for HTTP 429 responses. The new check for e.response.status_code == 429 inside except BadRequestError is unreachable — a BadRequestError will never have a 429 status code. Anthropic rate limit errors are not imported or caught, so they fall through to the generic except Exception in client.py and become a ModelError, bypassing the retry mechanism entirely. The OpenAI client correctly imports and catches a separate RateLimitError exception.

cursor · 2026-02-19T07:32:14Z

verifiers/utils/async_utils.py

+        wait=(
+            tc.wait_exponential_jitter(initial=initial, max=max_wait)
+            if jitter
+            else tc.wait_exponential(multiplier=1, min=initial, max=max_wait)


Non-jitter backoff ignores base delay as scaling factor

Medium Severity

When jitter=False, wait_exponential(multiplier=1, min=initial, max=max_wait) uses initial as a floor, not as the exponential base. With retry_base_delay=5.0, the sequence would be 5, 5, 5, 8, 16… instead of the expected 5, 10, 20, 40… The multiplier parameter needs to be initial (not 1) to match the scaling behavior of wait_exponential_jitter(initial=initial).

cursor · 2026-02-19T07:32:14Z

verifiers/scripts/eval.py

+        type=lambda x: x.lower() in ("true", "1", "yes") if x is not None else None,
+        default=None,
+        help="Enable jitter to avoid thundering herd (default: True)",
+    )


Missing documentation updates for new retry CLI flags

Medium Severity

This PR adds three new CLI flags (--retry-base-delay, --retry-max-backoff, --retry-jitter) and three new EvalConfig fields, but neither docs/evaluation.md (which has a CLI flags table including --max-retries) nor docs/reference.md (which lists EvalConfig fields) are updated to reflect these additions.

^{Triggered by project rule: BugBot Instructions}

cursor · 2026-02-19T07:32:14Z

verifiers/scripts/eval.py

+        type=lambda x: x.lower() in ("true", "1", "yes") if x is not None else None,
+        default=None,
+        help="Enable jitter to avoid thundering herd (default: True)",
+    )


Missing skills update for new eval CLI flags

Low Severity

This PR modifies verifiers/scripts/eval.py with new CLI arguments affecting evaluation workflow behavior. The skills/evaluate-environments/SKILL.md file, which documents evaluation patterns and CLI usage, was not updated to reflect the new retry configuration options.

^{Triggered by project rule: BugBot Instructions}

cursor · 2026-02-19T07:32:14Z

verifiers/scripts/eval.py

+        type=lambda x: x.lower() in ("true", "1", "yes") if x is not None else None,
+        default=None,
+        help="Enable jitter to avoid thundering herd (default: True)",
+    )


CLI retry defaults are None, causing Pydantic validation error

High Severity

The three new CLI arguments (--retry-base-delay, --retry-max-backoff, --retry-jitter) use default=None. When unspecified, vars(args) includes these keys with value None. Then raw.get("retry_base_delay", 1.0) returns None (not 1.0) because the key exists. Passing None to EvalConfig(retry_base_delay=None) fails Pydantic validation since the field expects float, not None. This crashes the CLI whenever these flags are omitted, which is the normal usage path.

Additional Locations (1)

verifiers/scripts/eval.py#L572-L575

peter-luminova added 3 commits February 19, 2026 11:08

cursor bot reviewed Feb 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

feat(retry): connect retry configuration to maybe_retry calls#936

feat(retry): connect retry configuration to maybe_retry calls#936
gspeter-max wants to merge 3 commits intoPrimeIntellect-ai:mainfrom
gspeter-max:verifiersContirbuting/issue_#577

gspeter-max commented Feb 19, 2026 •

edited by cursor bot

Loading

Uh oh!

CLAassistant commented Feb 19, 2026

Uh oh!

cursor bot left a comment

Uh oh!

cursor bot Feb 19, 2026

Uh oh!

cursor bot Feb 19, 2026

Uh oh!

cursor bot Feb 19, 2026

Uh oh!

cursor bot Feb 19, 2026

Uh oh!

cursor bot Feb 19, 2026

Uh oh!

cursor bot Feb 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

Conversation

gspeter-max commented Feb 19, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Changes

Test Results

Backward Compatibility

Example Usage

References

Uh oh!

CLAassistant commented Feb 19, 2026

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Feb 19, 2026

Choose a reason for hiding this comment

TOML config rejects new retry fields as invalid

Uh oh!

cursor bot Feb 19, 2026

Choose a reason for hiding this comment

Anthropic rate limit check is unreachable dead code

Uh oh!

cursor bot Feb 19, 2026

Choose a reason for hiding this comment

Non-jitter backoff ignores base delay as scaling factor

Uh oh!

cursor bot Feb 19, 2026

Choose a reason for hiding this comment

Missing documentation updates for new retry CLI flags

Uh oh!

cursor bot Feb 19, 2026

Choose a reason for hiding this comment

Missing skills update for new eval CLI flags

Uh oh!

cursor bot Feb 19, 2026

Choose a reason for hiding this comment

CLI retry defaults are None, causing Pydantic validation error

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

gspeter-max commented Feb 19, 2026 •

edited by cursor bot

Loading