feat(retry): connect retry configuration to maybe_retry calls#936
feat(retry): connect retry configuration to maybe_retry calls#936gspeter-max wants to merge 3 commits intoPrimeIntellect-ai:mainfrom
Conversation
…f for rate limits Root Cause: - HTTP 429 rate limit errors from API models caused failures instead of automatic retries - The maybe_retry function only retried InfraError and InvalidModelResponseError - No configurable retry parameters (base delay, max backoff, jitter) existed for rate limit handling Changes: 1. Added RateLimitError class to verifiers/errors.py - New error type for unified rate limit handling across providers (OpenAI, Anthropic) 2. Added retry configuration to ClientConfig and EndpointClientConfig in verifiers/types.py - retry_base_delay: float = 1.0 (Initial delay in seconds) - retry_max_backoff: float = 60.0 (Maximum backoff in seconds) - retry_jitter: bool = True (Add random jitter to avoid thundering herd) 3. Modified maybe_retry function in verifiers/utils/async_utils.py - Added jitter parameter to function signature - Added RateLimitError to default error_types tuple - Updated docstring to document new parameters - Made jitter configurable with conditional tenacity wait strategy 4. Updated client implementations to catch and convert rate limit errors - openai_chat_completions_client.py: Catches OpenAI RateLimitError, converts to vf.RateLimitError - anthropic_messages_client.py: Checks for HTTP 429 status code, converts to vf.RateLimitError - openai_completions_client.py: Inherits rate limit handling from shared decorator 5. Updated judge_rubric.py - Replaced RuntimeError with VFRateLimitError for judge model rate limits - Allows callers to benefit from retry mechanism 6. Created comprehensive tests in tests/test_rate_limit_retry.py - Tests rate limit error retry behavior - Tests retry exhaustion returns error in state - Tests max_retries=0 disables retry - Tests jitter configuration - Tests multiple error types in retry Impact: - Rate limit errors (HTTP 429) now trigger automatic retries with exponential backoff - Configurable retry parameters provide granular control over retry behavior - Fully backward compatible - defaults match existing behavior - Jitter helps prevent thundering herd problem when multiple requests hit rate limits simultaneously Refs: PrimeIntellect-ai#577
…try behavior
Root Cause:
- Test was returning error objects directly instead of wrapped in dicts
- maybe_retry expects errors in state["error"] dict format
Changes:
- Fixed test_multiple_error_types_in_retry to return proper dict format
- Changed from returning bare error objects to {"error": error} format
Impact:
- Test now correctly matches how maybe_retry works with state dictionaries
- All 5 tests in test_rate_limit_retry.py pass
Refs: PrimeIntellect-ai#577
Root Cause: - Part 1 added retry config fields (retry_base_delay, retry_max_backoff, retry_jitter) to ClientConfig - Part 1 updated maybe_retry to support configurable timing parameters - These fields were NOT connected to the actual maybe_retry calls in run_rollout and run_group - Hardcoded defaults were being used instead of the configured values Changes: 1. EvalConfig - Added retry timing fields (retry_base_delay, retry_max_backoff, retry_jitter) - Defaults match maybe_retry: 1.0, 60.0, True 2. CLI - Added --retry-base-delay, --retry-max-backoff, --retry-jitter arguments 3. build_eval_config - Updated to construct EvalConfig with retry timing fields 4. Environment.run_rollout - Added retry timing parameters and passed to maybe_retry 5. Environment.run_group - Added retry timing parameters and passed to maybe_retry 6. Environment.generate - Added retry timing parameters and passed to run_rollout/run_group 7. Environment.evaluate - Added retry timing parameters and passed to generate 8. EnvClient.run_rollout - Added retry timing parameters and passed to request 9. EnvClient.run_group - Added retry timing parameters and passed to request 10. RunRolloutRequest - Added retry timing fields 11. RunGroupRequest - Added retry timing fields 12. EnvServer.handle_run_rollout - Passed retry timing to env.run_rollout 13. EnvServer.handle_run_group - Passed retry timing to env.run_group 14. eval_utils.run_evaluation - Passed config retry timing to vf_env.evaluate 15. Tests - Added tests for EvalConfig retry timing parameters Impact: - Users can now configure retry timing via CLI or TOML config - Custom retry base delay, max backoff, and jitter settings are properly propagated - Backward compatible - defaults match previous hardcoded behavior Refs: PrimeIntellect-ai#577 Files Modified: - verifiers/types.py - verifiers/scripts/eval.py - verifiers/envs/environment.py - verifiers/workers/client/env_client.py - verifiers/workers/types.py - verifiers/workers/server/env_server.py - verifiers/utils/eval_utils.py - tests/test_rate_limit_retry.py
|
|
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 6 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| max_retries=raw.get("max_retries", 0), | ||
| retry_base_delay=raw.get("retry_base_delay", 1.0), | ||
| retry_max_backoff=raw.get("retry_max_backoff", 60.0), | ||
| retry_jitter=raw.get("retry_jitter", True), |
There was a problem hiding this comment.
TOML config rejects new retry fields as invalid
High Severity
build_eval_config reads retry_base_delay, retry_max_backoff, and retry_jitter from the raw dict via raw.get(...), and the PR description shows TOML config as a supported use case. However, load_toml_config in verifiers/utils/eval_utils.py has a valid_fields allowlist that does NOT include these three fields. Any TOML config specifying them will raise a ValueError at validation time, making the TOML config path for retry parameters completely broken.
Additional Locations (1)
| except BadRequestError as e: | ||
| # Check for HTTP 429 rate limit | ||
| if hasattr(e, "response") and hasattr(e.response, "status_code") and e.response.status_code == 429: | ||
| raise VFRateLimitError(f"Anthropic rate limit: {e}") from e |
There was a problem hiding this comment.
Anthropic rate limit check is unreachable dead code
High Severity
The Anthropic SDK raises anthropic.RateLimitError (not BadRequestError) for HTTP 429 responses. The new check for e.response.status_code == 429 inside except BadRequestError is unreachable — a BadRequestError will never have a 429 status code. Anthropic rate limit errors are not imported or caught, so they fall through to the generic except Exception in client.py and become a ModelError, bypassing the retry mechanism entirely. The OpenAI client correctly imports and catches a separate RateLimitError exception.
| wait=( | ||
| tc.wait_exponential_jitter(initial=initial, max=max_wait) | ||
| if jitter | ||
| else tc.wait_exponential(multiplier=1, min=initial, max=max_wait) |
There was a problem hiding this comment.
Non-jitter backoff ignores base delay as scaling factor
Medium Severity
When jitter=False, wait_exponential(multiplier=1, min=initial, max=max_wait) uses initial as a floor, not as the exponential base. With retry_base_delay=5.0, the sequence would be 5, 5, 5, 8, 16… instead of the expected 5, 10, 20, 40… The multiplier parameter needs to be initial (not 1) to match the scaling behavior of wait_exponential_jitter(initial=initial).
| type=lambda x: x.lower() in ("true", "1", "yes") if x is not None else None, | ||
| default=None, | ||
| help="Enable jitter to avoid thundering herd (default: True)", | ||
| ) |
There was a problem hiding this comment.
Missing documentation updates for new retry CLI flags
Medium Severity
This PR adds three new CLI flags (--retry-base-delay, --retry-max-backoff, --retry-jitter) and three new EvalConfig fields, but neither docs/evaluation.md (which has a CLI flags table including --max-retries) nor docs/reference.md (which lists EvalConfig fields) are updated to reflect these additions.
Triggered by project rule: BugBot Instructions
| type=lambda x: x.lower() in ("true", "1", "yes") if x is not None else None, | ||
| default=None, | ||
| help="Enable jitter to avoid thundering herd (default: True)", | ||
| ) |
There was a problem hiding this comment.
Missing skills update for new eval CLI flags
Low Severity
This PR modifies verifiers/scripts/eval.py with new CLI arguments affecting evaluation workflow behavior. The skills/evaluate-environments/SKILL.md file, which documents evaluation patterns and CLI usage, was not updated to reflect the new retry configuration options.
Triggered by project rule: BugBot Instructions
| type=lambda x: x.lower() in ("true", "1", "yes") if x is not None else None, | ||
| default=None, | ||
| help="Enable jitter to avoid thundering herd (default: True)", | ||
| ) |
There was a problem hiding this comment.
CLI retry defaults are None, causing Pydantic validation error
High Severity
The three new CLI arguments (--retry-base-delay, --retry-max-backoff, --retry-jitter) use default=None. When unspecified, vars(args) includes these keys with value None. Then raw.get("retry_base_delay", 1.0) returns None (not 1.0) because the key exists. Passing None to EvalConfig(retry_base_delay=None) fails Pydantic validation since the field expects float, not None. This crashes the CLI whenever these flags are omitted, which is the normal usage path.


Summary
Connects retry configuration timing parameters (
retry_base_delay,retry_max_backoff,retry_jitter) to the actualmaybe_retrycalls throughout the codebase.Problem
Part 1 (already merged) added retry configuration fields to
ClientConfigand updatedmaybe_retryto support configurable timing parameters. However, these fields were NOT connected to the actualmaybe_retrycalls, so hardcoded defaults were always being used instead of the configured values.Solution
This PR completes the implementation by:
EvalConfig - Added retry timing fields with defaults matching
maybe_retry:retry_base_delay: float = 1.0(initial delay)retry_max_backoff: float = 60.0(maximum wait time)retry_jitter: bool = True(jitter control)CLI Arguments - Added three new CLI flags:
--retry-base-delay--retry-max-backoff--retry-jitterComplete Data Flow - Connected retry parameters through:
EvalConfig→run_evaluation()→evaluate()→generate()→run_rollout()/run_group()→maybe_retry()EnvClient→ Request types →EnvServerhandlersChanges
verifiers/types.py- Added retry timing fields toEvalConfigverifiers/scripts/eval.py- Added CLI arguments and updatedbuild_eval_config()verifiers/envs/environment.py- Updatedrun_rollout,run_group,generate,evaluatemethodsverifiers/envs/env_group.py- Updated wrapper methods to pass retry parametersverifiers/workers/client/env_client.py- Updated client interfaceverifiers/workers/types.py- Added retry fields to request typesverifiers/workers/server/env_server.py- Updated server handlersverifiers/utils/eval_utils.py- Updated evaluation orchestrationtests/test_rate_limit_retry.py- Added integration testsTest Results
All tests passing:
tests/test_rate_limit_retry.py- 7/7 passedtests/test_environment.py- 30/30 passedtests/test_env_group.py- 16/16 passedBackward Compatibility
✅ Fully backward compatible - all new parameters have defaults matching previous hardcoded behavior
Example Usage
References
Closes #577
Note
Medium Risk
Touches core evaluation execution paths and server RPC request schemas, so mis-threaded defaults or mismatched client/server versions could change retry behavior under load; changes are otherwise additive with sensible defaults.
Overview
Adds configurable retry backoff plumbing end-to-end.
Environment.run_rollout/run_groupnow passretry_base_delay,retry_max_backoff, andretry_jitterintomaybe_retry, and these parameters are threaded throughEnvGroup,generate/evaluate, and the worker client/server request types so server mode honors the same settings.Expands what gets retried and how rate limits are surfaced. Introduces
verifiers.errors.RateLimitError, maps OpenAI/Anthropic/judge-model 429s to it, and updatesmaybe_retryto retryRateLimitErrorby default with optional jitter disabling. CLI support is added via--retry-base-delay/--retry-max-backoff/--retry-jitter, and new tests cover rate-limit retry behavior and config defaults.Written by Cursor Bugbot for commit 4177c39. This will update automatically on new commits. Configure here.