Skip to content

Comments

feat(retry): connect retry configuration to maybe_retry calls#936

Open
gspeter-max wants to merge 3 commits intoPrimeIntellect-ai:mainfrom
gspeter-max:verifiersContirbuting/issue_#577
Open

feat(retry): connect retry configuration to maybe_retry calls#936
gspeter-max wants to merge 3 commits intoPrimeIntellect-ai:mainfrom
gspeter-max:verifiersContirbuting/issue_#577

Conversation

@gspeter-max
Copy link

@gspeter-max gspeter-max commented Feb 19, 2026

Summary

Connects retry configuration timing parameters (retry_base_delay, retry_max_backoff, retry_jitter) to the actual maybe_retry calls throughout the codebase.

Problem

Part 1 (already merged) added retry configuration fields to ClientConfig and updated maybe_retry to support configurable timing parameters. However, these fields were NOT connected to the actual maybe_retry calls, so hardcoded defaults were always being used instead of the configured values.

Solution

This PR completes the implementation by:

  1. EvalConfig - Added retry timing fields with defaults matching maybe_retry:

    • retry_base_delay: float = 1.0 (initial delay)
    • retry_max_backoff: float = 60.0 (maximum wait time)
    • retry_jitter: bool = True (jitter control)
  2. CLI Arguments - Added three new CLI flags:

    • --retry-base-delay
    • --retry-max-backoff
    • --retry-jitter
  3. Complete Data Flow - Connected retry parameters through:

    • EvalConfigrun_evaluation()evaluate()generate()run_rollout()/run_group()maybe_retry()
    • Server mode: EnvClient → Request types → EnvServer handlers

Changes

  • verifiers/types.py - Added retry timing fields to EvalConfig
  • verifiers/scripts/eval.py - Added CLI arguments and updated build_eval_config()
  • verifiers/envs/environment.py - Updated run_rollout, run_group, generate, evaluate methods
  • verifiers/envs/env_group.py - Updated wrapper methods to pass retry parameters
  • verifiers/workers/client/env_client.py - Updated client interface
  • verifiers/workers/types.py - Added retry fields to request types
  • verifiers/workers/server/env_server.py - Updated server handlers
  • verifiers/utils/eval_utils.py - Updated evaluation orchestration
  • tests/test_rate_limit_retry.py - Added integration tests

Test Results

All tests passing:

  • tests/test_rate_limit_retry.py - 7/7 passed
  • tests/test_environment.py - 30/30 passed
  • tests/test_env_group.py - 16/16 passed

Backward Compatibility

✅ Fully backward compatible - all new parameters have defaults matching previous hardcoded behavior

Example Usage

# CLI usage
prime eval gsm8k \
  --max-retries 3 \
  --retry-base-delay 2.0 \
  --retry-max-backoff 30.0 \
  --retry-jitter false

# TOML config
retry_base_delay = 2.0
retry_max_backoff = 30.0
retry_jitter = false

References

Closes #577


Note

Medium Risk
Touches core evaluation execution paths and server RPC request schemas, so mis-threaded defaults or mismatched client/server versions could change retry behavior under load; changes are otherwise additive with sensible defaults.

Overview
Adds configurable retry backoff plumbing end-to-end. Environment.run_rollout/run_group now pass retry_base_delay, retry_max_backoff, and retry_jitter into maybe_retry, and these parameters are threaded through EnvGroup, generate/evaluate, and the worker client/server request types so server mode honors the same settings.

Expands what gets retried and how rate limits are surfaced. Introduces verifiers.errors.RateLimitError, maps OpenAI/Anthropic/judge-model 429s to it, and updates maybe_retry to retry RateLimitError by default with optional jitter disabling. CLI support is added via --retry-base-delay/--retry-max-backoff/--retry-jitter, and new tests cover rate-limit retry behavior and config defaults.

Written by Cursor Bugbot for commit 4177c39. This will update automatically on new commits. Configure here.

…f for rate limits

Root Cause:
- HTTP 429 rate limit errors from API models caused failures instead of automatic retries
- The maybe_retry function only retried InfraError and InvalidModelResponseError
- No configurable retry parameters (base delay, max backoff, jitter) existed for rate limit handling

Changes:
1. Added RateLimitError class to verifiers/errors.py
   - New error type for unified rate limit handling across providers (OpenAI, Anthropic)

2. Added retry configuration to ClientConfig and EndpointClientConfig in verifiers/types.py
   - retry_base_delay: float = 1.0 (Initial delay in seconds)
   - retry_max_backoff: float = 60.0 (Maximum backoff in seconds)
   - retry_jitter: bool = True (Add random jitter to avoid thundering herd)

3. Modified maybe_retry function in verifiers/utils/async_utils.py
   - Added jitter parameter to function signature
   - Added RateLimitError to default error_types tuple
   - Updated docstring to document new parameters
   - Made jitter configurable with conditional tenacity wait strategy

4. Updated client implementations to catch and convert rate limit errors
   - openai_chat_completions_client.py: Catches OpenAI RateLimitError, converts to vf.RateLimitError
   - anthropic_messages_client.py: Checks for HTTP 429 status code, converts to vf.RateLimitError
   - openai_completions_client.py: Inherits rate limit handling from shared decorator

5. Updated judge_rubric.py
   - Replaced RuntimeError with VFRateLimitError for judge model rate limits
   - Allows callers to benefit from retry mechanism

6. Created comprehensive tests in tests/test_rate_limit_retry.py
   - Tests rate limit error retry behavior
   - Tests retry exhaustion returns error in state
   - Tests max_retries=0 disables retry
   - Tests jitter configuration
   - Tests multiple error types in retry

Impact:
- Rate limit errors (HTTP 429) now trigger automatic retries with exponential backoff
- Configurable retry parameters provide granular control over retry behavior
- Fully backward compatible - defaults match existing behavior
- Jitter helps prevent thundering herd problem when multiple requests hit rate limits simultaneously

Refs: PrimeIntellect-ai#577
…try behavior

Root Cause:
- Test was returning error objects directly instead of wrapped in dicts
- maybe_retry expects errors in state["error"] dict format

Changes:
- Fixed test_multiple_error_types_in_retry to return proper dict format
- Changed from returning bare error objects to {"error": error} format

Impact:
- Test now correctly matches how maybe_retry works with state dictionaries
- All 5 tests in test_rate_limit_retry.py pass

Refs: PrimeIntellect-ai#577
Root Cause:
- Part 1 added retry config fields (retry_base_delay, retry_max_backoff, retry_jitter) to ClientConfig
- Part 1 updated maybe_retry to support configurable timing parameters
- These fields were NOT connected to the actual maybe_retry calls in run_rollout and run_group
- Hardcoded defaults were being used instead of the configured values

Changes:
1. EvalConfig - Added retry timing fields (retry_base_delay, retry_max_backoff, retry_jitter)
   - Defaults match maybe_retry: 1.0, 60.0, True
2. CLI - Added --retry-base-delay, --retry-max-backoff, --retry-jitter arguments
3. build_eval_config - Updated to construct EvalConfig with retry timing fields
4. Environment.run_rollout - Added retry timing parameters and passed to maybe_retry
5. Environment.run_group - Added retry timing parameters and passed to maybe_retry
6. Environment.generate - Added retry timing parameters and passed to run_rollout/run_group
7. Environment.evaluate - Added retry timing parameters and passed to generate
8. EnvClient.run_rollout - Added retry timing parameters and passed to request
9. EnvClient.run_group - Added retry timing parameters and passed to request
10. RunRolloutRequest - Added retry timing fields
11. RunGroupRequest - Added retry timing fields
12. EnvServer.handle_run_rollout - Passed retry timing to env.run_rollout
13. EnvServer.handle_run_group - Passed retry timing to env.run_group
14. eval_utils.run_evaluation - Passed config retry timing to vf_env.evaluate
15. Tests - Added tests for EvalConfig retry timing parameters

Impact:
- Users can now configure retry timing via CLI or TOML config
- Custom retry base delay, max backoff, and jitter settings are properly propagated
- Backward compatible - defaults match previous hardcoded behavior

Refs: PrimeIntellect-ai#577

Files Modified:
- verifiers/types.py
- verifiers/scripts/eval.py
- verifiers/envs/environment.py
- verifiers/workers/client/env_client.py
- verifiers/workers/types.py
- verifiers/workers/server/env_server.py
- verifiers/utils/eval_utils.py
- tests/test_rate_limit_retry.py
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 6 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

max_retries=raw.get("max_retries", 0),
retry_base_delay=raw.get("retry_base_delay", 1.0),
retry_max_backoff=raw.get("retry_max_backoff", 60.0),
retry_jitter=raw.get("retry_jitter", True),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOML config rejects new retry fields as invalid

High Severity

build_eval_config reads retry_base_delay, retry_max_backoff, and retry_jitter from the raw dict via raw.get(...), and the PR description shows TOML config as a supported use case. However, load_toml_config in verifiers/utils/eval_utils.py has a valid_fields allowlist that does NOT include these three fields. Any TOML config specifying them will raise a ValueError at validation time, making the TOML config path for retry parameters completely broken.

Additional Locations (1)

Fix in Cursor Fix in Web

except BadRequestError as e:
# Check for HTTP 429 rate limit
if hasattr(e, "response") and hasattr(e.response, "status_code") and e.response.status_code == 429:
raise VFRateLimitError(f"Anthropic rate limit: {e}") from e
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anthropic rate limit check is unreachable dead code

High Severity

The Anthropic SDK raises anthropic.RateLimitError (not BadRequestError) for HTTP 429 responses. The new check for e.response.status_code == 429 inside except BadRequestError is unreachable — a BadRequestError will never have a 429 status code. Anthropic rate limit errors are not imported or caught, so they fall through to the generic except Exception in client.py and become a ModelError, bypassing the retry mechanism entirely. The OpenAI client correctly imports and catches a separate RateLimitError exception.

Fix in Cursor Fix in Web

wait=(
tc.wait_exponential_jitter(initial=initial, max=max_wait)
if jitter
else tc.wait_exponential(multiplier=1, min=initial, max=max_wait)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-jitter backoff ignores base delay as scaling factor

Medium Severity

When jitter=False, wait_exponential(multiplier=1, min=initial, max=max_wait) uses initial as a floor, not as the exponential base. With retry_base_delay=5.0, the sequence would be 5, 5, 5, 8, 16… instead of the expected 5, 10, 20, 40… The multiplier parameter needs to be initial (not 1) to match the scaling behavior of wait_exponential_jitter(initial=initial).

Fix in Cursor Fix in Web

type=lambda x: x.lower() in ("true", "1", "yes") if x is not None else None,
default=None,
help="Enable jitter to avoid thundering herd (default: True)",
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing documentation updates for new retry CLI flags

Medium Severity

This PR adds three new CLI flags (--retry-base-delay, --retry-max-backoff, --retry-jitter) and three new EvalConfig fields, but neither docs/evaluation.md (which has a CLI flags table including --max-retries) nor docs/reference.md (which lists EvalConfig fields) are updated to reflect these additions.

Fix in Cursor Fix in Web

Triggered by project rule: BugBot Instructions

type=lambda x: x.lower() in ("true", "1", "yes") if x is not None else None,
default=None,
help="Enable jitter to avoid thundering herd (default: True)",
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing skills update for new eval CLI flags

Low Severity

This PR modifies verifiers/scripts/eval.py with new CLI arguments affecting evaluation workflow behavior. The skills/evaluate-environments/SKILL.md file, which documents evaluation patterns and CLI usage, was not updated to reflect the new retry configuration options.

Fix in Cursor Fix in Web

Triggered by project rule: BugBot Instructions

type=lambda x: x.lower() in ("true", "1", "yes") if x is not None else None,
default=None,
help="Enable jitter to avoid thundering herd (default: True)",
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI retry defaults are None, causing Pydantic validation error

High Severity

The three new CLI arguments (--retry-base-delay, --retry-max-backoff, --retry-jitter) use default=None. When unspecified, vars(args) includes these keys with value None. Then raw.get("retry_base_delay", 1.0) returns None (not 1.0) because the key exists. Passing None to EvalConfig(retry_base_delay=None) fails Pydantic validation since the field expects float, not None. This crashes the CLI whenever these flags are omitted, which is the normal usage path.

Additional Locations (1)

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow configurable retry mechanism with exponential backoff

3 participants