add RolloutGatewayMixin for server-side rollout execution#954
add RolloutGatewayMixin for server-side rollout execution#954
RolloutGatewayMixin for server-side rollout execution#954Conversation
… prevent rubric KeyError on early rollout failures
|
Hm what's the use case here? Would prefer we don't hardcode in vLLM-specific assumptions at the env pattern level, can we make this more endpoint-agnostic? |
|
if it truly needs to be baked in to the class, would be better if it's something like a mixin and then toggled on/off via an env-level class attribute, so that eval model is still supported with other endpoints. |
mikasenghaas
left a comment
There was a problem hiding this comment.
nice, this looks pretty clean!
| run_command: str, | ||
| gateway_port: int = 8000, | ||
| max_turns: int = -1, | ||
| timeout_seconds: float = 3600.0, | ||
| poll_interval: float = 2.0, | ||
| docker_image: str = "python:3.11-slim", | ||
| start_command: str = "tail -f /dev/null", | ||
| cpu_cores: int = 1, | ||
| memory_gb: int = 2, | ||
| disk_size_gb: int = 5, | ||
| gpu_count: int = 0, | ||
| timeout_minutes: int = 60, | ||
| environment_vars: dict[str, str] | None = None, | ||
| team_id: str | None = None, | ||
| advanced_configs: AdvancedConfigs | None = None, | ||
| labels: list[str] | None = None, | ||
| max_retries: int = 5, | ||
| base_delay: float = 0.5, | ||
| backoff_factor: float = 2.0, | ||
| max_backoff_seconds: float = 30.0, | ||
| jitter: float = 1e-3, | ||
| sandbox_client_max_workers: int = 10, | ||
| sandbox_client_max_connections: int = 100, | ||
| sandbox_client_max_keepalive_connections: int = 50, | ||
| sandbox_wait_for_creation_max_attempts: int = 120, |
There was a problem hiding this comment.
maybe nicer to have sandbox_mixin_kwargs so that we dont have to maintain constructor signatures in both places?
| # `state["client"]` may be a Verifiers wrapper with the raw client on `.client`. | ||
| client = getattr(state["client"], "client", state["client"]) |
There was a problem hiding this comment.
isn't it always a vf client?
| gateway_url = str(client.base_url).rstrip("/") | ||
| if gateway_url.endswith("/v1"): | ||
| gateway_url = gateway_url[:-3] | ||
| return gateway_url |
There was a problem hiding this comment.
at this point we shoul make this a util haha, i feel like ive written that exact code like 3x
| async def rollout( | ||
| self, | ||
| input: RolloutInput, | ||
| client: Client | ClientConfig, |
There was a problem hiding this comment.
isn't the signature just client: Client?
RolloutGatewayEnv for server-side rollout executionRolloutGatewayMixin for server-side rollout execution
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| f"stage=agent_completed exit_code={status.exit_code}" | ||
| ) | ||
| return | ||
| await asyncio.sleep(self.poll_interval) |
There was a problem hiding this comment.
Missing poll_interval attribute in RolloutGatewayMixin
High Severity
The poll_job_completion method uses self.poll_interval but RolloutGatewayMixin never initializes this attribute. The mixin assumes poll_interval exists from the parent class CliAgentEnv, but relies on duck typing without guaranteeing the attribute is present, causing AttributeError if used with a parent class that doesn't define it.
| f"agent_exit_code={state.get('agent_exit_code')} error={error_name}" | ||
| ) | ||
|
|
||
| return state |
There was a problem hiding this comment.
post_rollout never called in gateway rollout path
Medium Severity
The gateway rollout method bypasses the normal _cleanup handlers and never calls post_rollout, breaking subclasses like the test's GatewayCliAgentEnv that override post_rollout to compute rewards. The cleanup at line 373 calls self._cleanup(state) which invokes decorated @vf.cleanup handlers, but CliAgentEnv.destroy_sandbox (which calls post_rollout at line 473) is a cleanup handler that expects to run in the normal rollout flow, not in this custom gateway path.
| "model": state["model"], | ||
| "sampling_params": sampling_params, | ||
| "max_turns": self.max_turns, | ||
| "max_seq_len": self.max_seq_len, |
There was a problem hiding this comment.
Attributes assumed from parent without declaration
Low Severity
The mixin directly accesses multiple attributes from parent classes like self.environment_vars, self.max_turns, self.max_seq_len, self.sandbox_client, self.run_command, self.start_command, etc. without declaring them or documenting the contract. This creates tight coupling and makes the mixin fragile if parent classes change, contradicting the PR discussion's concern about avoiding hardcoded assumptions.
|
|
||
| rollout_registered = False | ||
| try: | ||
| await self.register_rollout(state) |
There was a problem hiding this comment.
Missing validation causes cryptic crash if init_gateway not called
Medium Severity
When use_gateway=True but init_gateway() was never called, the rollout method crashes with a cryptic AttributeError for _gw_http_client when trying to register the rollout. No validation checks whether gateway resources were initialized before attempting to use them, making misconfiguration difficult to debug.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| state["completion"] = data.get("completion") | ||
| state["is_truncated"] = bool( | ||
| data.get("is_truncated", state.get("is_truncated", False)) | ||
| ) |
There was a problem hiding this comment.
Gateway trajectory fetch can erase prompt
Medium Severity
fetch_trajectory assigns state["prompt"] and state["completion"] directly from the gateway response, even when those fields are absent, which can replace valid initialized values with None. Downstream code (rendering, scoring, logging, or consumers expecting Messages) can then fail or behave inconsistently.


Description
Adds RolloutGatewayMixin, a new experimental environment that offloads agent execution to sandboxes that talk directly to the vLLM rollout gateway.
The env handles sandbox lifecycle, rollout registration/unregistration, trajectory fetch, and reward — the sandboxed agent drives its own inference loop through a prime tunnel URL.
depends on PrimeIntellect-ai/prime-rl#1757
example env PrimeIntellect-ai/research-environments#162
Type of Change
Testing
uv run pytestlocally.Checklist
Additional Notes
Note
Medium Risk
Touches rollout execution/cleanup paths and adds new network/tunnel-driven control flow; failures could leak resources or change rollout outputs for envs opting into the mixin.
Overview
Adds experimental
RolloutGatewayMixinto runCliAgentEnvrollouts via a server-side vLLM rollout gateway: the env registers/unregisters rollouts over HTTP, opens/maintains Prime tunnels per gateway host, runs the agent in a sandbox pointingOPENAI_BASE_URLat the tunneled/v1/rollouts/{id}endpoint, then fetches the final trajectory.Refactors
CliAgentEnvto initialize interception/tunnel resources viainit_interception()and hardens cleanup/teardown to be safe when those resources weren’t created. State initialization now setsstate["completion"] = Noneup front to support gateway-populated completions, and new integration tests validate gateway registration payloads, trajectory fetching, tunnel reuse, and teardown behavior.Written by Cursor Bugbot for commit 7a36c41. This will update automatically on new commits. Configure here.