add `RolloutGatewayMixin` for server-side rollout execution by rasdani · Pull Request #954 · PrimeIntellect-ai/verifiers

rasdani · 2026-02-24T01:27:12Z

Description

Adds RolloutGatewayMixin, a new experimental environment that offloads agent execution to sandboxes that talk directly to the vLLM rollout gateway.

The env handles sandbox lifecycle, rollout registration/unregistration, trajectory fetch, and reward — the sandboxed agent drives its own inference loop through a prime tunnel URL.

depends on PrimeIntellect-ai/prime-rl#1757
example env PrimeIntellect-ai/research-environments#162

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass when running uv run pytest locally.
New tests have been added to cover the changes

Checklist

My code follows the style guidelines of this project as outlined in AGENTS.md
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Additional Notes

Note

Medium Risk
Touches rollout execution/cleanup paths and adds new network/tunnel-driven control flow; failures could leak resources or change rollout outputs for envs opting into the mixin.

Overview
Adds experimental RolloutGatewayMixin to run CliAgentEnv rollouts via a server-side vLLM rollout gateway: the env registers/unregisters rollouts over HTTP, opens/maintains Prime tunnels per gateway host, runs the agent in a sandbox pointing OPENAI_BASE_URL at the tunneled /v1/rollouts/{id} endpoint, then fetches the final trajectory.

Refactors CliAgentEnv to initialize interception/tunnel resources via init_interception() and hardens cleanup/teardown to be safe when those resources weren’t created. State initialization now sets state["completion"] = None up front to support gateway-populated completions, and new integration tests validate gateway registration payloads, trajectory fetching, tunnel reuse, and teardown behavior.

^{Written by Cursor Bugbot for commit 7a36c41. This will update automatically on new commits. Configure here.}

… prevent rubric KeyError on early rollout failures

verifiers/envs/environment.py

verifiers/envs/experimental/rollout_gateway_mixin.py

willccbb · 2026-02-24T04:30:08Z

Hm what's the use case here? Would prefer we don't hardcode in vLLM-specific assumptions at the env pattern level, can we make this more endpoint-agnostic?

willccbb · 2026-02-24T04:31:30Z

if it truly needs to be baked in to the class, would be better if it's something like a mixin and then toggled on/off via an env-level class attribute, so that eval model is still supported with other endpoints.

mikasenghaas

nice, this looks pretty clean!

mikasenghaas · 2026-02-24T10:24:23Z

verifiers/envs/experimental/rollout_gateway_env.py

+        run_command: str,
+        gateway_port: int = 8000,
+        max_turns: int = -1,
+        timeout_seconds: float = 3600.0,
+        poll_interval: float = 2.0,
+        docker_image: str = "python:3.11-slim",
+        start_command: str = "tail -f /dev/null",
+        cpu_cores: int = 1,
+        memory_gb: int = 2,
+        disk_size_gb: int = 5,
+        gpu_count: int = 0,
+        timeout_minutes: int = 60,
+        environment_vars: dict[str, str] | None = None,
+        team_id: str | None = None,
+        advanced_configs: AdvancedConfigs | None = None,
+        labels: list[str] | None = None,
+        max_retries: int = 5,
+        base_delay: float = 0.5,
+        backoff_factor: float = 2.0,
+        max_backoff_seconds: float = 30.0,
+        jitter: float = 1e-3,
+        sandbox_client_max_workers: int = 10,
+        sandbox_client_max_connections: int = 100,
+        sandbox_client_max_keepalive_connections: int = 50,
+        sandbox_wait_for_creation_max_attempts: int = 120,


maybe nicer to have sandbox_mixin_kwargs so that we dont have to maintain constructor signatures in both places?

mikasenghaas · 2026-02-24T10:25:55Z

verifiers/envs/experimental/rollout_gateway_env.py

+        # `state["client"]` may be a Verifiers wrapper with the raw client on `.client`.
+        client = getattr(state["client"], "client", state["client"])


isn't it always a vf client?

mikasenghaas · 2026-02-24T10:26:41Z

verifiers/envs/experimental/rollout_gateway_mixin.py

+        gateway_url = str(client.base_url).rstrip("/")
+        if gateway_url.endswith("/v1"):
+            gateway_url = gateway_url[:-3]
+        return gateway_url


at this point we shoul make this a util haha, i feel like ive written that exact code like 3x

mikasenghaas · 2026-02-24T10:27:42Z

verifiers/envs/experimental/rollout_gateway_mixin.py

+    async def rollout(
+        self,
+        input: RolloutInput,
+        client: Client | ClientConfig,


isn't the signature just client: Client?

cursor

Cursor Bugbot has reviewed your changes and found 4 potential issues.

^{Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.}

cursor · 2026-02-24T20:54:32Z

verifiers/envs/experimental/rollout_gateway_mixin.py

+                        f"stage=agent_completed exit_code={status.exit_code}"
+                    )
+                return
+            await asyncio.sleep(self.poll_interval)


Missing poll_interval attribute in RolloutGatewayMixin

High Severity

The poll_job_completion method uses self.poll_interval but RolloutGatewayMixin never initializes this attribute. The mixin assumes poll_interval exists from the parent class CliAgentEnv, but relies on duck typing without guaranteeing the attribute is present, causing AttributeError if used with a parent class that doesn't define it.

cursor · 2026-02-24T20:54:33Z

verifiers/envs/experimental/rollout_gateway_mixin.py

+                f"agent_exit_code={state.get('agent_exit_code')} error={error_name}"
+            )
+
+        return state


post_rollout never called in gateway rollout path

Medium Severity

The gateway rollout method bypasses the normal _cleanup handlers and never calls post_rollout, breaking subclasses like the test's GatewayCliAgentEnv that override post_rollout to compute rewards. The cleanup at line 373 calls self._cleanup(state) which invokes decorated @vf.cleanup handlers, but CliAgentEnv.destroy_sandbox (which calls post_rollout at line 473) is a cleanup handler that expects to run in the normal rollout flow, not in this custom gateway path.

cursor · 2026-02-24T20:54:33Z

verifiers/envs/experimental/rollout_gateway_mixin.py

+            "model": state["model"],
+            "sampling_params": sampling_params,
+            "max_turns": self.max_turns,
+            "max_seq_len": self.max_seq_len,


Attributes assumed from parent without declaration

Low Severity

The mixin directly accesses multiple attributes from parent classes like self.environment_vars, self.max_turns, self.max_seq_len, self.sandbox_client, self.run_command, self.start_command, etc. without declaring them or documenting the contract. This creates tight coupling and makes the mixin fragile if parent classes change, contradicting the PR discussion's concern about avoiding hardcoded assumptions.

cursor · 2026-02-24T20:54:33Z

verifiers/envs/experimental/rollout_gateway_mixin.py

+
+        rollout_registered = False
+        try:
+            await self.register_rollout(state)


Missing validation causes cryptic crash if init_gateway not called

Medium Severity

When use_gateway=True but init_gateway() was never called, the rollout method crashes with a cryptic AttributeError for _gw_http_client when trying to register the rollout. No validation checks whether gateway resources were initialized before attempting to use them, making misconfiguration difficult to debug.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.}

cursor · 2026-02-24T20:56:46Z

verifiers/envs/experimental/rollout_gateway_mixin.py

+        state["completion"] = data.get("completion")
+        state["is_truncated"] = bool(
+            data.get("is_truncated", state.get("is_truncated", False))
+        )


Gateway trajectory fetch can erase prompt

Medium Severity

fetch_trajectory assigns state["prompt"] and state["completion"] directly from the gateway response, even when those fields are absent, which can replace valid initialized values with None. Downstream code (rendering, scoring, logging, or consumers expecting Messages) can then fail or behave inconsistently.

rasdani added 12 commits February 22, 2026 02:53

init

17e9c25

fix tunnel address

1e42cbd

bump health timeout to 120s

fcc2a83

fix(env): initialize state["completion"] in Environment.init_state to…

17fb3bd

… prevent rubric KeyError on early rollout failures

multiple tunnels

898196c

rename: RolloutGatewayEnv

9e2e735

revert CliAgentEnv

6dc0078

rename test

5d446d0

fix tests

afa4511

simplify

715f6fc

server side reponse models

b2d42b8

fix slop

5d8f422

cursor bot reviewed Feb 24, 2026

View reviewed changes

verifiers/envs/environment.py Show resolved Hide resolved

verifiers/envs/experimental/rollout_gateway_mixin.py Show resolved Hide resolved

verifiers/envs/experimental/rollout_gateway_mixin.py Show resolved Hide resolved

refactor + ty

899ce24

rasdani requested review from samsja and willccbb February 24, 2026 01:38

rasdani mentioned this pull request Feb 24, 2026

Add RolloutGateway to enable native CLI agent training PrimeIntellect-ai/prime-rl#1757

Open

cursor bot reviewed Feb 24, 2026

View reviewed changes

verifiers/envs/experimental/rollout_gateway_mixin.py Show resolved Hide resolved

verifiers/envs/experimental/rollout_gateway_mixin.py Show resolved Hide resolved

rasdani mentioned this pull request Feb 24, 2026

opencode_swe: deprecate InterceptionServer PrimeIntellect-ai/research-environments#162

Open

fix

d1eb28f

mikasenghaas reviewed Feb 24, 2026

View reviewed changes

refactor: RolloutGatewayMixin

d2301af

rasdani changed the title ~~add RolloutGatewayEnv for server-side rollout execution~~ add RolloutGatewayMixin for server-side rollout execution Feb 24, 2026

cancel properly

7a36c41

cursor bot reviewed Feb 24, 2026

View reviewed changes

		# `state["client"]` may be a Verifiers wrapper with the raw client on `.client`.
		client = getattr(state["client"], "client", state["client"])

Comments

Conversation

rasdani commented Feb 24, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

Testing

Checklist

Additional Notes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

willccbb commented Feb 24, 2026

Uh oh!

willccbb commented Feb 24, 2026

Uh oh!

mikasenghaas left a comment

Choose a reason for hiding this comment

Uh oh!

mikasenghaas Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

mikasenghaas Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

mikasenghaas Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

mikasenghaas Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Feb 24, 2026

Choose a reason for hiding this comment

Missing poll_interval attribute in RolloutGatewayMixin

Uh oh!

cursor bot Feb 24, 2026

Choose a reason for hiding this comment

post_rollout never called in gateway rollout path

Uh oh!

cursor bot Feb 24, 2026

Choose a reason for hiding this comment

Attributes assumed from parent without declaration

Uh oh!

cursor bot Feb 24, 2026

Choose a reason for hiding this comment

Missing validation causes cryptic crash if init_gateway not called

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Feb 24, 2026

Choose a reason for hiding this comment

Gateway trajectory fetch can erase prompt

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rasdani commented Feb 24, 2026 •

edited by cursor bot

Loading

Missing `poll_interval` attribute in `RolloutGatewayMixin`

`post_rollout` never called in gateway rollout path

Missing validation causes cryptic crash if `init_gateway` not called