Skip to content

Comments

add RolloutGatewayMixin for server-side rollout execution#954

Open
rasdani wants to merge 16 commits intomainfrom
daniel/vllm-interception
Open

add RolloutGatewayMixin for server-side rollout execution#954
rasdani wants to merge 16 commits intomainfrom
daniel/vllm-interception

Conversation

@rasdani
Copy link
Contributor

@rasdani rasdani commented Feb 24, 2026

Description

Adds RolloutGatewayMixin, a new experimental environment that offloads agent execution to sandboxes that talk directly to the vLLM rollout gateway.

The env handles sandbox lifecycle, rollout registration/unregistration, trajectory fetch, and reward — the sandboxed agent drives its own inference loop through a prime tunnel URL.

depends on PrimeIntellect-ai/prime-rl#1757
example env PrimeIntellect-ai/research-environments#162

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Medium Risk
Touches rollout execution/cleanup paths and adds new network/tunnel-driven control flow; failures could leak resources or change rollout outputs for envs opting into the mixin.

Overview
Adds experimental RolloutGatewayMixin to run CliAgentEnv rollouts via a server-side vLLM rollout gateway: the env registers/unregisters rollouts over HTTP, opens/maintains Prime tunnels per gateway host, runs the agent in a sandbox pointing OPENAI_BASE_URL at the tunneled /v1/rollouts/{id} endpoint, then fetches the final trajectory.

Refactors CliAgentEnv to initialize interception/tunnel resources via init_interception() and hardens cleanup/teardown to be safe when those resources weren’t created. State initialization now sets state["completion"] = None up front to support gateway-populated completions, and new integration tests validate gateway registration payloads, trajectory fetching, tunnel reuse, and teardown behavior.

Written by Cursor Bugbot for commit 7a36c41. This will update automatically on new commits. Configure here.

@willccbb
Copy link
Member

Hm what's the use case here? Would prefer we don't hardcode in vLLM-specific assumptions at the env pattern level, can we make this more endpoint-agnostic?

@willccbb
Copy link
Member

if it truly needs to be baked in to the class, would be better if it's something like a mixin and then toggled on/off via an env-level class attribute, so that eval model is still supported with other endpoints.

Copy link
Member

@mikasenghaas mikasenghaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, this looks pretty clean!

Comment on lines 44 to 68
run_command: str,
gateway_port: int = 8000,
max_turns: int = -1,
timeout_seconds: float = 3600.0,
poll_interval: float = 2.0,
docker_image: str = "python:3.11-slim",
start_command: str = "tail -f /dev/null",
cpu_cores: int = 1,
memory_gb: int = 2,
disk_size_gb: int = 5,
gpu_count: int = 0,
timeout_minutes: int = 60,
environment_vars: dict[str, str] | None = None,
team_id: str | None = None,
advanced_configs: AdvancedConfigs | None = None,
labels: list[str] | None = None,
max_retries: int = 5,
base_delay: float = 0.5,
backoff_factor: float = 2.0,
max_backoff_seconds: float = 30.0,
jitter: float = 1e-3,
sandbox_client_max_workers: int = 10,
sandbox_client_max_connections: int = 100,
sandbox_client_max_keepalive_connections: int = 50,
sandbox_wait_for_creation_max_attempts: int = 120,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe nicer to have sandbox_mixin_kwargs so that we dont have to maintain constructor signatures in both places?

Comment on lines 147 to 148
# `state["client"]` may be a Verifiers wrapper with the raw client on `.client`.
client = getattr(state["client"], "client", state["client"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it always a vf client?

Comment on lines +149 to +152
gateway_url = str(client.base_url).rstrip("/")
if gateway_url.endswith("/v1"):
gateway_url = gateway_url[:-3]
return gateway_url
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at this point we shoul make this a util haha, i feel like ive written that exact code like 3x

async def rollout(
self,
input: RolloutInput,
client: Client | ClientConfig,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the signature just client: Client?

@rasdani rasdani changed the title add RolloutGatewayEnv for server-side rollout execution add RolloutGatewayMixin for server-side rollout execution Feb 24, 2026
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 4 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

f"stage=agent_completed exit_code={status.exit_code}"
)
return
await asyncio.sleep(self.poll_interval)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing poll_interval attribute in RolloutGatewayMixin

High Severity

The poll_job_completion method uses self.poll_interval but RolloutGatewayMixin never initializes this attribute. The mixin assumes poll_interval exists from the parent class CliAgentEnv, but relies on duck typing without guaranteeing the attribute is present, causing AttributeError if used with a parent class that doesn't define it.

Fix in Cursor Fix in Web

f"agent_exit_code={state.get('agent_exit_code')} error={error_name}"
)

return state
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

post_rollout never called in gateway rollout path

Medium Severity

The gateway rollout method bypasses the normal _cleanup handlers and never calls post_rollout, breaking subclasses like the test's GatewayCliAgentEnv that override post_rollout to compute rewards. The cleanup at line 373 calls self._cleanup(state) which invokes decorated @vf.cleanup handlers, but CliAgentEnv.destroy_sandbox (which calls post_rollout at line 473) is a cleanup handler that expects to run in the normal rollout flow, not in this custom gateway path.

Fix in Cursor Fix in Web

"model": state["model"],
"sampling_params": sampling_params,
"max_turns": self.max_turns,
"max_seq_len": self.max_seq_len,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attributes assumed from parent without declaration

Low Severity

The mixin directly accesses multiple attributes from parent classes like self.environment_vars, self.max_turns, self.max_seq_len, self.sandbox_client, self.run_command, self.start_command, etc. without declaring them or documenting the contract. This creates tight coupling and makes the mixin fragile if parent classes change, contradicting the PR discussion's concern about avoiding hardcoded assumptions.

Fix in Cursor Fix in Web


rollout_registered = False
try:
await self.register_rollout(state)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing validation causes cryptic crash if init_gateway not called

Medium Severity

When use_gateway=True but init_gateway() was never called, the rollout method crashes with a cryptic AttributeError for _gw_http_client when trying to register the rollout. No validation checks whether gateway resources were initialized before attempting to use them, making misconfiguration difficult to debug.

Fix in Cursor Fix in Web

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

state["completion"] = data.get("completion")
state["is_truncated"] = bool(
data.get("is_truncated", state.get("is_truncated", False))
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gateway trajectory fetch can erase prompt

Medium Severity

fetch_trajectory assigns state["prompt"] and state["completion"] directly from the gateway response, even when those fields are absent, which can replace valid initialized values with None. Downstream code (rendering, scoring, logging, or consumers expecting Messages) can then fail or behave inconsistently.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants