14 changes: 13 additions & 1 deletion docs/environments.md
@@ -490,6 +490,18 @@ The model sees `run_code(code: str)` in its tool schema, but the environment inj

Verifiers includes several built-in stateful environment classes: `SandboxEnv` provides a containerized bash shell, and `PythonEnv` extends it with a persistent Python REPL (both of which are configured for use with Prime Intellect's [Sandboxes](https://docs.primeintellect.ai/sandboxes/overview)). These handle sandbox lifecycle management automatically.

Both `SandboxEnv` and `CliAgentEnv` accept a `labels` parameter for tagging sandboxes:

```python
env = vf.SandboxEnv(
    dataset=dataset,
    rubric=rubric,
    labels=["experiment-1", "math-tasks"],  # optional labels for sandbox categorization
)
```

Labels are passed to the Prime Sandboxes API and can be used for organizing, filtering, and managing sandboxes across experiments or training runs.

Stateful environments often define methods decorated with `@vf.cleanup` (called after each rollout) or `@vf.teardown` (called once at environment shutdown) for resource management. Together with `@vf.stop`, which marks custom stop conditions (boolean functions checked after each turn), these decorators control the rollout lifecycle in custom `MultiTurnEnv` subclasses.

## Custom Multi-Turn Environments
@@ -758,6 +770,6 @@ These require additional dependencies installed via extras (e.g., `uv add 'verif
Newer and more experimental environment classes include:

- **`GymEnv`** — universal runner for Gym-compatible environments (OpenAI Gym / Gymnasium API)
- **`CliAgentEnv`** — runs custom agent code inside sandboxes, intercepting API requests
- **`CliAgentEnv`** — runs custom agent code inside sandboxes, intercepting API requests. Accepts sandbox configuration parameters including `docker_image`, `cpu_cores`, `memory_gb`, `disk_size_gb`, `gpu_count`, `timeout_minutes`, `environment_vars`, and `labels` for sandbox categorization
- **`HarborEnv`** — loads Harbor-format agent benchmark tasks
- **`RLMEnv`** — implements Recursive Language Models for unbounded context processing. Execution is local-only and uses a filesystem-based context: a provided `context_dir` is copied into the working directory, or legacy JSON-serializable `context` data is written to `context.json`/`context.txt`. The RLM scaffolding prompt (filesystem summary, REPL workflow, tool docs) is injected into the first user message wrapped in `<RLM_SCAFFOLDING>...</RLM_SCAFFOLDING>`, preserving any external system prompt. The REPL language is configurable via `repl_language` (default: `bash`); use `repl_language="python"` to retain the Python REPL. Bash mode uses `call_bash_repl` and behaves like a terminal; Python mode uses `call_python_repl` with a best-effort filesystem jail that restricts access to the working directory. Customize additional guardrails via `disallowed_modules`/`disallowed_builtins`. Tooling can be split via `tools` (shared), `root_tools` (REPL-only), and `sub_tools` (sub-LLM tools). Fixed root tools like `llm_batch` are always present and cannot be overridden. Tool ordering is fixed tools → shared tools → role-specific tools, with per-list deduplication by name. Root tools are callable only inside the REPL; sub-LLM tools use standard tool-calling.
36 changes: 36 additions & 0 deletions docs/reference.md
@@ -356,8 +356,44 @@ Tools requiring per-rollout state. Override `setup_state` and `update_tool_args`

#### SandboxEnv

```python
class SandboxEnv(StatefulToolEnv):
    def __init__(
        self,
        sandbox_name: str = "sandbox-env",
        docker_image: str = "python:3.11-slim",
        start_command: str = "tail -f /dev/null",
        cpu_cores: int = 1,
        memory_gb: int = 2,
        disk_size_gb: int = 5,
        gpu_count: int = 0,
        timeout_minutes: int = 60,
        timeout_per_command_seconds: int = 30,
        environment_vars: dict[str, str] | None = None,
        team_id: str | None = None,
        advanced_configs: AdvancedConfigs | None = None,
        labels: list[str] | None = None,
        **kwargs,
    ): ...
```

Sandboxed container execution using `prime` sandboxes.

**Key parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `sandbox_name` | `str` | Name prefix for sandbox instances |
| `docker_image` | `str` | Docker image to use for the sandbox |
| `cpu_cores` | `int` | Number of CPU cores |
| `memory_gb` | `int` | Memory allocation in GB |
| `disk_size_gb` | `int` | Disk size in GB |
| `gpu_count` | `int` | Number of GPUs |
| `timeout_minutes` | `int` | Sandbox timeout in minutes |
| `timeout_per_command_seconds` | `int` | Per-command execution timeout |
| `environment_vars` | `dict[str, str] \| None` | Environment variables to set in sandbox |
| `labels` | `list[str] \| None` | Labels for sandbox categorization and filtering |

#### PythonEnv

Persistent Python REPL in sandbox. Extends `SandboxEnv`.
3 changes: 3 additions & 0 deletions verifiers/envs/experimental/cli_agent_env.py
@@ -62,6 +62,7 @@ def __init__(
environment_vars: dict[str, str] | None = None,
team_id: str | None = None,
advanced_configs: AdvancedConfigs | None = None,
labels: list[str] | None = None,
**kwargs,
):
super().__init__(max_turns=max_turns, message_type="chat", **kwargs)
@@ -82,6 +83,7 @@ def __init__(
self.environment_vars = environment_vars
self.team_id = team_id
self.advanced_configs = advanced_configs
self.labels = labels
self.active_rollouts: dict[str, dict[str, Any]] = {}
self.intercepts: dict[str, dict[str, Any]] = {} # request_id -> intercept data
self.interception_server: Any = None
@@ -142,6 +144,7 @@ async def setup_state(self, state: State) -> State:
environment_vars=env_vars,
team_id=self.team_id,
advanced_configs=self.advanced_configs,
labels=self.labels if self.labels else [],
)
logger.debug(
f"Creating sandbox with OPENAI_BASE_URL={env_vars.get('OPENAI_BASE_URL')} "
2 changes: 2 additions & 0 deletions verifiers/envs/sandbox_env.py
@@ -132,6 +132,7 @@ def __init__(
environment_vars: dict[str, str] | None = None,
team_id: str | None = None,
advanced_configs: AdvancedConfigs | None = None,
labels: list[str] | None = None,
max_retries: int = 5,
base_delay: float = 0.5,
backoff_factor: float = 2.0,
@@ -166,6 +167,7 @@ def __init__(
environment_vars=environment_vars,
team_id=team_id,
advanced_configs=advanced_configs,
labels=labels if labels else [],
)
self.active_sandboxes = set()
self.with_retry = tc.AsyncRetrying(
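
Both source hunks pass `labels if labels else []` when building the sandbox request, so a `None` default (or an empty list) reaches the request as an empty list. A minimal standalone sketch of that normalization follows; the helper name `normalize_labels` is hypothetical and does not exist in the changed files.

```python
from typing import Optional


def normalize_labels(labels: Optional[list[str]]) -> list[str]:
    # Same expression as in the diff: any falsy value (None or an empty list)
    # becomes an empty list, while a non-empty list is passed through unchanged.
    return labels if labels else []


print(normalize_labels(None))
print(normalize_labels([]))
print(normalize_labels(["experiment-1", "math-tasks"]))
```

Normalizing at the call site keeps `None` as the parameter default, which avoids a mutable default argument in `__init__`.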