PrimeIntellect-ai · willccbb · Jan 23, 2026
diff --git a/docs/environments.md b/docs/environments.md
@@ -490,6 +490,18 @@ The model sees `run_code(code: str)` in its tool schema, but the environment inj
 
 Verifiers includes several built-in stateful environment classes: `SandboxEnv` provides a containerized bash shell, and `PythonEnv` extends it with a persistent Python REPL (both of which are configured for use with Prime Intellect's [Sandboxes](https://docs.primeintellect.ai/sandboxes/overview)). These handle sandbox lifecycle management automatically.
 
+Both `SandboxEnv` and `CliAgentEnv` accept a `labels` parameter for tagging sandboxes:
+
+```python
+env = vf.SandboxEnv(
+    dataset=dataset,
+    rubric=rubric,
+    labels=["experiment-1", "math-tasks"],  # optional labels for sandbox categorization
+)
+```
+
+Labels are passed to the Prime Sandboxes API and can be used for organizing, filtering, and managing sandboxes across experiments or training runs.
+
 Stateful environments often define methods decorated with `@vf.cleanup` (called after each rollout) or `@vf.teardown` (called once at environment shutdown) for resource management. These decorators, along with `@vf.stop` for custom stop conditions (boolean functions checked after each turn), are powerful tools for rollout lifecycle control in custom `MultiTurnEnv` subclasses.
 
 ## Custom Multi-Turn Environments
@@ -758,6 +770,6 @@ These require additional dependencies installed via extras (e.g., `uv add 'verif
 Newer and more experimental environment classes include:
 
 - **`GymEnv`** — universal runner for Gym-compatible environments (OpenAI Gym / Gymnasium API)
-- **`CliAgentEnv`** — runs custom agent code inside sandboxes, intercepting API requests
+- **`CliAgentEnv`** — runs custom agent code inside sandboxes, intercepting API requests. Accepts sandbox configuration parameters including `docker_image`, `cpu_cores`, `memory_gb`, `disk_size_gb`, `gpu_count`, `timeout_minutes`, `environment_vars`, and `labels` for sandbox categorization.
 - **`HarborEnv`** — loads Harbor-format agent benchmark tasks
 - **`RLMEnv`** — implements Recursive Language Models for unbounded context processing. Execution is local-only and uses a filesystem-based context: a provided `context_dir` is copied into the working directory, or legacy JSON-serializable `context` data is written to `context.json`/`context.txt`. The RLM scaffolding prompt (filesystem summary, REPL workflow, tool docs) is injected into the first user message wrapped in `<RLM_SCAFFOLDING>...</RLM_SCAFFOLDING>`, preserving any external system prompt. The REPL language is configurable via `repl_language` (default: `bash`); use `repl_language="python"` to retain the Python REPL. Bash mode uses `call_bash_repl` and behaves like a terminal; Python mode uses `call_python_repl` with the best-effort filesystem jail that restricts access to the working directory. Customize additional guardrails via `disallowed_modules`/`disallowed_builtins`. Tooling can be split via `tools` (shared), `root_tools` (REPL-only), and `sub_tools` (sub-LLM tools). Fixed root tools like `llm_batch` are always present and cannot be overridden. Tool ordering is fixed tools → shared tools → role-specific tools, with per-list deduplication by name. Root tools are callable only inside the REPL; sub-LLM tools use standard tool-calling.
diff --git a/docs/reference.md b/docs/reference.md
@@ -356,8 +356,44 @@ Tools requiring per-rollout state. Override `setup_state` and `update_tool_args`
 
 #### SandboxEnv
 
+```python
+class SandboxEnv(StatefulToolEnv):
+    def __init__(
+        self,
+        sandbox_name: str = "sandbox-env",
+        docker_image: str = "python:3.11-slim",
+        start_command: str = "tail -f /dev/null",
+        cpu_cores: int = 1,
+        memory_gb: int = 2,
+        disk_size_gb: int = 5,
+        gpu_count: int = 0,
+        timeout_minutes: int = 60,
+        timeout_per_command_seconds: int = 30,
+        environment_vars: dict[str, str] | None = None,
+        team_id: str | None = None,
+        advanced_configs: AdvancedConfigs | None = None,
+        labels: list[str] | None = None,
+        **kwargs,
+    ): ...
+```
+
 Sandboxed container execution using `prime` sandboxes.
 
+**Key parameters:**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `sandbox_name` | `str` | Name prefix for sandbox instances |
+| `docker_image` | `str` | Docker image to use for the sandbox |
+| `cpu_cores` | `int` | Number of CPU cores |
+| `memory_gb` | `int` | Memory allocation in GB |
+| `disk_size_gb` | `int` | Disk size in GB |
+| `gpu_count` | `int` | Number of GPUs |
+| `timeout_minutes` | `int` | Sandbox timeout in minutes |
+| `timeout_per_command_seconds` | `int` | Per-command execution timeout |
+| `environment_vars` | `dict[str, str] \| None` | Environment variables to set in sandbox |
+| `labels` | `list[str] \| None` | Labels for sandbox categorization and filtering |
+
 #### PythonEnv
 
 Persistent Python REPL in sandbox. Extends `SandboxEnv`.

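Reviewer note: the docs above say labels can be used for filtering, but the diff does not show how sandboxes are queried. The sketch below only illustrates the idea with client-side filtering over hypothetical records; the `sandboxes` list, its dict shape, and `with_all_labels` are assumptions for illustration and are not part of `verifiers` or the Prime Sandboxes API.

```python
from __future__ import annotations

# Hypothetical sandbox records. The real API response shape is not shown in this diff.
sandboxes = [
    {"id": "sbx-1", "labels": ["experiment-1", "math-tasks"]},
    {"id": "sbx-2", "labels": ["experiment-1"]},
    {"id": "sbx-3", "labels": []},
]


def with_all_labels(records: list[dict], required: list[str]) -> list[dict]:
    """Return the records whose labels include every label in `required`."""
    wanted = set(required)
    return [r for r in records if wanted <= set(r["labels"])]


# Select every sandbox tagged for one experiment, e.g. before a bulk cleanup.
matching_ids = [r["id"] for r in with_all_labels(sandboxes, ["experiment-1"])]
```

Passing an empty `required` list matches every record, since the empty set is a subset of any label set.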
diff --git a/verifiers/envs/experimental/cli_agent_env.py b/verifiers/envs/experimental/cli_agent_env.py
@@ -62,6 +62,7 @@ def __init__(
         environment_vars: dict[str, str] | None = None,
         team_id: str | None = None,
         advanced_configs: AdvancedConfigs | None = None,
+        labels: list[str] | None = None,
         **kwargs,
     ):
         super().__init__(max_turns=max_turns, message_type="chat", **kwargs)
@@ -82,6 +83,7 @@ def __init__(
         self.environment_vars = environment_vars
         self.team_id = team_id
         self.advanced_configs = advanced_configs
+        self.labels = labels
         self.active_rollouts: dict[str, dict[str, Any]] = {}
         self.intercepts: dict[str, dict[str, Any]] = {}  # request_id -> intercept data
         self.interception_server: Any = None
@@ -142,6 +144,7 @@ async def setup_state(self, state: State) -> State:
             environment_vars=env_vars,
             team_id=self.team_id,
             advanced_configs=self.advanced_configs,
+            labels=self.labels if self.labels else [],
         )
         logger.debug(
             f"Creating sandbox with OPENAI_BASE_URL={env_vars.get('OPENAI_BASE_URL')} "

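Reviewer note: both constructors in this PR normalize the optional `labels` argument with `labels if labels else []`, so the sandbox request always receives a list. A minimal, self-contained sketch of that behavior follows; `SandboxRequestSketch` and `build_request` are hypothetical stand-ins and do not reproduce the real request class or its other fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SandboxRequestSketch:
    """Hypothetical stand-in for the sandbox creation request (not the real class)."""

    name: str
    labels: list[str] = field(default_factory=list)


def build_request(name: str, labels: list[str] | None = None) -> SandboxRequestSketch:
    # Same expression as in the diff: None and [] both become a fresh empty list.
    return SandboxRequestSketch(name=name, labels=labels if labels else [])


unlabeled = build_request("sandbox-env")
labeled = build_request("sandbox-env", ["experiment-1", "math-tasks"])
```

With this expression a non-empty list is passed through by reference rather than copied, so mutating the caller's list after construction would also change the stored labels.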
diff --git a/verifiers/envs/sandbox_env.py b/verifiers/envs/sandbox_env.py
@@ -132,6 +132,7 @@ def __init__(
         environment_vars: dict[str, str] | None = None,
         team_id: str | None = None,
         advanced_configs: AdvancedConfigs | None = None,
+        labels: list[str] | None = None,
         max_retries: int = 5,
         base_delay: float = 0.5,
         backoff_factor: float = 2.0,
@@ -166,6 +167,7 @@ def __init__(
             environment_vars=environment_vars,
             team_id=team_id,
             advanced_configs=advanced_configs,
+            labels=labels if labels else [],
         )
         self.active_sandboxes = set()
         self.with_retry = tc.AsyncRetrying(