Merged
25 commits
2 changes: 1 addition & 1 deletion docs/environments.md
Expand Up @@ -795,6 +795,6 @@ These require additional dependencies installed via extras (e.g., `uv add 'verif
Newer and more experimental environment classes include:

- **`GymEnv`** — universal runner for Gym-compatible environments (OpenAI Gym / Gymnasium API)
- **`CliAgentEnv`** — runs custom agent code inside sandboxes, intercepting API requests. Accepts sandbox configuration parameters including `docker_image`, `cpu_cores`, `memory_gb`, `disk_size_gb`, `gpu_count`, `timeout_minutes`, `environment_vars`, and `labels` for sandbox categorization
- **`CliAgentEnv`** — runs custom agent code inside sandboxes, intercepting API requests. Accepts sandbox configuration parameters including `docker_image`, `cpu_cores`, `memory_gb`, `disk_size_gb`, `gpu_count`, `timeout_minutes`, `environment_vars`, and `labels` for sandbox categorization. Also accepts retry-tuning parameters (e.g., `max_retries`) and connection-pooling parameters (e.g., `sandbox_client_max_workers`) via `SandboxMixin`
- **`HarborEnv`** — loads Harbor-format agent benchmark tasks
- **`RLMEnv`** — implements Recursive Language Models for unbounded context processing. Execution supports both local and sandbox backends via `execution_backend` (`"local"` default, `"sandbox"` to run the REPL inside a Prime Sandbox). Context is still filesystem-based: a provided `context_dir` is copied into the working directory, or legacy JSON-serializable `context` data is written to `context.json`/`context.txt`. The RLM scaffolding prompt (filesystem availability note, REPL workflow, tool docs) is injected into the first user message wrapped in `<RLM_SCAFFOLDING>...</RLM_SCAFFOLDING>`, preserving any external system prompt; the model-visible prompt is stored in `state["prompt"]`, while the original input prompt is preserved in `state["raw_prompt"]`. The REPL language is configurable via `repl_language` (default: `bash`); use `repl_language="python"` to retain the Python REPL. Bash mode uses `call_bash_repl` and behaves like a terminal; Python mode uses `call_python_repl`. Sub-LLM and root-tool interception for sandboxes is routed through a Prime Tunnel unless `interception_url` is provided. Tooling can be split via `tools` (shared), `root_tools` (REPL-only), and `sub_tools` (sub-LLM tools). Fixed root tools like `llm_batch` are always present and cannot be overridden. Tool ordering is fixed tools → shared tools → role-specific tools, with per-list deduplication by name. Root tools are callable only inside the REPL; sub-LLM tools use standard tool-calling. When using the sandbox backend, the sandbox and worker are started eagerly during `setup_state`, and package installs are skipped when the package is already importable in the image. Environments can pre-set `state["rlm_fs_root_remote"]` (and optionally `state["rlm_control_dir_remote"]`) before calling `super().setup_state` to point the worker at an existing filesystem path in the sandbox. For further customization, override `get_sandbox_request`, `on_sandbox_ready`, or `customize_worker_script` on `RLMEnv`.
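
The tool-merging rule described above (fixed tools, then shared tools, then role-specific tools, deduplicated by name) can be sketched in isolation. The helper name and tool names below are hypothetical, not part of the `RLMEnv` API:

```python
# Sketch of the documented tool ordering: fixed tools -> shared
# tools -> role-specific tools, deduplicating by name so the first
# occurrence wins. Function and tool names are illustrative only.
def merge_tools(fixed: list[str], shared: list[str], role_specific: list[str]) -> list[str]:
    seen: set[str] = set()
    ordered: list[str] = []
    for group in (fixed, shared, role_specific):
        for name in group:
            if name not in seen:
                seen.add(name)
                ordered.append(name)
    return ordered

# Fixed root tools like `llm_batch` stay first and cannot be shadowed.
print(merge_tools(["llm_batch"], ["search", "llm_batch"], ["read_file", "search"]))
# → ['llm_batch', 'search', 'read_file']
```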
4 changes: 2 additions & 2 deletions environments/AGENTS.md
Expand Up @@ -799,6 +799,6 @@ These require additional dependencies installed via extras (e.g., `uv add 'verif
Newer and more experimental environment classes include:

- **`GymEnv`** — universal runner for Gym-compatible environments (OpenAI Gym / Gymnasium API)
- **`CliAgentEnv`** — runs custom agent code inside sandboxes, intercepting API requests. Accepts sandbox configuration parameters including `docker_image`, `cpu_cores`, `memory_gb`, `disk_size_gb`, `gpu_count`, `timeout_minutes`, `environment_vars`, and `labels` for sandbox categorization
- **`CliAgentEnv`** — runs custom agent code inside sandboxes, intercepting API requests. Accepts sandbox configuration parameters including `docker_image`, `cpu_cores`, `memory_gb`, `disk_size_gb`, `gpu_count`, `timeout_minutes`, `environment_vars`, and `labels` for sandbox categorization. Also accepts retry-tuning parameters (e.g., `max_retries`) and connection-pooling parameters (e.g., `sandbox_client_max_workers`) via `SandboxMixin`
- **`HarborEnv`** — loads Harbor-format agent benchmark tasks
- **`RLMEnv`** — implements Recursive Language Models for unbounded context processing. Execution supports both local and sandbox backends via `execution_backend` (`"local"` default, `"sandbox"` to run the REPL inside a Prime Sandbox). Context is still filesystem-based: a provided `context_dir` is copied into the working directory, or legacy JSON-serializable `context` data is written to `context.json`/`context.txt`. The RLM scaffolding prompt (filesystem availability note, REPL workflow, tool docs) is injected into the first user message wrapped in `<RLM_SCAFFOLDING>...</RLM_SCAFFOLDING>`, preserving any external system prompt; the model-visible prompt is stored in `state["prompt"]`, while the original input prompt is preserved in `state["raw_prompt"]`. The REPL language is configurable via `repl_language` (default: `bash`); use `repl_language="python"` to retain the Python REPL. Bash mode uses `call_bash_repl` and behaves like a terminal; Python mode uses `call_python_repl`. Sub-LLM and root-tool interception for sandboxes is routed through a Prime Tunnel unless `interception_url` is provided. Tooling can be split via `tools` (shared), `root_tools` (REPL-only), and `sub_tools` (sub-LLM tools). Fixed root tools like `llm_batch` are always present and cannot be overridden. Tool ordering is fixed tools → shared tools → role-specific tools, with per-list deduplication by name. Root tools are callable only inside the REPL; sub-LLM tools use standard tool-calling.
- **`RLMEnv`** — implements Recursive Language Models for unbounded context processing. Execution supports both local and sandbox backends via `execution_backend` (`"local"` default, `"sandbox"` to run the REPL inside a Prime Sandbox). Context is still filesystem-based: a provided `context_dir` is copied into the working directory, or legacy JSON-serializable `context` data is written to `context.json`/`context.txt`. The RLM scaffolding prompt (filesystem availability note, REPL workflow, tool docs) is injected into the first user message wrapped in `<RLM_SCAFFOLDING>...</RLM_SCAFFOLDING>`, preserving any external system prompt; the model-visible prompt is stored in `state["prompt"]`, while the original input prompt is preserved in `state["raw_prompt"]`. The REPL language is configurable via `repl_language` (default: `bash`); use `repl_language="python"` to retain the Python REPL. Bash mode uses `call_bash_repl` and behaves like a terminal; Python mode uses `call_python_repl`. Sub-LLM and root-tool interception for sandboxes is routed through a Prime Tunnel unless `interception_url` is provided. Tooling can be split via `tools` (shared), `root_tools` (REPL-only), and `sub_tools` (sub-LLM tools). Fixed root tools like `llm_batch` are always present and cannot be overridden. Tool ordering is fixed tools → shared tools → role-specific tools, with per-list deduplication by name. Root tools are callable only inside the REPL; sub-LLM tools use standard tool-calling. When using the sandbox backend, the sandbox and worker are started eagerly during `setup_state`, and package installs are skipped when the package is already importable in the image. Environments can pre-set `state["rlm_fs_root_remote"]` (and optionally `state["rlm_control_dir_remote"]`) before calling `super().setup_state` to point the worker at an existing filesystem path in the sandbox. For further customization, override `get_sandbox_request`, `on_sandbox_ready`, or `customize_worker_script` on `RLMEnv`.
123 changes: 92 additions & 31 deletions environments/opencode_harbor/opencode_harbor.py
@@ -1,3 +1,4 @@
import json
import logging
from pathlib import Path

Expand All @@ -6,48 +7,74 @@
logger = logging.getLogger("verifiers.envs.OpenCodeHarborEnv")


def _build_run_command(agent_workdir: str) -> str:
def _build_opencode_config(
disabled_tools: list[str] | None = None,
system_prompt_path: str | None = None,
) -> str:
config: dict = {
"${SCHEMA_DOLLAR}schema": "https://opencode.ai/config.json",
"provider": {
"intercepted": {
"npm": "@ai-sdk/openai-compatible",
"name": "Intercepted",
"options": {
"baseURL": "$OPENAI_BASE_URL",
"apiKey": "intercepted",
"timeout": 600000,
},
"models": {
"model": {
"name": "Intercepted Model",
"modalities": {"input": ["text", "image"], "output": ["text"]},
}
},
}
},
"model": "intercepted/model",
}

# Add agent config if we have custom prompt or disabled tools
if system_prompt_path or disabled_tools:
build_config: dict = {}

if system_prompt_path:
build_config["prompt"] = "{file:" + system_prompt_path + "}"

if disabled_tools:
build_config["tools"] = {tool: False for tool in disabled_tools}

config["agent"] = {"build": build_config}

return json.dumps(config, indent=2)
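
The `agent` sub-config this helper adds can be re-created standalone. This is a hypothetical sketch, not the helper itself; the path and tool names are the defaults used elsewhere in this PR:

```python
import json

# Re-creation of the "agent" block _build_opencode_config emits when
# both a prompt file and disabled tools are supplied. OpenCode's
# "{file:...}" syntax loads the prompt from a file at runtime.
system_prompt_path = "/opencode/prompt.txt"
disabled_tools = ["webfetch", "question"]

build_config = {
    "prompt": "{file:" + system_prompt_path + "}",
    # Mapping each tool name to False disables it for the agent.
    "tools": {tool: False for tool in disabled_tools},
}
print(json.dumps({"agent": {"build": build_config}}, indent=2))
```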


def _build_run_command(
agent_workdir: str,
disabled_tools: list[str] | None = None,
has_system_prompt: bool = False,
) -> str:
# Path where we'll upload the system prompt in the sandbox
system_prompt_sandbox_path = "/opencode/prompt.txt" if has_system_prompt else None
config_json = _build_opencode_config(disabled_tools, system_prompt_sandbox_path)

return f"""
set -e

echo "Starting OpenCode agent..."
echo "Base URL: $OPENAI_BASE_URL"

apt-get update && apt-get install -y curl

# TODO: Add opencode to prebuilt images so we don't need to install at runtime
curl -fsSL https://opencode.ai/install | bash
export PATH="$HOME/.opencode/bin:$PATH"

# Create opencode config directory
mkdir -p ~/.config/opencode

# Create opencode.json config with intercepted provider
# Preserve JSON schema key literal in unquoted heredoc while still expanding
# OPENAI_BASE_URL.
SCHEMA_DOLLAR='$'

# Create opencode.json config
cat > ~/.config/opencode/opencode.json << EOFCONFIG
{{
"\\$schema": "https://opencode.ai/config.json",
"provider": {{
"intercepted": {{
"npm": "@ai-sdk/openai-compatible",
"name": "Intercepted",
"options": {{
"baseURL": "$OPENAI_BASE_URL",
"apiKey": "intercepted",
"timeout": 600000
}},
"models": {{
"model": {{
"name": "Intercepted Model",
"modalities": {{
"input": ["text", "image"],
"output": ["text"]
}}
}}
}}
}}
}},
"model": "intercepted/model"
}}
{config_json}
EOFCONFIG

mkdir -p /logs/agent
Expand All @@ -65,23 +92,55 @@ def __init__(
tasks: list[str] | None = None,
agent_workdir: str = "/app",
docker_image: str = "python:3.11-slim",
system_prompt_path: str | Path | None = None,
disabled_tools: list[str] | None = None,
**kwargs,
):
self.system_prompt_path = (
Path(system_prompt_path) if system_prompt_path else None
)
self.disabled_tools = disabled_tools

super().__init__(
run_command=_build_run_command(agent_workdir),
run_command=_build_run_command(
agent_workdir,
disabled_tools=disabled_tools,
has_system_prompt=system_prompt_path is not None,
),
dataset_path=dataset_path,
tasks=tasks,
agent_workdir=agent_workdir,
docker_image=docker_image,
**kwargs,
)

async def post_sandbox_setup(self, state) -> None:
"""Upload Harbor task assets and optional system prompt after sandbox creation."""
await super().post_sandbox_setup(state)

if self.system_prompt_path:
if not self.system_prompt_path.exists():
raise FileNotFoundError(
f"System prompt file not found: {self.system_prompt_path}"
)

sandbox_id = state["sandbox_id"]
await self.sandbox_client.execute_command(
sandbox_id, "mkdir -p /opencode", working_dir=None
)
await self.sandbox_client.upload_file(
sandbox_id, "/opencode/prompt.txt", str(self.system_prompt_path)
)
logger.info(f"Uploaded system prompt from {self.system_prompt_path}")


def load_environment(
dataset_path: str | Path = Path(__file__).parent / "tasks",
tasks: list[str] | None = None,
agent_workdir: str = "/app",
docker_image: str = "python:3.11-slim",
system_prompt_path: str | Path | None = Path(__file__).parent / "prompt.txt",
disabled_tools: list[str] | None = ["webfetch", "question"],
timeout_seconds: float = 900.0,
cpu_cores: int = 2,
memory_gb: int = 4,
Expand All @@ -94,6 +153,8 @@ def load_environment(
tasks=tasks,
agent_workdir=agent_workdir,
docker_image=docker_image,
system_prompt_path=system_prompt_path,
disabled_tools=disabled_tools,
timeout_seconds=timeout_seconds,
cpu_cores=cpu_cores,
memory_gb=memory_gb,
Expand Down
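
The `SCHEMA_DOLLAR` trick in the run command above relies on unquoted-heredoc expansion, and can be demonstrated in isolation. The base URL value below is illustrative:

```shell
# Demo of the unquoted-heredoc trick: SCHEMA_DOLLAR holds a literal
# '$', so "${SCHEMA_DOLLAR}schema" survives as "$schema" in the
# output while $OPENAI_BASE_URL is still expanded normally.
SCHEMA_DOLLAR='$'
OPENAI_BASE_URL='http://localhost:8000/v1'
cat << EOF
{"${SCHEMA_DOLLAR}schema": "https://opencode.ai/config.json", "baseURL": "$OPENAI_BASE_URL"}
EOF
```

Quoting the delimiter (`<< 'EOF'`) would suppress all expansion, which is why the config heredoc must stay unquoted and escape the schema key via the variable instead.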