13 changes: 13 additions & 0 deletions api/oss/src/utils/env.py
@@ -561,6 +561,18 @@ class DaytonaConfig(BaseModel):
    model_config = ConfigDict(extra="ignore")


# ---------------------------------------------------------------------------
# e2b
# ---------------------------------------------------------------------------


class E2BConfig(BaseModel):
    api_key: str | None = os.getenv("E2B_API_KEY")
    template: str | None = os.getenv("E2B_TEMPLATE")

    model_config = ConfigDict(extra="ignore")


# ---------------------------------------------------------------------------
# docker
# ---------------------------------------------------------------------------
@@ -1239,6 +1251,7 @@ class EnvironSettings(BaseModel):
    crisp: CrispConfig = CrispConfig()
    daytona: DaytonaConfig = DaytonaConfig()
    docker: DockerConfig = DockerConfig()
    e2b: E2BConfig = E2BConfig()
    identity: IdentityConfig = IdentityConfig()
    llm: LLMConfig = LLMConfig()
    loops: LoopsConfig = LoopsConfig()
@@ -166,14 +166,21 @@ These are the agent-relevant variables. The example file lists them commented ou
`http://sandbox-agent:8765`. When unset, the Python service spawns the runner CLI locally
instead (see `runner_url` and `select_backend` in `services/oss/src/agent/`).
- `AGENTA_AGENT_ENABLE_MCP`. Gates MCP server resolution. Default `false`.
- `SANDBOX_AGENT_PROVIDER`. `local` or `daytona`. Default `local`.
- `SANDBOX_AGENT_PROVIDER`. `local`, `daytona`, or `e2b`. Default `local`.
- `SANDBOX_AGENT_DAYTONA_API_KEY`, `_API_URL`, `_TARGET`, `_SNAPSHOT`, `_IMAGE`,
`_INSTALL_PI`. Daytona credentials the runner reads for the `daytona` sandbox provider.
- `SANDBOX_AGENT_DAYTONA_AUTOSTOP_MINUTES`. Idle minutes before Daytona auto-stops a sandbox.
Default `15`. Leak backstop: the create object pairs `ephemeral` (auto-delete on stop) with
this non-zero auto-stop so a sandbox the runner leaks (a process KILL skips the per-run
teardown) self-reaps instead of burning credit. Values below `1` fall back to the default
(a `0` would re-disable auto-stop and reintroduce the leak).
- `E2B_API_KEY`. E2B API key (required for `sandbox="e2b"`). Also exposed as `env.e2b.api_key`.
- `E2B_TEMPLATE`. E2B template name. Default `agenta-sandbox-agent`. Build with
`npx @e2b/cli template create agenta-sandbox-agent -d sandbox-images/e2b/e2b.Dockerfile`.
- `E2B_TIMEOUT_MS`. E2B sandbox timeout in milliseconds. Default `1800000` (30 min). Leak
backstop: E2B auto-kills a sandbox at its timeout so a process-KILL-leaked sandbox self-reaps.
A restricted `network` policy on E2B is refused (the `sandbox-agent/e2b` provider exposes no
egress control); use `daytona` for enforced network boundaries.
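
As a rough illustration of the `E2B_TIMEOUT_MS` backstop, the sketch below resolves the timeout from an environment map. The helper name `resolveE2bTimeoutMs` is hypothetical, and falling back to the default for missing, non-numeric, or below-`1` values is an assumption modeled on the Daytona auto-stop rule above; this change does not confirm that behavior for E2B.

```typescript
// Hypothetical helper: not a confirmed part of the runner.
const DEFAULT_E2B_TIMEOUT_MS = 1800000; // 30 minutes, the documented default

function resolveE2bTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = env.E2B_TIMEOUT_MS;
  if (raw === undefined || raw.trim() === "") return DEFAULT_E2B_TIMEOUT_MS;
  const parsed = Number(raw);
  // Assumption: NaN, fractions, and values below 1 fall back to the default
  // so the self-reap backstop can never be switched off by configuration.
  if (!Number.isInteger(parsed) || parsed < 1) return DEFAULT_E2B_TIMEOUT_MS;
  return parsed;
}
```

Taking the environment as a parameter (rather than reading `process.env` inside the helper) keeps the rule easy to unit-test.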

The `sandbox-agent` container deliberately has no `env_file`. The harness sandbox must not
inherit the stack's secrets. The compose block comments explain this
102 changes: 102 additions & 0 deletions docs/design/agent-workflows/projects/add-sandbox-e2b/research.md
@@ -0,0 +1,102 @@
# Add the E2B sandbox (running Pi) — investigation

## Goal (this worktree only)

Make `sandbox="e2b"` a selectable sandbox provider, proven by running the **Pi** harness on
it. One new variable: the sandbox. The harness is held constant at Pi — the most-supported
harness, which already owns all the remote-sandbox asset-prep code (it is the Daytona
reference path). Codex/opencode/Claude on E2B are out of scope (later matrix-fill); they need
the non-Pi remote-bootstrap generalization we are deferring.

## The seam (sandbox axis)

Sandbox selection is a thin provider switch in the Node runner — there is no provider class
hierarchy, just a pattern-match. `sandbox` is a loose string on the wire (no Python enum), so
extending the set is largely runner-side + env config.

```
buildRunPlan:  sandboxId = request.sandbox || SANDBOX_AGENT_PROVIDER || "local"   (run-plan.ts:153)
               isDaytona = sandboxId === "daytona"                                (run-plan.ts:179)
buildSandboxProvider(sandboxId, env, binary, piExtEnv, secrets, perm):            (provider.ts)
    if (sandboxId === "daytona") return daytona({...})
    return local({ env, binaryPath, log })    ← fallback
```
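
The switch above, extended with an `e2b` branch, might look roughly like the following. This is a self-contained sketch, not the runner's code: `resolveSandboxId` and `selectProviderKind` are hypothetical names, and the second function returns a label where the real `buildSandboxProvider` would return a provider instance.

```typescript
// Illustrative only: the real logic lives in run-plan.ts and provider.ts.
interface RunRequest {
  sandbox?: string; // loose string on the wire, not an enum
}

function resolveSandboxId(
  request: RunRequest,
  env: Record<string, string | undefined>,
): string {
  // The request wins, then the env default, then "local".
  return request.sandbox || env.SANDBOX_AGENT_PROVIDER || "local";
}

// Hypothetical stand-in for the provider factories.
function selectProviderKind(sandboxId: string): "daytona" | "e2b" | "local" {
  if (sandboxId === "daytona") return "daytona";
  if (sandboxId === "e2b") return "e2b"; // the new branch
  return "local"; // unknown values fall through to local, as today
}
```

Because the fallback is `local`, a misspelled provider name silently runs locally; that is existing behavior the sketch preserves rather than a recommendation.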

## The big finding: `sandbox-agent` already exports an e2b provider

`sandbox-agent@0.4.2` ships `sandbox-agent/e2b` (alongside `local`, `daytona`, docker, vercel,
cloudflare, modal, computesdk, sprites). So the provider runtime exists — E2B is **wiring an
existing export**, not building a provider. The rivet daemon (which carries the harnesses and
the `ensure_installed` auto-install) runs inside the E2B sandbox the same way it does on
Daytona; only the provisioning/lifecycle API differs.

## What is Daytona-shaped today and needs an E2B sibling

The remote-sandbox path has three Daytona-specific pieces. For **Pi on E2B** we replicate the
Pi-relevant ones; we do NOT need to generalize non-Pi bootstrap (that is the deferred matrix-fill).

| Piece | Daytona today | E2B action (Pi-only) |
|---|---|---|
| Provider construction | `daytona({ image, create })` w/ `buildDaytonaCreate` (snapshot, autostop, ephemeral, network fields, envVars) | `e2b({...})` from `sandbox-agent/e2b` with the equivalent create/env + network policy |
| cwd | `defaultDaytonaCwd()` `/home/sandbox/agenta-<hex>` (run-plan.ts:138) | `defaultE2bCwd()` (E2B's user home) |
| Asset-prep (Pi) | `prepareDaytonaPiAssets` (daytona.ts): install `pi`, upload auth/extension/skills/system-prompt | E2B sibling using E2B's fs/process API; reuse `pi-assets.ts` uploaders which take a `sandbox` handle |
| Auth transport | `createCookieFetch` (Daytona preview-proxy cookie) | plain `createAcpFetch` — E2B uses an `E2B_API_KEY` + a per-sandbox host, no preview-proxy cookie jar; the PoC connected with `SandboxAgent.connect({baseUrl})` and no cookie. Confirm at impl. |
| Network policy | `daytonaNetworkFields` → `networkBlockAll`/`networkAllowList` | **E2B exposes NO egress block/allow in the `sandbox-agent/e2b` wrapper** (PoC: E2B egress is open by default; it relied on that for its tunnel). → refuse restricted-network E2B under strict, the way local is gated. |
| Image/snapshot | `rivetdev/sandbox-agent:<tag>-full` + baked `pi` (`build_snapshot.py`) | a **baked E2B template** (daemon + pi), via `E2BProviderOptions.template`. The PoC built one named `agenta-sandbox-agent`. Do what Daytona does. |

## What is held constant (Pi)

- Pi's local + Daytona asset-prep, extension, usage file, system-prompt handling, skills
materialization — all already exist and are Pi-shaped. We point them at the E2B handle.
- Tool delivery: Pi uses its native extension + the file relay; the relay already works on
remote (`sandboxRelayHost`). Reuse as-is.
- Tracing: Pi self-instruments via the extension under the propagated traceparent (works
remote on Daytona today; same on E2B).

## The `sandbox-agent/e2b` provider surface (verified from the installed types)

`e2b(options)` accepts: `create` (passthrough to E2B `SandboxBetaCreateOpts`), `connect`,
`template` (string name or resolver), `agentPort`, `timeoutMs`, `autoPause`. So template
selection, the agent port, and the lifecycle timeout are all first-class — no `as any`
needed (unlike Daytona's create-field cast).
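
To make the option surface concrete, here is a minimal sketch of assembling those options as a plain object. The interface only mirrors the option names listed above; the field types, the `envs` key inside `create`, and the helper name `buildE2bOptionsSketch` are assumptions, and nothing here imports or calls the real `sandbox-agent/e2b` export.

```typescript
// Field names follow the option list above; types and the `envs` key are assumptions.
interface E2bProviderOptionsSketch {
  template: string;
  timeoutMs: number;
  autoPause?: boolean;
  create?: { envs?: Record<string, string> };
}

function buildE2bOptionsSketch(input: {
  template?: string;
  timeoutMs: number;
  autoPause?: boolean;
  envs?: Record<string, string>;
}): E2bProviderOptionsSketch {
  if (input.timeoutMs < 1) {
    // A zero timeout would remove the self-reap backstop.
    throw new Error("timeoutMs must be at least 1");
  }
  return {
    template: input.template ?? "agenta-sandbox-agent",
    timeoutMs: input.timeoutMs,
    // autoPause is passed through untouched; its right value is not decided here.
    ...(input.autoPause === undefined ? {} : { autoPause: input.autoPause }),
    ...(input.envs ? { create: { envs: input.envs } } : {}),
  };
}
```

Keeping the builder a pure function makes it straightforward to unit-test the create object without touching E2B.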

## Resolved (Phase 0 confirmed — `sandbox-agent/e2b` types + the green PoC matrix)

The PoC ran **Pi (and all 4 harnesses) green on E2B** (template `agenta-sandbox-agent`).

1. **Auth/connection — `E2B_API_KEY` + plain `createAcpFetch`** (no preview-proxy cookie). E2B
gives a per-sandbox host; the PoC connected with `SandboxAgent.connect({baseUrl})`, no cookie
jar. Confirm at impl, but do NOT port `createCookieFetch`.
2. **Network egress — refuse restricted-network under strict.** The rule is to mirror Daytona
unless E2B forces a mandatory new mechanism, and it does not. Daytona's block/allow fields
have no E2B counterpart, though: the `sandbox-agent/e2b` wrapper exposes no egress
block/allow, and E2B is open-egress by default. So mirror the LOCAL gate
(`LOCAL_NETWORK_UNSUPPORTED_MESSAGE` analogue): no new mechanism, no silent unenforced boundary.
3. **Template — BAKED, like Daytona** (do what we do now). Build an E2B template carrying
the daemon + pi (`E2BProviderOptions.template`), the E2B equivalent of `build_snapshot.py`.
Not runtime auto-install.
4. **Leak backstop — `timeoutMs` + `autoPause`** on the e2b provider. E2B auto-kills a sandbox at
its timeout; set a non-zero timeout so a process-KILL-leaked E2B sandbox self-reaps, the
functional equivalent of Daytona's `ephemeral + autoStopInterval`.
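
The network decision in item 2 can be sketched as a small guard. Everything named here is hypothetical: the `NetworkPolicy` shape, the message constant, and `assertE2bNetworkSupported` are illustrations, not the types or names used in `run-plan.ts`. The source only specifies the refusal under strict, so this sketch lets the non-strict case pass without claiming that is the intended behavior.

```typescript
// Hypothetical shapes: the real policy type and message live in the runner.
type NetworkPolicy =
  | { mode: "open" }
  | { mode: "restricted"; allowList: string[] };

const E2B_NETWORK_UNSUPPORTED_MESSAGE =
  'sandbox "e2b" cannot enforce a restricted network policy; use "daytona" instead';

function assertE2bNetworkSupported(policy: NetworkPolicy, strict: boolean): void {
  // Restricted + strict is refused: accepting it would leave the boundary
  // silently unenforced, because the e2b wrapper has no egress control.
  if (policy.mode === "restricted" && strict) {
    throw new Error(E2B_NETWORK_UNSUPPORTED_MESSAGE);
  }
}
```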

## PoC gotchas to carry into the template build

- **E2B template build**: `npx @e2b/cli template create <name> -d e2b.Dockerfile` (v2; `build` is
wrong for the installed CLI). `install-agent` hangs in E2B's remote builder at the ACP-adapter
step, so replicate it manually with `npm install @agentclientprotocol/<x>-acp` plus a curl of
the native binary. `ENV` vars do not persist across `RUN` layers in the E2B builder, so
hardcode paths. `printf '\n'` gets mangled, so write launcher scripts via `base64 -d`. Agents
live under `/root/.local` (`USER root`); run the server as root with cwd `/root/work`.
- **pi needs node ≥ 22.19**: E2B base image ships node 20 → pi-acp crashes at runtime
(`AcpRpcError: Cannot call write after a stream was destroyed`). Install node 22 (nodesource)
in the E2B template. Pi-only symptom — but this worktree IS Pi-on-E2B, so it's load-bearing here.
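
A cheap guard against the node-20 failure mode is to check the version string before starting Pi. The sketch below is an assumption-laden illustration: `nodeSatisfiesPi` is a hypothetical helper, and the only fact taken from the notes above is the 22.19 floor.

```typescript
// Hypothetical helper; the 22.19 floor is the pi requirement noted above.
function nodeSatisfiesPi(version: string, minMajor = 22, minMinor = 19): boolean {
  // Accepts "v22.19.0" or "22.19.0"; anything unparseable counts as unsupported.
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) return false;
  const major = Number(match[1]);
  const minor = Number(match[2]);
  return major > minMajor || (major === minMajor && minor >= minMinor);
}
```

Fed with Node's `process.version` inside the template, a `false` result would surface the problem at startup instead of as an `AcpRpcError` mid-run.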

## Files (verified)

- `services/agent/src/engines/sandbox_agent/provider.ts` — `buildSandboxProvider` (add e2b branch) + a `buildE2bCreate` sibling to `buildDaytonaCreate`
- `services/agent/src/engines/sandbox_agent/run-plan.ts` — `sandboxId`/`isDaytona` (add `isE2b` + `defaultE2bCwd`); network/asset gates
- `services/agent/src/engines/sandbox_agent/daytona.ts` — reference for the E2B asset-prep sibling (new `e2b.ts`); `pi-assets.ts` uploaders are reusable
- `services/agent/src/engines/sandbox_agent.ts` — the prepare-assets dispatch (`if (plan.isDaytona) prepareDaytonaPiAssets`) gains an e2b arm
- `api/oss/src/utils/env.py` — add `E2BConfig` (`E2B_API_KEY`, ...) alongside `DaytonaConfig`
- `sdks/python/agenta/sdk/agents/dtos.py` — sandbox stays a loose string; no enum change
- `sandbox-images/e2b/` — new E2B template recipe (follow-up if baking)
- Tests: provider create-object unit (mirror the Daytona create test); local-vs-e2b gate tests
57 changes: 57 additions & 0 deletions docs/design/agent-workflows/projects/add-sandbox-e2b/specs.md
@@ -0,0 +1,57 @@
# Add the E2B sandbox (running Pi) — specs

## Scope

In: `sandbox="e2b"` runs the **Pi** harness, using the existing `sandbox-agent/e2b` provider;
a **baked E2B template** (daemon + pi, node 22) like the Daytona snapshot; Pi's existing remote
asset-prep retargeted to the E2B handle; `E2B_API_KEY` config; a `timeoutMs`-based leak
backstop. Out: any non-Pi harness on E2B (deferred — needs the non-Pi remote bootstrap),
restricted-network enforcement on E2B (refused under strict instead).

## Behavior

- A run with `sandbox="e2b"` (or `SANDBOX_AGENT_PROVIDER=e2b`) starts the baked-template E2B
sandbox (daemon + pi already present), runs Pi there, streams the result, and **always tears the
sandbox down** on every normal/error/disconnect path (the `finally`), with a `timeoutMs`-based
self-reap backstop for the process-KILL case (E2B auto-kills at its timeout — the functional
equivalent of Daytona's `ephemeral + autoStop`).
- Pi authenticates with the resolved provider key (managed `env`) or its uploaded own login
(`runtime_provided`), exactly as on Daytona — the `shouldUploadOwnLogin` decision is reused.
- The ACP connection uses the plain `createAcpFetch` (E2B has no preview-proxy cookie).
- Pi's extension, forced skills, system prompts, and usage file are provisioned into the E2B
sandbox; tools run via the file relay (already remote-capable); tracing is Pi-self-instrumented
under the propagated traceparent.
- A restricted `network` policy on E2B is **refused loud under `strict`** (the `sandbox-agent/e2b`
wrapper exposes no egress control; no silent unenforced boundary), mirroring the local gate.
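
The "always tears the sandbox down" rule is the usual `try`/`finally` pattern. In the sketch below, `SandboxHandle`, its `destroy()` method, and `runWithTeardown` are all hypothetical stand-ins for whatever the generic handle in `sandbox_agent.ts` actually exposes.

```typescript
// Hypothetical handle shape: only a destroy() method is assumed.
interface SandboxHandle {
  destroy(): Promise<void>;
}

async function runWithTeardown<T>(
  start: () => Promise<SandboxHandle>,
  run: (sandbox: SandboxHandle) => Promise<T>,
): Promise<T> {
  const sandbox = await start();
  try {
    return await run(sandbox);
  } finally {
    // Runs on success, error, and disconnect alike. A process KILL skips it,
    // which is why the timeout-based backstop is still required.
    await sandbox.destroy();
  }
}
```

The `finally` covers every path the process survives; only the provider-side timeout can cover the path where it does not.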

## Contracts

- Wire unchanged: `sandbox` is a free string; no golden change required for selection. (Add an
e2b example fixture only if a new wire field is introduced — none expected.)
- `E2BConfig` in `env.py` exposes `E2B_API_KEY` + the template name var via the shared `env`
object (never `os.getenv` directly in app code).

## Decisions — LOCKED (see research.md for evidence)

1. **Auth/connection: `E2B_API_KEY` + plain `createAcpFetch`** (no cookie jar).
2. **Restricted network: refuse under strict** (no E2B egress control exists; don't invent one).
3. **Template: BAKED** (daemon + pi + node 22), like the Daytona snapshot — not auto-install.
4. **Leak backstop: `timeoutMs` + `autoPause`** on the e2b provider (E2B auto-kills at timeout).

## Non-goals / invariants preserved

- No harness code changes; Pi is held constant. The only Python change is `E2BConfig`. (The
baked-template build is a new artifact under `sandbox-images/e2b/`, not app code.)
- The provider stays a thin branch in `buildSandboxProvider`; no provider class hierarchy.
- Teardown + leak backstop parity with Daytona is mandatory — an E2B sandbox must never outlive
its run (cost/security).
- Restricted boundaries are enforced or refused, never silently accepted.

## Acceptance

- Unit: `buildE2bCreate` produces the expected provider options (env, `template`, `timeoutMs`/
`autoPause` leak backstop) — mirror the Daytona create-object test; `run-plan` sets `isE2b` and
an E2B cwd; a restricted-network E2B run under strict is REFUSED with a clear message.
- Integration: a Pi-on-E2B run returns `ok:true` with output + a trace; the sandbox is gone
after the run (verify via the E2B API); a tool run delivers via the relay.
- Ungated endpoint → both editions per test-account convention.
49 changes: 49 additions & 0 deletions docs/design/agent-workflows/projects/add-sandbox-e2b/tasks.md
@@ -0,0 +1,49 @@
# Add the E2B sandbox (running Pi) — tasks

Decisions are LOCKED (see research.md / specs.md); Phase 0 is a quick re-verify.

## Phase 0 — re-verify E2B + provider reality (no app code)
> Reference first: the `vibes/sessions/demo` PoC ran Pi-on-E2B green (template
> `agenta-sandbox-agent`) and documented the template-build + node-22 gotchas. Confirm, reuse.
- [ ] T0.1 Re-confirm `sandbox-agent/e2b` options: `create`, `connect`, `template`, `agentPort`,
`timeoutMs`, `autoPause` (already read from types — sanity-check at impl).
- [ ] T0.2 Run Pi in a baked E2B template sandbox; confirm `SandboxAgent.connect({baseUrl})` with
plain `createAcpFetch` (no cookie) and that node ≥ 22.19 is present (pi requirement).

## Phase 1 — baked E2B template (the build artifact)
- [ ] T1.1 `sandbox-images/e2b/` recipe + `e2b.Dockerfile` baking the rivet daemon + pi + node 22,
mirroring Daytona's `build_snapshot.py`. Apply the PoC gotchas: `npx @e2b/cli template
create <name> -d e2b.Dockerfile`; manual `npm install @agentclientprotocol/<x>-acp` + native
binary curl (install-agent hangs in the builder); hardcode paths (env doesn't persist across
RUN); base64-write launcher scripts; USER root, cwd `/root/work`. Template name → env (T3.1).

## Phase 2 — Node provider
- [ ] T2.1 `provider.ts`: add `buildE2bCreate` (env via the `daytonaEnvVars` equivalent,
`template` name, `timeoutMs`/`autoPause` leak backstop) + an `e2b({...})` branch in
`buildSandboxProvider`.
- [ ] T2.2 `run-plan.ts`: add `isE2b`, `defaultE2bCwd()`, and **refuse** restricted-network E2B
under strict (mirror the `LOCAL_NETWORK_UNSUPPORTED_MESSAGE` gate — no E2B egress control).

## Phase 3 — Pi asset-prep + wiring on E2B
- [ ] T3.1 `api/oss/src/utils/env.py`: add `E2BConfig` (`E2B_API_KEY`, template name) on the
shared `env` object; wire into `EnvironSettings`.
- [ ] T3.2 New `engines/sandbox_agent/e2b.ts` mirroring the Pi parts of `daytona.ts`
(`prepareE2bPiAssets`): reuse `pi-assets.ts` uploaders against the E2B handle. pi is baked
in the template, so no in-sandbox install needed.
- [ ] T3.3 `sandbox_agent.ts`: extend the prepare dispatch (`if (plan.isDaytona) ...`) with an
e2b arm; use plain `createAcpFetch` (no cookie); verify `inFlightSandboxes` + `finally`
teardown cover the E2B handle (they operate on the generic handle — confirm).

## Phase 4 — tests & docs
- [ ] T4.1 Unit: `buildE2bCreate` options (template, `timeoutMs`/`autoPause`, env — mirror the
Daytona create-object test); `run-plan` `isE2b` + cwd; restricted-network E2B under strict
is REFUSED with a clear message.
- [ ] T4.2 Integration: Pi-on-E2B returns output + trace; sandbox deleted after run (E2B API
check); relay tool run works. Both editions (ungated convention).
- [ ] T4.3 `documentation/` sandbox doc + comparison table updated with E2B (baked template,
refuse-restricted-network, `timeoutMs` backstop; deferred: non-Pi harnesses on E2B).

## Verify before merge
- [ ] `ruff format`/`ruff check`; `pnpm test`/`pnpm run typecheck`.
- [ ] Diff scoped vs origin/main; drop findings also on main.
- [ ] Teardown/leak parity with Daytona explicitly verified — no E2B sandbox outlives its run.
1 change: 1 addition & 0 deletions services/runner/package.json
@@ -21,6 +21,7 @@
  },
  "dependencies": {
    "@daytonaio/sdk": "^0.187.0",
    "@e2b/code-interpreter": "^1.0.0",
    "@earendil-works/pi-coding-agent": "0.79.4",
    "@opentelemetry/api": "1.9.0",
    "@opentelemetry/exporter-trace-otlp-proto": "0.54.0",