Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
105 changes: 105 additions & 0 deletions docs/design/agent-workflows/projects/remote-tools-delivery/research.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
# Research

Verified against the working tree. Paths are repo-relative under `services/runner/`.

## 1. The bug (F1, audit finding)

A non-Pi harness (Claude today; any future MCP-only harness) running on a **remote** sandbox
(Daytona) has no working delivery path for gateway/custom tools at all:

- `engines/sandbox_agent/mcp.ts` `buildSessionMcpServers` builds the internal gateway-tool
channel as a runner-loopback (`127.0.0.1`) HTTP MCP server
(`tools/tool-mcp-http.ts` `startInternalToolMcpServer`). On Daytona the harness runs **inside**
the sandbox, where `127.0.0.1` resolves to the sandbox's own loopback, not the runner's — an
unreachable URL. `buildSessionMcpServers` correctly skips advertising it there
(`isDaytona ? { servers: [], close... } : await buildToolMcpServers(...)`).
- The code's own comment claimed the gap was covered: "gateway tools are delivered through the
file relay instead (the relay loop already polls the sandbox filesystem on Daytona)." This is
false for a non-Pi harness. The file-relay protocol (`tools/relay.ts`) has two halves: a
**runner-side poller** (works for any sandbox, harness-agnostic) and a **sandbox-side writer**
that must ask the runner to run a tool by dropping a `<id>.req.json` file into the relay dir.
That writer exists in exactly one place: `extensions/agenta.ts` `registerTools`, which is Pi's
bundled extension (`pi.registerTool` + `runResolvedTool` in the `execute` callback). It is
installed only into Pi's agent directory and loaded only when Pi runs. Claude (or any other
non-Pi harness) has no code path that writes a relay request file.
- The capability gate that should have caught this doesn't, because it is checking the wrong
thing: `engines/sandbox_agent/capabilities.ts` `assertRequiredCapabilities` only asserts that
the harness *can accept an MCP server and call a tool* (`capabilities.mcpTools` +
`capabilities.toolCalls`, from the daemon probe). Claude reports both `true` — correctly, in
general, since Claude can and does consume the internal channel over loopback HTTP locally.
The gate has no notion of "reachable from here"; it never looks at `isDaytona`.
- Net effect before this fix: the run proceeds, the harness receives an empty tool list, the
turn completes, `mcp.ts` logs a specific but FALSE message
(`"daytona: N gateway tool(s) delivered via the file relay, not a loopback MCP URL"`), and the
`/run` result comes back `ok:true`. A caller has no signal that every tool call the model might
have made was structurally impossible.

## 2. Why Pi is unaffected

Pi never uses either MCP path for tools. `buildSessionMcpServers` short-circuits to
`{ servers: [], close: async () => {} }` for `isPi` before either layer runs
(`if (isPi || !capabilities.mcpTools) { ... return ... }`). Pi's tools ride entirely through
`extensions/agenta.ts`: the daemon loads the extension into the Pi process (local or inside the
Daytona sandbox — `prepareDaytonaPiAssets` uploads it there), the extension reads
`AGENTA_AGENT_TOOLS_PUBLIC_SPECS` / `AGENTA_AGENT_TOOLS_RELAY_DIR` from its environment, and its
`execute` callback calls `runResolvedTool`, which is the SAME sandbox-side relay-writer used by
the file-relay protocol (`tools/dispatch.ts`). So "the file relay works on Daytona" is true only
because Pi ships its own relay-writing client bundled as a harness extension. No other harness
has an equivalent.

## 3. What already exists to build on

- **Loopback HTTP MCP server** (`tools/tool-mcp-http.ts`): a from-scratch, dependency-free
Streamable-HTTP JSON-RPC server (`initialize` / `tools/list` / `tools/call`), started per
session on an OS-assigned port, bound to `127.0.0.1` only. This is the piece that needs a
reachable address on Daytona, or an in-sandbox counterpart — see the two candidates below.
- **File-relay protocol** (`tools/relay.ts`): request/response JSON files in a shared dir,
polled by the runner (`startToolRelay`), written by whichever process wants a tool run. Already
harness-agnostic on the runner side. Already reaches into the Daytona sandbox filesystem via
the daemon API (see `engines/sandbox_agent.ts`, the "Daytona tool relay" section) — the runner
polls files that live inside the sandbox, not on the runner host.
- **Pi's bundled extension pattern** (`extensions/agenta.ts`): a fully-worked example of a
sandbox-side relay-writing client — esbuild-bundled, env-driven, self-contained. It is a
concrete template for a hypothetical non-Pi equivalent, though it depends on the harness having
an extension/plugin mechanism to load into (Pi does; Claude's ACP agent and Codex's may not).
- **Daytona's own remote-reachability precedent — the preview proxy**: the runner already talks
to a Daytona-hosted ACP HTTP endpoint through Daytona's preview-proxy mechanism
(`engines/sandbox_agent/daytona.ts` `createCookieFetch`: "Daytona's preview proxy
authenticates with a … cookie jar"). This means Daytona already exposes an authenticated
reverse proxy from outside the sandbox to a port inside it — the reverse of what the internal
tool-MCP channel would need (inside the sandbox reaching a URL that resolves to the runner),
but it establishes that Daytona has a general request-routing/proxy primitive worth
investigating for the other direction, or for exposing a runner-side port that a
Daytona-issued URL can forward into the sandbox's network namespace.
- **The ngrok tunnel used for durable-cwd mounting** (`engines/sandbox_agent/mount.ts`
`discoverTunnelEndpoint`): a working precedent for making a runner-side service reachable from
inside a Daytona sandbox over the public internet (geesefs inside the sandbox mounts the
runner's storage over this tunnel). This is the closest existing analogue to "advertise the
internal tool-MCP on a sandbox-reachable URL" (candidate (a) below) — the same tunnel
infrastructure could plausibly carry the tool-MCP's HTTP traffic too, provided auth is added
(the mount path currently authenticates via signed mount credentials, not a bearer suitable for
MCP).
- **No E2B path on this branch.** `mount.ts` mentions "e2b" only in a comment; there is no E2B
sandbox provider implemented (`engines/sandbox_agent/provider.ts` only builds `local` and
`daytona`). The interim gate below is written against the two providers this codebase actually
has: `local` (loopback reachable) and `daytona` (loopback unreachable).

## 4. The interim fix implemented alongside these docs

`engines/sandbox_agent/run-plan.ts` `buildRunPlan` now refuses, before any cwd or sandbox is
created, any run where `!isPi && isDaytona && executableToolSpecsForRun.length > 0` —
`REMOTE_TOOLS_UNSUPPORTED_MESSAGE`. This mirrors the existing not-implemented gates in the same
file (`CODE_TOOL_UNSUPPORTED_MESSAGE`, `USER_MCP_UNSUPPORTED_MESSAGE`,
`PI_USER_MCP_UNSUPPORTED_MESSAGE`, `FILESYSTEM_UNSUPPORTED_MESSAGE`,
`LOCAL_NETWORK_UNSUPPORTED_MESSAGE`): fail loud with a single named message instead of silently
dropping a declared capability. `executableToolSpecsForRun` (already computed earlier in the
function via `executableToolSpecs(toolSpecs)`) excludes `client`-kind tools, which are
browser-fulfilled and were never advertised over the internal channel in the first place
(`tool-mcp-http.ts` `tools/list` filters them out too), so a client-only tool run is unaffected.
The `mcp.ts` "delivered via the file relay" log is now conditioned on `isPi` so it can never again
claim a delivery that isn't happening, as defense-in-depth against a future gate bypass (the
run-plan gate should make the branch it guards dead code, but the log no longer trusts that).

This is explicitly the interim/shipping fix, not the real feature — it trades "some valid
combinations now error where they used to (silently) proceed" for "no combination silently drops
tools." The real feature (making the combination actually work) is scoped below.
117 changes: 117 additions & 0 deletions docs/design/agent-workflows/projects/remote-tools-delivery/specs.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
# Specs

## Goal

Make gateway/custom tool delivery work for a non-Pi harness (Claude today; any future MCP-only
harness) on a remote sandbox (Daytona today; any future non-local provider), closing the gap the
interim gate in `run-plan.ts` currently refuses outright. Non-goals: user-declared stdio MCP
(stays disabled, unrelated security decision) and user-declared HTTP MCP (already works
unchanged on every sandbox, since it is a remote URL the harness dials directly).

## The gap, precisely

The internal gateway-tool channel has two jobs: **advertise** a way for the harness to discover
and call the tools, and **execute** a call by relaying it back to the runner, where the private
spec / scoped env / callback auth are applied server-side. Execution already works
sandbox-agnostically (`tools/relay.ts`, `tools/dispatch.ts`). Only advertisement is missing for
a non-Pi harness in a remote sandbox: there is no artifact inside the sandbox that speaks
whatever protocol the harness expects for "here are your tools."

Two shapes close that gap:

## Candidate (a): advertise the internal tool-MCP on a sandbox-reachable URL

Keep today's design (`tools/tool-mcp-http.ts`'s HTTP MCP server, `mcp.ts`'s advertisement of a
`type: "http"` ACP MCP server) but bind or expose it so a URL reachable **from inside the
sandbox** resolves back to the runner process, instead of binding to loopback only. Three
sub-options for the "reachable URL" part:

1. **Runner binds non-loopback.** Bind `tool-mcp-http.ts` to `0.0.0.0` (or a specific
routable interface) instead of `127.0.0.1`, and advertise the runner's own network address.
Only viable if the runner and the Daytona sandbox share a network path that doesn't already
exist today (they don't — the sandbox is a separate remote VM/container with its own network
namespace; nothing routes a Daytona sandbox's outbound traffic to the runner host's private
IP without additional infrastructure). Effectively requires one of the other two sub-options
underneath it anyway.
2. **A tunnel** (reusing `discoverTunnelEndpoint`'s ngrok infrastructure, already used to expose
the runner's storage mount to a Daytona sandbox). The internal MCP server binds loopback as
today; the runner also registers/reuses an ngrok tunnel forwarding a public HTTPS URL to that
local port; that URL is what gets advertised as the ACP MCP server's `url`.
3. **A sandbox-side port-forward issued by the provider** (Daytona's own preview-proxy
mechanism, the same one that already fronts the ACP HTTP endpoint with cookie-jar auth in
`daytona.ts` `createCookieFetch`). If Daytona (or a future provider) can forward a port
**into** the sandbox's own loopback from a runner-controlled listener, the sandbox's existing
`127.0.0.1` advertisement becomes correct again with zero harness-side change — the forwarded
port simply makes the runner's server locally reachable inside the sandbox.

**Auth story required in all three:** today's channel deliberately carries no secret because it
never leaves the runner's own loopback (no attacker can reach `127.0.0.1` on the runner host from
outside it). The moment the URL is reachable from the sandbox — let alone from the tunnel's
public internet hop in sub-option 2 — the channel needs its own bearer/token so a compromised or
malicious in-sandbox process (or, worse, anyone who guesses the tunnel URL) cannot call
`tools/call` and trigger a credentialed gateway action. This is new work: a per-run random token,
checked on every request, threaded through the ACP `headers` field (the `McpServerHttp` shape
already supports headers — `toAcpMcpServers` does exactly this for user HTTP MCP servers).

## Candidate (b): an in-sandbox relay CLIENT for non-Pi harnesses

Ship (or have the daemon spawn) a small MCP-server shim that runs **inside** the sandbox,
alongside the harness, and translates between "the harness's native tool-delivery mechanism" and
"the file-relay protocol" (write a `<id>.req.json`, poll for `<id>.res.json` — exactly what
`extensions/agenta.ts` `registerTools` already does for Pi, minus the Pi-extension plumbing
around it). Two delivery mechanisms for getting the shim into the sandbox:

1. **Daemon-spawned**, analogous to how the daemon already knows how to run each harness: the
daemon starts the shim as a sibling process/MCP server alongside the ACP agent process and
wires it into the session's MCP server list the way it wires in any other stdio MCP server —
except this stdio server runs **inside the sandbox**, not on the runner host, so it does not
reintroduce the runner-host-process security hole that `USER_MCP_UNSUPPORTED_MESSAGE` guards
against. This needs the daemon (`sandbox-agent` package, not this repo) to grow a concept of
"an extra in-sandbox MCP server the runner wants attached," which does not exist today for a
non-Pi harness.
2. **Shipped like Pi's extension**, if the target harness has an equivalent
plugin/extension-loading mechanism. Claude Code's ACP agent
(`@zed-industries/claude-agent-acp`) would need to expose something analogous to Pi's
`ExtensionAPI`/`registerTool` for this to apply; unconfirmed whether it does. If it doesn't,
this sub-option collapses into (1).

**Auth/exposure story**: strictly better than (a) — the shim's traffic never leaves the sandbox
(it talks to the relay dir on the sandbox filesystem, the same file-based IPC Pi already uses,
which the runner already reaches via the Daytona daemon filesystem API). No new network exposure,
no new secret to mint or rotate, no tunnel.

## Comparison

| Axis | (a) Reachable URL | (b) In-sandbox relay client |
| --- | --- | --- |
| **Auth / exposure** | New secret required; sub-option 2 exposes a URL over the public internet (mitigated by a bearer, but still a new attack surface); sub-option 3 depends on provider-level port-forward trust boundaries being sound | No new exposure — traffic stays inside the sandbox filesystem, same trust boundary the runner already crosses for Daytona mounts |
| **Latency** | Extra network hop (sandbox → tunnel/proxy → runner) per tool call, on top of the existing relay-execution hop back to the runner; for sub-option 2 specifically, adds public-internet round-trip latency | One hop shorter: the shim talks file-relay directly, same mechanism Pi already uses at today's measured latency (the relay-poll interval, `RELAY_POLL_MS`, dominates either way) |
| **Per-provider work** | Daytona: reuse `discoverTunnelEndpoint`/preview-proxy (some infra exists); a future E2B-style provider not on this branch would need its own equivalent (host-side port exposure, different proxy model) — the design is NOT provider-agnostic, each provider needs its own "how do I get a URL routed here" answer | Provider-agnostic once the daemon can spawn/attach a sandbox-side process at all (which every provider already supports, since it's how the harness itself runs) — the only per-provider work is none beyond what already exists to launch a process/write files inside that sandbox |
| **Per-harness work** | None once built — the ACP `McpServerHttp` advertisement is harness-agnostic (any harness that accepts `type: "http"` MCP, which Claude already does) | Needs the daemon to grow "attach an extra in-sandbox MCP server" (sub-option 1) OR needs each harness to expose an extension mechanism (sub-option 2, uncertain for non-Pi harnesses) — the harder, more novel piece of engineering |
| **Which cells does it unblock** | All remote-sandbox non-Pi combinations at once, retroactively, for any current or future provider that can offer the reachable-URL primitive — but gated on that primitive existing per-provider | All remote-sandbox non-Pi combinations at once IF the daemon-spawned sub-option lands (provider-agnostic); narrower if stuck relying on per-harness extension support |
| **Security review surface** | New: a bearer-secured internet-reachable (or provider-proxied) endpoint serving a JSON-RPC tool-call API — needs its own threat model, closer in shape to the SSRF-guarded user-HTTP-MCP path (`mcp.ts` `validateUserMcpUrl`) than to today's "carries no secret because unreachable" internal channel | Reuses the EXACT trust boundary already accepted for Pi + Daytona (file relay inside a daemon-managed sandbox filesystem) — no new category of risk, just a second writer of an existing protocol |

## Recommendation

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI for task 0 in tasks.md: candidate (b) already has a working implementation in #4873 ("in-sandbox stdio MCP relay shim", F-042) - relay-mcp-stdio.ts + relay-shim.ts + relay-client.ts, i.e. an in-sandbox stdio MCP server that writes the file relay, spawned without daemon changes. It predates the services/agent -> services/runner rename so it needs a path migration, but the load-bearing question ("can an in-sandbox stdio MCP server be attached alongside a non-Pi harness?") appears already answered there. Recommend reviving/rebasing #4873 as the implementation of this spec rather than re-spiking, and cross-linking it from this doc.


**Candidate (b), daemon-spawned sub-option**, is the better target: it reuses an
already-accepted trust boundary (the same one Pi's extension operates in today), needs no new
secret/token lifecycle, is provider-agnostic (works identically on Daytona and any future
provider, since it only depends on "can the daemon run an extra process/write files inside the
sandbox," which every provider must already support to run the harness at all), and does not add
a network hop or a new externally-reachable service to threat-model. Its cost is concentrated in
one place — teaching the daemon (the `sandbox-agent` package) to attach a non-Pi in-sandbox
relay-client MCP server to a session — rather than spread across every sandbox provider's
networking model the way (a) would require.

Candidate (a) is worth keeping as a fallback if the daemon cannot be taught to spawn an
in-sandbox process for a non-Pi harness (e.g., if the ACP agent's process model doesn't allow the
daemon to run a sibling process next to it) — in that case, reusing the existing Daytona tunnel
infrastructure (sub-option 2) is the least-new-code path, accepting the added secret-management
and public-exposure review it requires.

## Non-goals for this design (explicitly out of scope)

- Removing or relaxing the interim `run-plan.ts` gate before a working delivery path ships —
the gate stays until one of the above lands and is tested end-to-end.
- Any change to user-declared MCP (stdio stays disabled; HTTP stays as-is).
- Supporting a sandbox provider not already in this codebase (no E2B provider exists here today).
Loading
Loading