[fix] Refuse tools on non-Pi harness x remote sandbox instead of silently dropping them#5047
Conversation
…t test The heartbeat contract dropped the status field (the API derives ended server-side when the heartbeat stops); the test still asserted it. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ent drop) A non-Pi harness (Claude) on a remote (Daytona) sandbox had no working delivery path for gateway/custom tools: the internal tool-MCP channel is runner-loopback-only (unreachable from inside the sandbox), and the file-relay fallback's sandbox-side writer exists only in Pi's bundled extension. The run proceeded anyway, mcp.ts logged a false "delivered via the file relay", and the harness silently got zero tools with ok:true. buildRunPlan now refuses this combination up front (before any sandbox is created), mirroring the existing not-implemented gates (code tools, stdio MCP, Pi user-MCP). The mcp.ts log is now Pi-only, as defense-in-depth against a future gate bypass. Design docs for the real fix (an in-sandbox relay client vs a sandbox-reachable tool-MCP URL) are tracked under docs/design/agent-workflows/projects/remote-tools-delivery/. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR prevents a previously silent failure mode where non-Pi harnesses running in a remote Daytona sandbox would receive zero executable tools (due to the runner’s internal tool MCP being loopback-only), by failing the run plan early with an explicit error and by hardening related logging/tests.
Changes:
- Add an up-front run-plan gate that rejects
non-Pi × daytona × executable toolswith a clear, actionable error message (client-only tools are exempt). - Update MCP-layering behavior/tests to ensure Daytona + non-Pi never emits the (previously false) “delivered via the file relay” claim.
- Adjust unit tests to align with the updated heartbeat payload contract and add design docs for the longer-term “remote tools delivery” fix.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| services/runner/src/engines/sandbox_agent/run-plan.ts | Adds a new early-fail gate (REMOTE_TOOLS_UNSUPPORTED_MESSAGE) for non-Pi + Daytona runs requesting executable tools. |
| services/runner/src/engines/sandbox_agent/mcp.ts | Updates Daytona/internal-tool-channel commentary and restricts the “file relay delivered” log claim to avoid false assertions. |
| services/runner/tests/unit/sandbox-agent-run-plan.test.ts | Adds coverage for the new remote-tools gate and adjusts other Daytona/tool tests to avoid being masked by the new gate. |
| services/runner/tests/unit/session-mcp-layering.test.ts | Adds a regression guard ensuring non-Pi + Daytona never logs “delivered via the file relay”. |
| services/runner/tests/unit/session-alive.test.ts | Updates expectations to reflect removal of status from the heartbeat body (status derived server-side). |
| docs/design/agent-workflows/projects/remote-tools-delivery/research.md | Documents the root cause (F1) and the interim gating fix. |
| docs/design/agent-workflows/projects/remote-tools-delivery/specs.md | Proposes solution candidates for enabling remote tool delivery for non-Pi harnesses. |
| docs/design/agent-workflows/projects/remote-tools-delivery/tasks.md | Tracks an implementation task queue for the recommended longer-term fix. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // delivery. Kept as a defense-in-depth check — a future change to the early-return above must | ||
| // not resurrect the F1 false log ("delivered via the file relay" for a harness with no relay | ||
| // writer). `run-plan.ts` refuses this combination before it ever reaches here. | ||
| if (isDaytona && toolSpecs.length > 0 && isPi) { |
There was a problem hiding this comment.
Fixed in 0e24206: the dead && isPi condition is gone; the log now states only that the loopback advertisement was skipped, without claiming any delivery path.
mmabrouk
left a comment
There was a problem hiding this comment.
Reviewed (Claude, on Mahmoud's behalf). Verdict: correct and well-scoped. The gate mirrors the existing not-implemented gates, fires before any cwd/sandbox is created, the false "delivered via the file relay" log is fixed with a defense-in-depth condition, and the test coverage (gate fires, Pi exempt, local exempt, client-only exempt, no-tools exempt) is exactly right. CI is green. This is strictly better than the silent zero-tools ok:true behavior and I'd ship it.
Four decision points are flagged inline for Mahmoud to rule on before merge. Short map of the overlapping work, since three PRs touch the same gap:
- #4985 (client tools to Claude, pre-rename, needs the
services/agent->services/runnerrebase) contains its own copy of this gate (DAYTONA_TOOL_DELIVERY_UNSUPPORTED_MESSAGE) and changes what the client-tool exemption here means. - #4873 (in-sandbox stdio MCP relay shim, also pre-rename) already implements the "real fix" this PR's
remote-tools-deliverydocs recommend as candidate (b). - #5045/#5053 (E2B stack) add a
sandbox="e2b"provider that this gate'sisDaytonapredicate does not cover.
None of these block the gate itself; they decide its exact predicate and who owns the follow-ups.
| // every tool, and still returned `ok:true`. Refuse up front, the way the other not-implemented | ||
| // gates above do. `executableToolSpecsForRun` excludes `client` tools (browser-fulfilled, never | ||
| // routed through this channel), matching what the internal MCP channel actually advertises. | ||
| if (!isPi && isDaytona && executableToolSpecsForRun.length > 0) { |
There was a problem hiding this comment.
Decision 1 - predicate: isDaytona or "any non-local sandbox"?
The E2B stack (#5045, and #5053 for Claude-on-E2B) adds sandbox="e2b", and nothing in those diffs extends this gate. As written, claude x e2b x tools will bypass the gate and silently drop tools again - the exact F1 bug, one provider over.
Recommendation: fail closed for any remote provider, e.g. key this on sandbox !== "local" (or an isRemote derived flag), so a new provider starts refused-with-a-clear-error until someone proves tool delivery there. Alternative: keep isDaytona and make extending the gate an explicit task in the E2B lane. The first option is safer; unknown providers should not fail open.
There was a problem hiding this comment.
agree with recommendation. please update the pr accordingly.
There was a problem hiding this comment.
Done in 0e24206: gate now keys on isRemoteSandbox = sandboxId !== "local", message reworded to "remote sandbox", and a fail-closed unit test for an unknown provider (sandbox: "e2b") added.
| // (unreachable from inside the sandbox), and the file-relay fallback has a sandbox-side writer | ||
| // only inside Pi's bundled extension. Before this gate the run proceeded, silently dropped | ||
| // every tool, and still returned `ok:true`. Refuse up front, the way the other not-implemented | ||
| // gates above do. `executableToolSpecsForRun` excludes `client` tools (browser-fulfilled, never |
There was a problem hiding this comment.
Decision 2 - the client-tool exemption interacts with #4985.
Two things worth deciding consciously:
- Even today,
claude x daytona x client-onlyis not actually a working combination: client tools are filtered out of the MCP advertisement for Claude everywhere (that is the bug feat(agent): deliver client tools to Claude over the internal MCP channel #4985 fixes), so the model never sees the tool and the run silently delivers nothing. The exemption is transport-correct (client tools never ride this channel) but outcome-wise it lets a silent no-delivery combo pass the gate. - After feat(agent): deliver client tools to Claude over the internal MCP channel #4985 lands, client tools DO ride the internal
agenta-toolsMCP channel for Claude - which is unreachable on Daytona. At that point this exemption reopens the silent drop for client tools specifically. (feat(agent): deliver client tools to Claude over the internal MCP channel #4985's own copy of the gate does not exempt them, so the two gates disagree on scope.)
Options: (a) drop the exemption now (strictest, matches #4985's version), or (b) keep it and make tightening the gate an explicit item in #4985's rebase. Either is fine; it just should not be resolved by accident during rebase conflict resolution.
There was a problem hiding this comment.
let's keep the logic for the client tools fix in 4985
There was a problem hiding this comment.
Kept the exemption here per Mahmoud's ruling; the gate comment now names #4985 as the owner of tightening it when client tools move onto the MCP channel.
| * zero tools with `ok:true` (the silent-tool-drop bug). Refuse loud instead, mirroring the | ||
| * other not-implemented gates above. | ||
| */ | ||
| export const REMOTE_TOOLS_UNSUPPORTED_MESSAGE = |
There was a problem hiding this comment.
Decision 3 - duplicate gate in #4985: pick one owner.
#4985 (pre-rename, awaiting the services/agent -> services/runner rebase) adds the same up-front refusal as DAYTONA_TOOL_DELIVERY_UNSUPPORTED_MESSAGE. If both merge as-is we get two constants, two messages, and two slightly different scopes (see the client-tool comment) for one rule.
Recommendation: treat THIS gate as canonical (it is green, post-rename, and better documented via the remote-tools-delivery project), and strip the gate portion out of #4985 during its rebase, keeping #4985 focused on client-tool delivery + the seam consolidation.
There was a problem hiding this comment.
agree. please remember to strip this during the rebase
There was a problem hiding this comment.
Agreed; noted as an explicit step in the #4985 rebase plan (strip DAYTONA_TOOL_DELIVERY_UNSUPPORTED_MESSAGE, this gate is canonical).
| | **Which cells does it unblock** | All remote-sandbox non-Pi combinations at once, retroactively, for any current or future provider that can offer the reachable-URL primitive — but gated on that primitive existing per-provider | All remote-sandbox non-Pi combinations at once IF the daemon-spawned sub-option lands (provider-agnostic); narrower if stuck relying on per-harness extension support | | ||
| | **Security review surface** | New: a bearer-secured internet-reachable (or provider-proxied) endpoint serving a JSON-RPC tool-call API — needs its own threat model, closer in shape to the SSRF-guarded user-HTTP-MCP path (`mcp.ts` `validateUserMcpUrl`) than to today's "carries no secret because unreachable" internal channel | Reuses the EXACT trust boundary already accepted for Pi + Daytona (file relay inside a daemon-managed sandbox filesystem) — no new category of risk, just a second writer of an existing protocol | | ||
|
|
||
| ## Recommendation |
There was a problem hiding this comment.
FYI for task 0 in tasks.md: candidate (b) already has a working implementation in #4873 ("in-sandbox stdio MCP relay shim", F-042) - relay-mcp-stdio.ts + relay-shim.ts + relay-client.ts, i.e. an in-sandbox stdio MCP server that writes the file relay, spawned without daemon changes. It predates the services/agent -> services/runner rename so it needs a path migration, but the load-bearing question ("can an in-sandbox stdio MCP server be attached alongside a non-Pi harness?") appears already answered there. Recommend reviving/rebasing #4873 as the implementation of this spec rather than re-spiking, and cross-linking it from this doc.
| }); | ||
|
|
||
| it("release() sends a final heartbeat with is_running=false and status=ended", async () => { | ||
| it("release() sends a final heartbeat with is_running=false (status derived server-side)", async () => { |
There was a problem hiding this comment.
Minor: this heartbeat-contract test fix is unrelated to the remote-tools gate (separate commit, fixing a stale assertion after status was dropped from the heartbeat). Fine to ride along; flagging only so the PR description's scope is understood.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
services/runner/src/engines/sandbox_agent/mcp.ts (1)
161-170: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick winStale
isDaytonafield doc contradicts the fix a few lines below.This docstring still unconditionally asserts "gateway tools are delivered through the file relay instead (the relay loop already polls the sandbox filesystem on Daytona...)" for the Daytona case. That's precisely the false claim
research.mdcites as part of the F1 root cause: "The code's own comment claimed the gap was covered ... This is false for a non-Pi harness." The corrected reasoning was added to the function-level JSDoc at Line 193-201 in this same file, but this field-level doc wasn't updated to match, leaving two contradictory explanations of the same field in one file.📝 Proposed doc fix
/** * True when the run executes in a REMOTE Daytona sandbox (the harness runs IN the sandbox, * not on the runner host). Gates the internal gateway-tool channel: the channel's loopback * (`127.0.0.1`) HTTP MCP URL resolves to the SANDBOX's loopback there, not the runner's, so - * advertising it would hand the in-sandbox harness an unreachable URL. On Daytona the channel - * is skipped and gateway tools are delivered through the file relay instead (the relay loop - * already polls the sandbox filesystem on Daytona — see `engines/sandbox_agent.ts`). See the - * Daytona guard in `buildSessionMcpServers`. + * advertising it would hand the in-sandbox harness an unreachable URL. On Daytona the channel + * is skipped; only Pi has a working fallback (its bundled extension writes the file relay). + * For any other harness this means NO delivery path exists today (F1) — `run-plan.ts` refuses + * that combination up front. See the Daytona guard in `buildSessionMcpServers`. */ isDaytona: boolean;
🧹 Nitpick comments (2)
services/runner/tests/unit/sandbox-agent-run-plan.test.ts (1)
420-446: 🎯 Functional Correctness | 🔵 TrivialOptional: pin gate-precedence for claude + restricted-network + daytona + tools.
No test currently asserts that the F1 gate's
REMOTE_TOOLS_UNSUPPORTED_MESSAGE(not the Layer-2 network-boundary message) is what aclaude+ Daytona + restricted-network + tools run returns. Both gates would fire on that input; a future reordering could silently swap which error a non-Pi caller sees without any test catching it.services/runner/src/engines/sandbox_agent/run-plan.ts (1)
214-214: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winCentralize the executable-tool filter
executableToolSpecs()and the MCP delivery path should share one helper for(kind ?? "callback") !== "client"so a new tool kind can’t drift between the F1 gate and what the channel delivers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: b3f04ffc-e4c2-4d43-8f20-943073a19f13
📒 Files selected for processing (8)
docs/design/agent-workflows/projects/remote-tools-delivery/research.mddocs/design/agent-workflows/projects/remote-tools-delivery/specs.mddocs/design/agent-workflows/projects/remote-tools-delivery/tasks.mdservices/runner/src/engines/sandbox_agent/mcp.tsservices/runner/src/engines/sandbox_agent/run-plan.tsservices/runner/tests/unit/sandbox-agent-run-plan.test.tsservices/runner/tests/unit/session-alive.test.tsservices/runner/tests/unit/session-mcp-layering.test.ts
…honest skip log Address review on #5047: the F1 gate now keys on sandbox !== "local" (isRemoteSandbox) so an unknown or future remote provider (E2B et al.) fails closed with the same loud error instead of silently re-opening the zero-tools drop; reword the message to say "remote sandbox". Replace the dead isPi-guarded file-relay log in mcp.ts (unreachable past the isPi early-return, per Copilot) with an honest "skipped the loopback advertisement" log that never claims a delivery that is not happening. Add a fail-closed unit test for an unknown provider (sandbox=e2b). Client-tool exemption stays; #4985 owns tightening it when client tools move onto the MCP channel. Claude-Session: https://claude.ai/code/session_01HhBEUFbXETNjYdcz71AGrT
…annel Claude never saw client tools (request_connection): the internal agenta-tools channel filtered them out of tools/list, so the model could not call them and the browser widget never rendered — a silent drop, Pi-only feature. Now: - mcp-bridge/tool-mcp-http advertise client tools when a clientToolRelay is wired (local Claude) and PAUSE a client tools/call through the shared seam: the handler returns the MCP_PARKED sentinel, the listener destroys the socket with no JSON-RPC body (a result would let Claude settle the call and clobber the pending widget), and the turn ends paused. Required args are validated first so an under-specified call is a normal MCP tool error (model retries). - engine: deferred clientToolRelay ref (the MCP server is built before the responder exists), ACP tool-call correlation index recorded from the event stream (the paused widget attaches to Claude's real tool bubble and its late frames are suppressed), and an mcpAbort AbortController fired from the pause controller's destroy path + the finally so no in-flight tools/call settles after the turn ends. - run-plan (#5047 gate tightened, as its comment assigned to #4985): a non-Pi remote-sandbox run now refuses on ANY custom tool (toolSpecs), client kind included — client tools ride the MCP channel now, so on a remote sandbox they are exactly as undeliverable as gateway tools. Flipped the claude x daytona x client-only test to assert refusal; research.md §4 updated to match. Claude-Session: https://claude.ai/code/session_01HhBEUFbXETNjYdcz71AGrT
Context
Any custom tool, including gateway/callback tools or a client tool like
request_connection, silently vanished when run with a non-Pi harness on a remote sandbox: the model would receive zero tools with no error surfaced anywhere. The internal MCP server the runner exposes for tool calls listens on the runner's own loopback (127.0.0.1) intool-mcp-http.ts, which a remote sandbox (Daytona or E2B) can't reach, somcp.tssilently skipped attaching it. Pi didn't hit this because it carries its own in-sandbox extension that relays tool calls back; Claude, Codex, and OpenCode have no such relay yet.Changes
run-plan.tsnow fails the run plan up front — before the sandbox is even created — when the harness is not Pi, the sandbox is remote, and the run actually requested executable tools:if (!isPi && isDaytona && executableToolSpecsForRun.length > 0) return { ok: false, error: REMOTE_TOOLS_UNSUPPORTED_MESSAGE }. Client-kind tools (therequest_connectionstyle, handled entirely client-side) are exempted since they never touch the runner's MCP server. This is the interim fix; the real fix is a relay-client shim so non-Pi harnesses can reach the runner's tool server from a remote sandbox, tracked indocs/design/agent-workflows/projects/remote-tools-delivery/.Tests / notes
sandbox-agent-run-plan.test.tsandsession-mcp-layering.test.tscovering the gate firing, the client-tool exemption, and Pi being unaffected.