[fix] Refuse tools on non-Pi harness x remote sandbox instead of silently dropping them by junaway · Pull Request #5047 · Agenta-AI/agenta

junaway · 2026-07-02T21:41:36Z

Context

Any custom tool, including gateway/callback tools or a client tool like request_connection, silently vanished when run with a non-Pi harness on a remote sandbox: the model would receive zero tools with no error surfaced anywhere. The internal MCP server the runner exposes for tool calls listens on the runner's own loopback (127.0.0.1) in tool-mcp-http.ts, which a remote sandbox (Daytona or E2B) can't reach, so mcp.ts silently skipped attaching it. Pi didn't hit this because it carries its own in-sandbox extension that relays tool calls back; Claude, Codex, and OpenCode have no such relay yet.

Changes

run-plan.ts now fails the run plan up front — before the sandbox is even created — when the harness is not Pi, the sandbox is remote, and the run actually requested executable tools: if (!isPi && isDaytona && executableToolSpecsForRun.length > 0) return { ok: false, error: REMOTE_TOOLS_UNSUPPORTED_MESSAGE }. Client-kind tools (the request_connection style, handled entirely client-side) are exempted since they never touch the runner's MCP server. This is the interim fix; the real fix is a relay-client shim so non-Pi harnesses can reach the runner's tool server from a remote sandbox, tracked in docs/design/agent-workflows/projects/remote-tools-delivery/.

Tests / notes

New coverage in sandbox-agent-run-plan.test.ts and session-mcp-layering.test.ts covering the gate firing, the client-tool exemption, and Pi being unaffected.
Ships as a loud, early failure rather than the previous silent zero-tools behavior; no user-visible change for Pi or for any harness on the local sandbox.

…t test The heartbeat contract dropped the status field (the API derives ended server-side when the heartbeat stops); the test still asserted it. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ent drop) A non-Pi harness (Claude) on a remote (Daytona) sandbox had no working delivery path for gateway/custom tools: the internal tool-MCP channel is runner-loopback-only (unreachable from inside the sandbox), and the file-relay fallback's sandbox-side writer exists only in Pi's bundled extension. The run proceeded anyway, mcp.ts logged a false "delivered via the file relay", and the harness silently got zero tools with ok:true. buildRunPlan now refuses this combination up front (before any sandbox is created), mirroring the existing not-implemented gates (code tools, stdio MCP, Pi user-MCP). The mcp.ts log is now Pi-only, as defense-in-depth against a future gate bypass. Design docs for the real fix (an in-sandbox relay client vs a sandbox-reachable tool-MCP URL) are tracked under docs/design/agent-workflows/projects/remote-tools-delivery/. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

vercel · 2026-07-02T21:41:43Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jul 4, 2026 5:08pm

coderabbitai · 2026-07-02T21:41:47Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 641a79a2-5eb7-43fb-9b5f-972f6e220f33

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch chore/add-remote-tools-gate

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

Copilot

Pull request overview

This PR prevents a previously silent failure mode where non-Pi harnesses running in a remote Daytona sandbox would receive zero executable tools (due to the runner’s internal tool MCP being loopback-only), by failing the run plan early with an explicit error and by hardening related logging/tests.

Changes:

Add an up-front run-plan gate that rejects non-Pi × daytona × executable tools with a clear, actionable error message (client-only tools are exempt).
Update MCP-layering behavior/tests to ensure Daytona + non-Pi never emits the (previously false) “delivered via the file relay” claim.
Adjust unit tests to align with the updated heartbeat payload contract and add design docs for the longer-term “remote tools delivery” fix.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
services/runner/src/engines/sandbox_agent/run-plan.ts	Adds a new early-fail gate (`REMOTE_TOOLS_UNSUPPORTED_MESSAGE`) for non-Pi + Daytona runs requesting executable tools.
services/runner/src/engines/sandbox_agent/mcp.ts	Updates Daytona/internal-tool-channel commentary and restricts the “file relay delivered” log claim to avoid false assertions.
services/runner/tests/unit/sandbox-agent-run-plan.test.ts	Adds coverage for the new remote-tools gate and adjusts other Daytona/tool tests to avoid being masked by the new gate.
services/runner/tests/unit/session-mcp-layering.test.ts	Adds a regression guard ensuring non-Pi + Daytona never logs “delivered via the file relay”.
services/runner/tests/unit/session-alive.test.ts	Updates expectations to reflect removal of `status` from the heartbeat body (status derived server-side).
docs/design/agent-workflows/projects/remote-tools-delivery/research.md	Documents the root cause (F1) and the interim gating fix.
docs/design/agent-workflows/projects/remote-tools-delivery/specs.md	Proposes solution candidates for enabling remote tool delivery for non-Pi harnesses.
docs/design/agent-workflows/projects/remote-tools-delivery/tasks.md	Tracks an implementation task queue for the recommended longer-term fix.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

mmabrouk · 2026-07-04T17:07:22Z

+  // delivery. Kept as a defense-in-depth check — a future change to the early-return above must
+  // not resurrect the F1 false log ("delivered via the file relay" for a harness with no relay
+  // writer). `run-plan.ts` refuses this combination before it ever reaches here.
+  if (isDaytona && toolSpecs.length > 0 && isPi) {


Fixed in 0e24206: the dead && isPi condition is gone; the log now states only that the loopback advertisement was skipped, without claiming any delivery path.

mmabrouk

Reviewed (Claude, on Mahmoud's behalf). Verdict: correct and well-scoped. The gate mirrors the existing not-implemented gates, fires before any cwd/sandbox is created, the false "delivered via the file relay" log is fixed with a defense-in-depth condition, and the test coverage (gate fires, Pi exempt, local exempt, client-only exempt, no-tools exempt) is exactly right. CI is green. This is strictly better than the silent zero-tools ok:true behavior and I'd ship it.

Four decision points are flagged inline for Mahmoud to rule on before merge. Short map of the overlapping work, since three PRs touch the same gap:

#4985 (client tools to Claude, pre-rename, needs the services/agent -> services/runner rebase) contains its own copy of this gate (DAYTONA_TOOL_DELIVERY_UNSUPPORTED_MESSAGE) and changes what the client-tool exemption here means.
#4873 (in-sandbox stdio MCP relay shim, also pre-rename) already implements the "real fix" this PR's remote-tools-delivery docs recommend as candidate (b).
#5045/#5053 (E2B stack) add a sandbox="e2b" provider that this gate's isDaytona predicate does not cover.

None of these block the gate itself; they decide its exact predicate and who owns the follow-ups.

mmabrouk · 2026-07-04T16:38:58Z

+  // every tool, and still returned `ok:true`. Refuse up front, the way the other not-implemented
+  // gates above do. `executableToolSpecsForRun` excludes `client` tools (browser-fulfilled, never
+  // routed through this channel), matching what the internal MCP channel actually advertises.
+  if (!isPi && isDaytona && executableToolSpecsForRun.length > 0) {


Decision 1 - predicate: isDaytona or "any non-local sandbox"?

The E2B stack (#5045, and #5053 for Claude-on-E2B) adds sandbox="e2b", and nothing in those diffs extends this gate. As written, claude x e2b x tools will bypass the gate and silently drop tools again - the exact F1 bug, one provider over.

Recommendation: fail closed for any remote provider, e.g. key this on sandbox !== "local" (or an isRemote derived flag), so a new provider starts refused-with-a-clear-error until someone proves tool delivery there. Alternative: keep isDaytona and make extending the gate an explicit task in the E2B lane. The first option is safer; unknown providers should not fail open.

agree with recommendation. please update the pr accordingly.

Done in 0e24206: gate now keys on isRemoteSandbox = sandboxId !== "local", message reworded to "remote sandbox", and a fail-closed unit test for an unknown provider (sandbox: "e2b") added.

mmabrouk · 2026-07-04T16:38:58Z

+  // (unreachable from inside the sandbox), and the file-relay fallback has a sandbox-side writer
+  // only inside Pi's bundled extension. Before this gate the run proceeded, silently dropped
+  // every tool, and still returned `ok:true`. Refuse up front, the way the other not-implemented
+  // gates above do. `executableToolSpecsForRun` excludes `client` tools (browser-fulfilled, never


Decision 2 - the client-tool exemption interacts with #4985.

Two things worth deciding consciously:

Even today, claude x daytona x client-only is not actually a working combination: client tools are filtered out of the MCP advertisement for Claude everywhere (that is the bug feat(agent): deliver client tools to Claude over the internal MCP channel #4985 fixes), so the model never sees the tool and the run silently delivers nothing. The exemption is transport-correct (client tools never ride this channel) but outcome-wise it lets a silent no-delivery combo pass the gate.

After feat(agent): deliver client tools to Claude over the internal MCP channel #4985 lands, client tools DO ride the internal agenta-tools MCP channel for Claude - which is unreachable on Daytona. At that point this exemption reopens the silent drop for client tools specifically. (feat(agent): deliver client tools to Claude over the internal MCP channel #4985's own copy of the gate does not exempt them, so the two gates disagree on scope.)

Options: (a) drop the exemption now (strictest, matches #4985's version), or (b) keep it and make tightening the gate an explicit item in #4985's rebase. Either is fine; it just should not be resolved by accident during rebase conflict resolution.

let's keep the logic for the client tools fix in 4985

Kept the exemption here per Mahmoud's ruling; the gate comment now names #4985 as the owner of tightening it when client tools move onto the MCP channel.

mmabrouk · 2026-07-04T16:38:58Z

+ * zero tools with `ok:true` (the silent-tool-drop bug). Refuse loud instead, mirroring the
+ * other not-implemented gates above.
+ */
+export const REMOTE_TOOLS_UNSUPPORTED_MESSAGE =


Decision 3 - duplicate gate in #4985: pick one owner.

#4985 (pre-rename, awaiting the services/agent -> services/runner rebase) adds the same up-front refusal as DAYTONA_TOOL_DELIVERY_UNSUPPORTED_MESSAGE. If both merge as-is we get two constants, two messages, and two slightly different scopes (see the client-tool comment) for one rule.

Recommendation: treat THIS gate as canonical (it is green, post-rename, and better documented via the remote-tools-delivery project), and strip the gate portion out of #4985 during its rebase, keeping #4985 focused on client-tool delivery + the seam consolidation.

agree. please remember to strip this during the rebase

Agreed; noted as an explicit step in the #4985 rebase plan (strip DAYTONA_TOOL_DELIVERY_UNSUPPORTED_MESSAGE, this gate is canonical).

mmabrouk · 2026-07-04T16:38:58Z

+| **Which cells does it unblock** | All remote-sandbox non-Pi combinations at once, retroactively, for any current or future provider that can offer the reachable-URL primitive — but gated on that primitive existing per-provider | All remote-sandbox non-Pi combinations at once IF the daemon-spawned sub-option lands (provider-agnostic); narrower if stuck relying on per-harness extension support |
+| **Security review surface** | New: a bearer-secured internet-reachable (or provider-proxied) endpoint serving a JSON-RPC tool-call API — needs its own threat model, closer in shape to the SSRF-guarded user-HTTP-MCP path (`mcp.ts` `validateUserMcpUrl`) than to today's "carries no secret because unreachable" internal channel | Reuses the EXACT trust boundary already accepted for Pi + Daytona (file relay inside a daemon-managed sandbox filesystem) — no new category of risk, just a second writer of an existing protocol |
+
+## Recommendation


FYI for task 0 in tasks.md: candidate (b) already has a working implementation in #4873 ("in-sandbox stdio MCP relay shim", F-042) - relay-mcp-stdio.ts + relay-shim.ts + relay-client.ts, i.e. an in-sandbox stdio MCP server that writes the file relay, spawned without daemon changes. It predates the services/agent -> services/runner rename so it needs a path migration, but the load-bearing question ("can an in-sandbox stdio MCP server be attached alongside a non-Pi harness?") appears already answered there. Recommend reviving/rebasing #4873 as the implementation of this spec rather than re-spiking, and cross-linking it from this doc.

mmabrouk · 2026-07-04T16:38:58Z

  });

-  it("release() sends a final heartbeat with is_running=false and status=ended", async () => {
+  it("release() sends a final heartbeat with is_running=false (status derived server-side)", async () => {


Minor: this heartbeat-contract test fix is unrelated to the remote-tools gate (separate commit, fixing a stale assertion after status was dropped from the heartbeat). Fine to ride along; flagging only so the PR description's scope is understood.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

services/runner/src/engines/sandbox_agent/mcp.ts (1)
161-170: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Stale isDaytona field doc contradicts the fix a few lines below.

This docstring still unconditionally asserts "gateway tools are delivered through the file relay instead (the relay loop already polls the sandbox filesystem on Daytona...)" for the Daytona case. That's precisely the false claim research.md cites as part of the F1 root cause: "The code's own comment claimed the gap was covered ... This is false for a non-Pi harness." The corrected reasoning was added to the function-level JSDoc at Line 193-201 in this same file, but this field-level doc wasn't updated to match, leaving two contradictory explanations of the same field in one file.
📝 Proposed doc fix
   /**
    * True when the run executes in a REMOTE Daytona sandbox (the harness runs IN the sandbox,
    * not on the runner host). Gates the internal gateway-tool channel: the channel's loopback
    * (`127.0.0.1`) HTTP MCP URL resolves to the SANDBOX's loopback there, not the runner's, so
-   * advertising it would hand the in-sandbox harness an unreachable URL. On Daytona the channel
-   * is skipped and gateway tools are delivered through the file relay instead (the relay loop
-   * already polls the sandbox filesystem on Daytona — see `engines/sandbox_agent.ts`). See the
-   * Daytona guard in `buildSessionMcpServers`.
+   * advertising it would hand the in-sandbox harness an unreachable URL. On Daytona the channel
+   * is skipped; only Pi has a working fallback (its bundled extension writes the file relay).
+   * For any other harness this means NO delivery path exists today (F1) — `run-plan.ts` refuses
+   * that combination up front. See the Daytona guard in `buildSessionMcpServers`.
    */
   isDaytona: boolean;

🧹 Nitpick comments (2)

services/runner/tests/unit/sandbox-agent-run-plan.test.ts (1)

420-446: 🎯 Functional Correctness | 🔵 Trivial

Optional: pin gate-precedence for claude + restricted-network + daytona + tools.

No test currently asserts that the F1 gate's REMOTE_TOOLS_UNSUPPORTED_MESSAGE (not the Layer-2 network-boundary message) is what a claude + Daytona + restricted-network + tools run returns. Both gates would fire on that input; a future reordering could silently swap which error a non-Pi caller sees without any test catching it.

services/runner/src/engines/sandbox_agent/run-plan.ts (1)

214-214: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Centralize the executable-tool filter executableToolSpecs() and the MCP delivery path should share one helper for (kind ?? "callback") !== "client" so a new tool kind can’t drift between the F1 gate and what the channel delivers.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b3f04ffc-e4c2-4d43-8f20-943073a19f13

📥 Commits

Reviewing files that changed from the base of the PR and between 1cfb3dc and 3db047a.

📒 Files selected for processing (8)

docs/design/agent-workflows/projects/remote-tools-delivery/research.md
docs/design/agent-workflows/projects/remote-tools-delivery/specs.md
docs/design/agent-workflows/projects/remote-tools-delivery/tasks.md
services/runner/src/engines/sandbox_agent/mcp.ts
services/runner/src/engines/sandbox_agent/run-plan.ts
services/runner/tests/unit/sandbox-agent-run-plan.test.ts
services/runner/tests/unit/session-alive.test.ts
services/runner/tests/unit/session-mcp-layering.test.ts

…honest skip log Address review on #5047: the F1 gate now keys on sandbox !== "local" (isRemoteSandbox) so an unknown or future remote provider (E2B et al.) fails closed with the same loud error instead of silently re-opening the zero-tools drop; reword the message to say "remote sandbox". Replace the dead isPi-guarded file-relay log in mcp.ts (unreachable past the isPi early-return, per Copilot) with an honest "skipped the loopback advertisement" log that never claims a delivery that is not happening. Add a fail-closed unit test for an unknown provider (sandbox=e2b). Client-tool exemption stays; #4985 owns tightening it when client tools move onto the MCP channel. Claude-Session: https://claude.ai/code/session_01HhBEUFbXETNjYdcz71AGrT

…annel Claude never saw client tools (request_connection): the internal agenta-tools channel filtered them out of tools/list, so the model could not call them and the browser widget never rendered — a silent drop, Pi-only feature. Now: - mcp-bridge/tool-mcp-http advertise client tools when a clientToolRelay is wired (local Claude) and PAUSE a client tools/call through the shared seam: the handler returns the MCP_PARKED sentinel, the listener destroys the socket with no JSON-RPC body (a result would let Claude settle the call and clobber the pending widget), and the turn ends paused. Required args are validated first so an under-specified call is a normal MCP tool error (model retries). - engine: deferred clientToolRelay ref (the MCP server is built before the responder exists), ACP tool-call correlation index recorded from the event stream (the paused widget attaches to Claude's real tool bubble and its late frames are suppressed), and an mcpAbort AbortController fired from the pause controller's destroy path + the finally so no in-flight tools/call settles after the turn ends. - run-plan (#5047 gate tightened, as its comment assigned to #4985): a non-Pi remote-sandbox run now refuses on ANY custom tool (toolSpecs), client kind included — client tools ride the MCP channel now, so on a remote sandbox they are exactly as undeliverable as gateway tools. Flipped the claude x daytona x client-only test to assert refusal; research.md §4 updated to match. Claude-Session: https://claude.ai/code/session_01HhBEUFbXETNjYdcz71AGrT

jp-agenta and others added 2 commits July 2, 2026 15:10

test(runner): drop stale status=ended assertion from release heartbea…

2f7e566

…t test The heartbeat contract dropped the status field (the API derives ended server-side when the heartbeat stops); the test still asserted it. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings July 2, 2026 21:41

dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jul 2, 2026

dosubot Bot added the tests label Jul 2, 2026

Copilot started reviewing on behalf of junaway July 2, 2026 21:42 View session

Copilot AI reviewed Jul 2, 2026

View reviewed changes

jp-agenta marked this pull request as draft July 2, 2026 21:45

mmabrouk reviewed Jul 4, 2026

View reviewed changes

coderabbitai Bot reviewed Jul 4, 2026

View reviewed changes

mmabrouk marked this pull request as ready for review July 4, 2026 17:07

dosubot Bot added the Backend label Jul 4, 2026

vercel Bot deployed to Preview July 4, 2026 17:08 View deployment

mmabrouk merged commit 80a482b into big-agents Jul 4, 2026
34 checks passed

This was referenced Jul 4, 2026

[integration] big-agents #4791

Open

docs(agent-workflows): MCP delivery architecture — current state and directions #5067

Open

mmabrouk mentioned this pull request Jul 4, 2026

feat(agent): deliver client tools to Claude over the internal MCP channel #4985

Open

Uh oh!

Conversation

junaway commented Jul 2, 2026

Context

Changes

Tests / notes

Uh oh!

vercel Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mmabrouk left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

vercel Bot commented Jul 2, 2026 •

edited

Loading

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading