Skip to content

[fix] Refuse tools on non-Pi harness x remote sandbox instead of silently dropping them#5047

Merged
mmabrouk merged 3 commits into
big-agentsfrom
chore/add-remote-tools-gate
Jul 4, 2026
Merged

[fix] Refuse tools on non-Pi harness x remote sandbox instead of silently dropping them#5047
mmabrouk merged 3 commits into
big-agentsfrom
chore/add-remote-tools-gate

Conversation

@junaway

@junaway junaway commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Context

Any custom tool, including gateway/callback tools or a client tool like request_connection, silently vanished when run with a non-Pi harness on a remote sandbox: the model would receive zero tools with no error surfaced anywhere. The internal MCP server the runner exposes for tool calls listens on the runner's own loopback (127.0.0.1) in tool-mcp-http.ts, which a remote sandbox (Daytona or E2B) can't reach, so mcp.ts silently skipped attaching it. Pi didn't hit this because it carries its own in-sandbox extension that relays tool calls back; Claude, Codex, and OpenCode have no such relay yet.

Changes

run-plan.ts now fails the run plan up front — before the sandbox is even created — when the harness is not Pi, the sandbox is remote, and the run actually requested executable tools: if (!isPi && isDaytona && executableToolSpecsForRun.length > 0) return { ok: false, error: REMOTE_TOOLS_UNSUPPORTED_MESSAGE }. Client-kind tools (the request_connection style, handled entirely client-side) are exempted since they never touch the runner's MCP server. This is the interim fix; the real fix is a relay-client shim so non-Pi harnesses can reach the runner's tool server from a remote sandbox, tracked in docs/design/agent-workflows/projects/remote-tools-delivery/.

Tests / notes

  • New coverage in sandbox-agent-run-plan.test.ts and session-mcp-layering.test.ts covering the gate firing, the client-tool exemption, and Pi being unaffected.
  • Ships as a loud, early failure rather than the previous silent zero-tools behavior; no user-visible change for Pi or for any harness on the local sandbox.

jp-agenta and others added 2 commits July 2, 2026 15:10
…t test

The heartbeat contract dropped the status field (the API derives ended
server-side when the heartbeat stops); the test still asserted it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ent drop)

A non-Pi harness (Claude) on a remote (Daytona) sandbox had no working
delivery path for gateway/custom tools: the internal tool-MCP channel is
runner-loopback-only (unreachable from inside the sandbox), and the
file-relay fallback's sandbox-side writer exists only in Pi's bundled
extension. The run proceeded anyway, mcp.ts logged a false "delivered via
the file relay", and the harness silently got zero tools with ok:true.

buildRunPlan now refuses this combination up front (before any sandbox is
created), mirroring the existing not-implemented gates (code tools, stdio
MCP, Pi user-MCP). The mcp.ts log is now Pi-only, as defense-in-depth
against a future gate bypass. Design docs for the real fix (an in-sandbox
relay client vs a sandbox-reachable tool-MCP URL) are tracked under
docs/design/agent-workflows/projects/remote-tools-delivery/.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings July 2, 2026 21:41
@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jul 2, 2026
@vercel

vercel Bot commented Jul 2, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment Jul 4, 2026 5:08pm

Request Review

@dosubot dosubot Bot added the tests label Jul 2, 2026
@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 641a79a2-5eb7-43fb-9b5f-972f6e220f33

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch chore/add-remote-tools-gate

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prevents a previously silent failure mode where non-Pi harnesses running in a remote Daytona sandbox would receive zero executable tools (due to the runner’s internal tool MCP being loopback-only), by failing the run plan early with an explicit error and by hardening related logging/tests.

Changes:

  • Add an up-front run-plan gate that rejects non-Pi × daytona × executable tools with a clear, actionable error message (client-only tools are exempt).
  • Update MCP-layering behavior/tests to ensure Daytona + non-Pi never emits the (previously false) “delivered via the file relay” claim.
  • Adjust unit tests to align with the updated heartbeat payload contract and add design docs for the longer-term “remote tools delivery” fix.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
services/runner/src/engines/sandbox_agent/run-plan.ts Adds a new early-fail gate (REMOTE_TOOLS_UNSUPPORTED_MESSAGE) for non-Pi + Daytona runs requesting executable tools.
services/runner/src/engines/sandbox_agent/mcp.ts Updates Daytona/internal-tool-channel commentary and restricts the “file relay delivered” log claim to avoid false assertions.
services/runner/tests/unit/sandbox-agent-run-plan.test.ts Adds coverage for the new remote-tools gate and adjusts other Daytona/tool tests to avoid being masked by the new gate.
services/runner/tests/unit/session-mcp-layering.test.ts Adds a regression guard ensuring non-Pi + Daytona never logs “delivered via the file relay”.
services/runner/tests/unit/session-alive.test.ts Updates expectations to reflect removal of status from the heartbeat body (status derived server-side).
docs/design/agent-workflows/projects/remote-tools-delivery/research.md Documents the root cause (F1) and the interim gating fix.
docs/design/agent-workflows/projects/remote-tools-delivery/specs.md Proposes solution candidates for enabling remote tool delivery for non-Pi harnesses.
docs/design/agent-workflows/projects/remote-tools-delivery/tasks.md Tracks an implementation task queue for the recommended longer-term fix.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// delivery. Kept as a defense-in-depth check — a future change to the early-return above must
// not resurrect the F1 false log ("delivered via the file relay" for a harness with no relay
// writer). `run-plan.ts` refuses this combination before it ever reaches here.
if (isDaytona && toolSpecs.length > 0 && isPi) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0e24206: the dead && isPi condition is gone; the log now states only that the loopback advertisement was skipped, without claiming any delivery path.

@jp-agenta jp-agenta marked this pull request as draft July 2, 2026 21:45

@mmabrouk mmabrouk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed (Claude, on Mahmoud's behalf). Verdict: correct and well-scoped. The gate mirrors the existing not-implemented gates, fires before any cwd/sandbox is created, the false "delivered via the file relay" log is fixed with a defense-in-depth condition, and the test coverage (gate fires, Pi exempt, local exempt, client-only exempt, no-tools exempt) is exactly right. CI is green. This is strictly better than the silent zero-tools ok:true behavior and I'd ship it.

Four decision points are flagged inline for Mahmoud to rule on before merge. Short map of the overlapping work, since three PRs touch the same gap:

  • #4985 (client tools to Claude, pre-rename, needs the services/agent -> services/runner rebase) contains its own copy of this gate (DAYTONA_TOOL_DELIVERY_UNSUPPORTED_MESSAGE) and changes what the client-tool exemption here means.
  • #4873 (in-sandbox stdio MCP relay shim, also pre-rename) already implements the "real fix" this PR's remote-tools-delivery docs recommend as candidate (b).
  • #5045/#5053 (E2B stack) add a sandbox="e2b" provider that this gate's isDaytona predicate does not cover.

None of these block the gate itself; they decide its exact predicate and who owns the follow-ups.

// every tool, and still returned `ok:true`. Refuse up front, the way the other not-implemented
// gates above do. `executableToolSpecsForRun` excludes `client` tools (browser-fulfilled, never
// routed through this channel), matching what the internal MCP channel actually advertises.
if (!isPi && isDaytona && executableToolSpecsForRun.length > 0) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decision 1 - predicate: isDaytona or "any non-local sandbox"?

The E2B stack (#5045, and #5053 for Claude-on-E2B) adds sandbox="e2b", and nothing in those diffs extends this gate. As written, claude x e2b x tools will bypass the gate and silently drop tools again - the exact F1 bug, one provider over.

Recommendation: fail closed for any remote provider, e.g. key this on sandbox !== "local" (or an isRemote derived flag), so a new provider starts refused-with-a-clear-error until someone proves tool delivery there. Alternative: keep isDaytona and make extending the gate an explicit task in the E2B lane. The first option is safer; unknown providers should not fail open.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with recommendation. please update the pr accordingly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 0e24206: gate now keys on isRemoteSandbox = sandboxId !== "local", message reworded to "remote sandbox", and a fail-closed unit test for an unknown provider (sandbox: "e2b") added.

// (unreachable from inside the sandbox), and the file-relay fallback has a sandbox-side writer
// only inside Pi's bundled extension. Before this gate the run proceeded, silently dropped
// every tool, and still returned `ok:true`. Refuse up front, the way the other not-implemented
// gates above do. `executableToolSpecsForRun` excludes `client` tools (browser-fulfilled, never

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decision 2 - the client-tool exemption interacts with #4985.

Two things worth deciding consciously:

  1. Even today, claude x daytona x client-only is not actually a working combination: client tools are filtered out of the MCP advertisement for Claude everywhere (that is the bug feat(agent): deliver client tools to Claude over the internal MCP channel #4985 fixes), so the model never sees the tool and the run silently delivers nothing. The exemption is transport-correct (client tools never ride this channel) but outcome-wise it lets a silent no-delivery combo pass the gate.
  2. After feat(agent): deliver client tools to Claude over the internal MCP channel #4985 lands, client tools DO ride the internal agenta-tools MCP channel for Claude - which is unreachable on Daytona. At that point this exemption reopens the silent drop for client tools specifically. (feat(agent): deliver client tools to Claude over the internal MCP channel #4985's own copy of the gate does not exempt them, so the two gates disagree on scope.)

Options: (a) drop the exemption now (strictest, matches #4985's version), or (b) keep it and make tightening the gate an explicit item in #4985's rebase. Either is fine; it just should not be resolved by accident during rebase conflict resolution.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep the logic for the client tools fix in 4985

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept the exemption here per Mahmoud's ruling; the gate comment now names #4985 as the owner of tightening it when client tools move onto the MCP channel.

* zero tools with `ok:true` (the silent-tool-drop bug). Refuse loud instead, mirroring the
* other not-implemented gates above.
*/
export const REMOTE_TOOLS_UNSUPPORTED_MESSAGE =

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decision 3 - duplicate gate in #4985: pick one owner.

#4985 (pre-rename, awaiting the services/agent -> services/runner rebase) adds the same up-front refusal as DAYTONA_TOOL_DELIVERY_UNSUPPORTED_MESSAGE. If both merge as-is we get two constants, two messages, and two slightly different scopes (see the client-tool comment) for one rule.

Recommendation: treat THIS gate as canonical (it is green, post-rename, and better documented via the remote-tools-delivery project), and strip the gate portion out of #4985 during its rebase, keeping #4985 focused on client-tool delivery + the seam consolidation.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. please remember to strip this during the rebase

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; noted as an explicit step in the #4985 rebase plan (strip DAYTONA_TOOL_DELIVERY_UNSUPPORTED_MESSAGE, this gate is canonical).

| **Which cells does it unblock** | All remote-sandbox non-Pi combinations at once, retroactively, for any current or future provider that can offer the reachable-URL primitive — but gated on that primitive existing per-provider | All remote-sandbox non-Pi combinations at once IF the daemon-spawned sub-option lands (provider-agnostic); narrower if stuck relying on per-harness extension support |
| **Security review surface** | New: a bearer-secured internet-reachable (or provider-proxied) endpoint serving a JSON-RPC tool-call API — needs its own threat model, closer in shape to the SSRF-guarded user-HTTP-MCP path (`mcp.ts` `validateUserMcpUrl`) than to today's "carries no secret because unreachable" internal channel | Reuses the EXACT trust boundary already accepted for Pi + Daytona (file relay inside a daemon-managed sandbox filesystem) — no new category of risk, just a second writer of an existing protocol |

## Recommendation

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI for task 0 in tasks.md: candidate (b) already has a working implementation in #4873 ("in-sandbox stdio MCP relay shim", F-042) - relay-mcp-stdio.ts + relay-shim.ts + relay-client.ts, i.e. an in-sandbox stdio MCP server that writes the file relay, spawned without daemon changes. It predates the services/agent -> services/runner rename so it needs a path migration, but the load-bearing question ("can an in-sandbox stdio MCP server be attached alongside a non-Pi harness?") appears already answered there. Recommend reviving/rebasing #4873 as the implementation of this spec rather than re-spiking, and cross-linking it from this doc.

});

it("release() sends a final heartbeat with is_running=false and status=ended", async () => {
it("release() sends a final heartbeat with is_running=false (status derived server-side)", async () => {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: this heartbeat-contract test fix is unrelated to the remote-tools gate (separate commit, fixing a stale assertion after status was dropped from the heartbeat). Fine to ride along; flagging only so the PR description's scope is understood.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreee

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
services/runner/src/engines/sandbox_agent/mcp.ts (1)

161-170: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Stale isDaytona field doc contradicts the fix a few lines below.

This docstring still unconditionally asserts "gateway tools are delivered through the file relay instead (the relay loop already polls the sandbox filesystem on Daytona...)" for the Daytona case. That's precisely the false claim research.md cites as part of the F1 root cause: "The code's own comment claimed the gap was covered ... This is false for a non-Pi harness." The corrected reasoning was added to the function-level JSDoc at Line 193-201 in this same file, but this field-level doc wasn't updated to match, leaving two contradictory explanations of the same field in one file.

📝 Proposed doc fix
   /**
    * True when the run executes in a REMOTE Daytona sandbox (the harness runs IN the sandbox,
    * not on the runner host). Gates the internal gateway-tool channel: the channel's loopback
    * (`127.0.0.1`) HTTP MCP URL resolves to the SANDBOX's loopback there, not the runner's, so
-   * advertising it would hand the in-sandbox harness an unreachable URL. On Daytona the channel
-   * is skipped and gateway tools are delivered through the file relay instead (the relay loop
-   * already polls the sandbox filesystem on Daytona — see `engines/sandbox_agent.ts`). See the
-   * Daytona guard in `buildSessionMcpServers`.
+   * advertising it would hand the in-sandbox harness an unreachable URL. On Daytona the channel
+   * is skipped; only Pi has a working fallback (its bundled extension writes the file relay).
+   * For any other harness this means NO delivery path exists today (F1) — `run-plan.ts` refuses
+   * that combination up front. See the Daytona guard in `buildSessionMcpServers`.
    */
   isDaytona: boolean;
🧹 Nitpick comments (2)
services/runner/tests/unit/sandbox-agent-run-plan.test.ts (1)

420-446: 🎯 Functional Correctness | 🔵 Trivial

Optional: pin gate-precedence for claude + restricted-network + daytona + tools.

No test currently asserts that the F1 gate's REMOTE_TOOLS_UNSUPPORTED_MESSAGE (not the Layer-2 network-boundary message) is what a claude + Daytona + restricted-network + tools run returns. Both gates would fire on that input; a future reordering could silently swap which error a non-Pi caller sees without any test catching it.

services/runner/src/engines/sandbox_agent/run-plan.ts (1)

214-214: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Centralize the executable-tool filter executableToolSpecs() and the MCP delivery path should share one helper for (kind ?? "callback") !== "client" so a new tool kind can’t drift between the F1 gate and what the channel delivers.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b3f04ffc-e4c2-4d43-8f20-943073a19f13

📥 Commits

Reviewing files that changed from the base of the PR and between 1cfb3dc and 3db047a.

📒 Files selected for processing (8)
  • docs/design/agent-workflows/projects/remote-tools-delivery/research.md
  • docs/design/agent-workflows/projects/remote-tools-delivery/specs.md
  • docs/design/agent-workflows/projects/remote-tools-delivery/tasks.md
  • services/runner/src/engines/sandbox_agent/mcp.ts
  • services/runner/src/engines/sandbox_agent/run-plan.ts
  • services/runner/tests/unit/sandbox-agent-run-plan.test.ts
  • services/runner/tests/unit/session-alive.test.ts
  • services/runner/tests/unit/session-mcp-layering.test.ts

…honest skip log

Address review on #5047: the F1 gate now keys on sandbox !== "local" (isRemoteSandbox)
so an unknown or future remote provider (E2B et al.) fails closed with the same loud
error instead of silently re-opening the zero-tools drop; reword the message to say
"remote sandbox". Replace the dead isPi-guarded file-relay log in mcp.ts (unreachable
past the isPi early-return, per Copilot) with an honest "skipped the loopback
advertisement" log that never claims a delivery that is not happening. Add a
fail-closed unit test for an unknown provider (sandbox=e2b). Client-tool exemption
stays; #4985 owns tightening it when client tools move onto the MCP channel.

Claude-Session: https://claude.ai/code/session_01HhBEUFbXETNjYdcz71AGrT
@mmabrouk mmabrouk marked this pull request as ready for review July 4, 2026 17:07
@dosubot dosubot Bot added the Backend label Jul 4, 2026
@mmabrouk mmabrouk merged commit 80a482b into big-agents Jul 4, 2026
34 checks passed
mmabrouk added a commit that referenced this pull request Jul 4, 2026
…annel

Claude never saw client tools (request_connection): the internal agenta-tools
channel filtered them out of tools/list, so the model could not call them and
the browser widget never rendered — a silent drop, Pi-only feature. Now:

- mcp-bridge/tool-mcp-http advertise client tools when a clientToolRelay is
  wired (local Claude) and PAUSE a client tools/call through the shared seam:
  the handler returns the MCP_PARKED sentinel, the listener destroys the socket
  with no JSON-RPC body (a result would let Claude settle the call and clobber
  the pending widget), and the turn ends paused. Required args are validated
  first so an under-specified call is a normal MCP tool error (model retries).
- engine: deferred clientToolRelay ref (the MCP server is built before the
  responder exists), ACP tool-call correlation index recorded from the event
  stream (the paused widget attaches to Claude's real tool bubble and its late
  frames are suppressed), and an mcpAbort AbortController fired from the pause
  controller's destroy path + the finally so no in-flight tools/call settles
  after the turn ends.
- run-plan (#5047 gate tightened, as its comment assigned to #4985): a non-Pi
  remote-sandbox run now refuses on ANY custom tool (toolSpecs), client kind
  included — client tools ride the MCP channel now, so on a remote sandbox
  they are exactly as undeliverable as gateway tools. Flipped the
  claude x daytona x client-only test to assert refusal; research.md §4
  updated to match.

Claude-Session: https://claude.ai/code/session_01HhBEUFbXETNjYdcz71AGrT
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Backend size:L This PR changes 100-499 lines, ignoring generated files. tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants