[integration] big-agents#4791
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedToo many files! This PR contains 1640 files, which is 1490 over the limit of 150. To get a review, narrow the scope: Upgrade to a paid plan to raise the limit. ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: ⛔ Files ignored due to path filters (258)
📒 Files selected for processing (1640)
You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 10
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 76c33a7d-feff-4e5f-acc0-962498f74cfc
📒 Files selected for processing (70)
sdks/python/agenta/__init__.pysdks/python/agenta/sdk/agents/__init__.pysdks/python/agenta/sdk/agents/adapters/__init__.pysdks/python/agenta/sdk/agents/adapters/_runner_config.pysdks/python/agenta/sdk/agents/adapters/agenta_builtins.pysdks/python/agenta/sdk/agents/adapters/harnesses.pysdks/python/agenta/sdk/agents/adapters/in_process.pysdks/python/agenta/sdk/agents/adapters/local.pysdks/python/agenta/sdk/agents/adapters/sandbox_agent.pysdks/python/agenta/sdk/agents/adapters/vercel/__init__.pysdks/python/agenta/sdk/agents/adapters/vercel/messages.pysdks/python/agenta/sdk/agents/adapters/vercel/routing.pysdks/python/agenta/sdk/agents/adapters/vercel/sse.pysdks/python/agenta/sdk/agents/adapters/vercel/stream.pysdks/python/agenta/sdk/agents/dtos.pysdks/python/agenta/sdk/agents/errors.pysdks/python/agenta/sdk/agents/interfaces.pysdks/python/agenta/sdk/agents/mcp/__init__.pysdks/python/agenta/sdk/agents/mcp/errors.pysdks/python/agenta/sdk/agents/mcp/interfaces.pysdks/python/agenta/sdk/agents/mcp/models.pysdks/python/agenta/sdk/agents/mcp/parsing.pysdks/python/agenta/sdk/agents/mcp/resolver.pysdks/python/agenta/sdk/agents/mcp/wire.pysdks/python/agenta/sdk/agents/streaming.pysdks/python/agenta/sdk/agents/tools/__init__.pysdks/python/agenta/sdk/agents/tools/compat.pysdks/python/agenta/sdk/agents/tools/errors.pysdks/python/agenta/sdk/agents/tools/interfaces.pysdks/python/agenta/sdk/agents/tools/models.pysdks/python/agenta/sdk/agents/tools/parsing.pysdks/python/agenta/sdk/agents/tools/resolver.pysdks/python/agenta/sdk/agents/tools/wire.pysdks/python/agenta/sdk/agents/ui_messages.pysdks/python/agenta/sdk/agents/utils/__init__.pysdks/python/agenta/sdk/agents/utils/ts_runner.pysdks/python/agenta/sdk/agents/utils/wire.pysdks/python/agenta/sdk/decorators/routing.pysdks/python/agenta/sdk/engines/running/interfaces.pysdks/python/agenta/sdk/engines/running/utils.pysdks/python/agenta/sdk/middlewares/running/normalizer.pysdks/python/agenta/sdk/models/workflows.pysdks/python/agenta/sdk/utils/types.pysdks/python/agenta/tests/agents/test_streaming.pysdks/python/oss/tests/pytest/integration/agents/__init__.pysdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.pysdks/python/oss/tests/pytest/unit/agents/__init__.pysdks/python/oss/tests/pytest/unit/agents/conftest.pysdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.jsonsdks/python/oss/tests/pytest/unit/agents/golden/run_request.pi.jsonsdks/python/oss/tests/pytest/unit/agents/golden/run_result.error.jsonsdks/python/oss/tests/pytest/unit/agents/golden/run_result.ok.jsonsdks/python/oss/tests/pytest/unit/agents/mcp/__init__.pysdks/python/oss/tests/pytest/unit/agents/mcp/test_resolver.pysdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.pysdks/python/oss/tests/pytest/unit/agents/test_dtos_capabilities_events.pysdks/python/oss/tests/pytest/unit/agents/test_dtos_content_blocks.pysdks/python/oss/tests/pytest/unit/agents/test_dtos_harness_configs.pysdks/python/oss/tests/pytest/unit/agents/test_environment_lifecycle.pysdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.pysdks/python/oss/tests/pytest/unit/agents/test_runner_adapter_config.pysdks/python/oss/tests/pytest/unit/agents/test_ui_messages.pysdks/python/oss/tests/pytest/unit/agents/test_wire_contract.pysdks/python/oss/tests/pytest/unit/agents/tools/__init__.pysdks/python/oss/tests/pytest/unit/agents/tools/test_models.pysdks/python/oss/tests/pytest/unit/agents/tools/test_parsing.pysdks/python/oss/tests/pytest/unit/agents/tools/test_resolver.pysdks/python/oss/tests/pytest/unit/test_normalizer_passthrough.pysdks/python/oss/tests/pytest/utils/test_messages_endpoint.pysdks/python/oss/tests/pytest/utils/test_routing.py
| NOTE on packaging: the Node runner is NOT part of this Python wheel (``pip install agenta`` | ||
| stays pure Python; the wheel contains zero ``.ts``/``.js``). How a standalone Pi user obtains | ||
| the runner -- an ``npx`` npm package, a local checkout, or a Docker sidecar over HTTP -- is an | ||
| open distribution decision; see ``docs/design/agent-workflows/typescript-structure/``. Do NOT | ||
| silently bundle a JS runner into the wheel. |
There was a problem hiding this comment.
Align LocalBackend wording with the stated packaging contract.
Line 9-13 says the wheel must not bundle a JS runner, but Line 30 and the NotImplementedError messages still say “bundled JS”. This contradiction will confuse integrators.
Suggested wording fix
-class LocalBackend(Backend):
- """Run Pi (bundled JS) or Claude (``claude-agent-sdk``) on this machine."""
+class LocalBackend(Backend):
+ """Run Pi (external Node runner) or Claude (``claude-agent-sdk``) on this machine."""
...
raise NotImplementedError(
- "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
+ "LocalBackend is not implemented yet (Phase 3: Pi via external Node runner, "
"Phase 4: Claude via claude-agent-sdk)."
)
...
raise NotImplementedError(
- "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
+ "LocalBackend is not implemented yet (Phase 3: Pi via external Node runner, "
"Phase 4: Claude via claude-agent-sdk)."
)Also applies to: 30-38, 50-53
| def __init__( | ||
| self, | ||
| *, | ||
| sandbox: str = "local", | ||
| url: Optional[str] = None, | ||
| command: Optional[Sequence[str]] = None, | ||
| cwd: Optional[str] = None, | ||
| timeout: float = float(os.getenv("AGENTA_AGENT_RUNNER_TIMEOUT_SECONDS", "180")), | ||
| ) -> None: | ||
| self._sandbox = sandbox | ||
| self._url = url |
There was a problem hiding this comment.
Validate sandbox at construction time.
Line 129 currently accepts any string; invalid values get sent over the wire and fail late. Restrict this to supported values (local, daytona) and raise a configuration error early.
Suggested validation
from ..dtos import (
@@
)
+from ..errors import AgentRunnerConfigurationError
@@
def __init__(
self,
*,
sandbox: str = "local",
@@
timeout: float = float(os.getenv("AGENTA_AGENT_RUNNER_TIMEOUT_SECONDS", "180")),
) -> None:
+ allowed_sandboxes = {"local", "daytona"}
+ if sandbox not in allowed_sandboxes:
+ raise AgentRunnerConfigurationError(
+ f"Unsupported sandbox '{sandbox}'. Expected one of: {sorted(allowed_sandboxes)}."
+ )
self._sandbox = sandbox
self._url = url| from agenta.sdk.agents.tools.models import MissingSecretPolicy | ||
|
|
||
| from .errors import MissingMCPSecretError | ||
| from .interfaces import MCPSecretProvider | ||
| from .models import MCPServerConfig, ResolvedMCPServer | ||
|
|
||
|
|
||
| class MCPResolver: | ||
| def __init__( | ||
| self, | ||
| *, | ||
| secret_provider: MCPSecretProvider, | ||
| missing_secret_policy: MissingSecretPolicy = MissingSecretPolicy.ERROR, | ||
| ) -> None: |
There was a problem hiding this comment.
Breaks declared layer direction by importing tools model into MCP.
MCPResolver currently depends on agenta.sdk.agents.tools.models.MissingSecretPolicy, but this cohort declares tools as depending on MCP, not the other way around. This reverse edge can create import-order fragility and circular dependency risk as the stack evolves. Move MissingSecretPolicy to a neutral/shared module (or MCP/shared contract module) and import it from both subsystems.
Possible direction
- from agenta.sdk.agents.tools.models import MissingSecretPolicy
+ from agenta.sdk.agents.shared.missing_secret_policy import MissingSecretPolicy(then define/move the enum in that shared module and update tools imports accordingly)
| out = stdout.decode("utf-8", "replace") | ||
| err = stderr.decode("utf-8", "replace") | ||
| if not out.strip(): | ||
| raise RuntimeError( | ||
| f"Agent runner returned no output. exit={proc.returncode} stderr={err[-2000:]}" | ||
| ) | ||
| try: | ||
| return json.loads(out) | ||
| except json.JSONDecodeError as exc: |
There was a problem hiding this comment.
Treat non-zero subprocess exit as transport failure even with parseable JSON.
Line 74 returns parsed JSON without checking proc.returncode; a crashed runner can look successful if it emitted partial/legacy JSON before exiting non-zero.
Suggested fix
@@ async def deliver_subprocess(...):
out = stdout.decode("utf-8", "replace")
err = stderr.decode("utf-8", "replace")
+ if proc.returncode not in (0, None):
+ raise RuntimeError(
+ "Agent runner exited non-zero. "
+ f"exit={proc.returncode} stderr={err[-2000:]} stdout={out[:500]}"
+ )
if not out.strip():
raise RuntimeError(
f"Agent runner returned no output. exit={proc.returncode} stderr={err[-2000:]}"
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| out = stdout.decode("utf-8", "replace") | |
| err = stderr.decode("utf-8", "replace") | |
| if not out.strip(): | |
| raise RuntimeError( | |
| f"Agent runner returned no output. exit={proc.returncode} stderr={err[-2000:]}" | |
| ) | |
| try: | |
| return json.loads(out) | |
| except json.JSONDecodeError as exc: | |
| out = stdout.decode("utf-8", "replace") | |
| err = stderr.decode("utf-8", "replace") | |
| if proc.returncode not in (0, None): | |
| raise RuntimeError( | |
| "Agent runner exited non-zero. " | |
| f"exit={proc.returncode} stderr={err[-2000:]} stdout={out[:500]}" | |
| ) | |
| if not out.strip(): | |
| raise RuntimeError( | |
| f"Agent runner returned no output. exit={proc.returncode} stderr={err[-2000:]}" | |
| ) | |
| try: | |
| return json.loads(out) | |
| except json.JSONDecodeError as exc: |
…ug, reviews, fix plan Design-only workspace for the approval-boundary bug (auto-approved runs park at the first gated tool): plain-words flow explanation across FE/SDK/service/ runner/harness, verified root cause with current services/runner paths, correctness + code-organization reviews of the permission code, Codex-reviewed fix options, and a phased plan (one resolved permission plan, allow|ask|deny everywhere, delete hasHumanSurface, emit approval events only on park, batch surfaces stop_reason). No code changes. Claude-Session: https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2
Generalize the explainer across harnesses (per-harness gate table, Pi has no HITL today), explain Claude default_mode/bypassPermissions and the settings merge semantics, distinguish ACP request vs stream event and the two kinds of session, explain the cold-replay resume model before the responder code, and record the live approve-loop finding. Plan updates: relay becomes a pure executor (decision before execution), client tools resolve through the same ladder defaulting to allow, M2+M3 elevated with the approve-loop as an acceptance case. Claude-Session: https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2
The live approve-loop is diagnosed: a constant stream messageId plus a level-triggered resume predicate on the frontend (new finding M7), compounded by tool-name drift across ACP frames breaking the decision key (M2's observed form, not argument drift). Updated the explainer's live-warning section, reframed M2 and added M7 in the code review, settled the fix direction in the plan (direct replay of the approved call; absorb #5054's message-id fix and edge-trigger guard; supersede its resolvedName patch and loop-breaker), and recorded the #5054 recommendation in status. Claude-Session: https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2
…e target path Global policy becomes four explicit modes (allow|ask|deny|allow_reads): read-only-allow is a policy choice, not a hidden per-tool default, and needs_approval is deleted from the model. 'Disposition' renamed to 'effective permission' everywhere. New 'target path' section shows the clean end-state flow; resume is redesigned to replay the approved call directly. Corrected the session-id story (the playground sends a stable per-conversation id). Added the Pi-builtins explanation (selection is Pi's only native control). Plan gains the stacked-on-#5054 baseline (keep the message-id fix and resume guard; delete resolvedName and the loop-breaker) and updated phases/deltas. Status consolidated for final review. Claude-Session: https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2
… Pi settings block (round 3)
…sume redesign, wire-first phases, test plan)
…y-ask pauses (phase 2b)
…y fallbacks (phase 4b)
…ause controller (phase 5)
…perseded workspaces
…he approval prompt
- handler.py: permission_default (the SDK deleted permission_policy in phase 3) - fold(): the terminal result's stop_reason wins over the done event (the live runner emits done without stopReason, so a real pause would otherwise drop its envelope metadata) and pending_interaction carries a derived top-level tool name - sandbox_agent.ts: fold upstream keepers lost to ours-wins conflict resolution (shared apiBase module, claude-model + resolved-model log lines) - service pause test pinned to the realistic stream shape (done event with no stopReason)
- playground: a rules-only runner.permissions override no longer drops the allow_reads default from the /invoke body (nested merge + test) - runner: a client-tool pause now seeds /sessions/interactions like every other pause (pauseClientTool was the one gate that left no row) - sdk: wire.py imports ToolPermission/PermissionRule from permission_rules.py instead of redeclaring them; greppable legacy-key literals in mcp/models.py; annotate_trace comment and claude_settings docstring tell the truth - docs: MCP-server permission step added to three precedence ladders; permissions wire block documents the optional default + fallbacks; client-tool carve-out paragraph matches onClientTool's actual semantics
…bit round, and merge
…l pauses with their own kind Codex pre-merge review of the rebase: - both ACP pause sites emit a stamped COPY of the toolCall (resolvedName = the gate's stable anchor) so the Vercel egress's preferred field is populated again (upstream stamped by mutating the shared ACP object; we keep it pure) - onCreateInteraction threads kind, so client-tool rows stop masquerading as user_approval, and the Pi relay's client-tool pause now seeds a row at all - the gateway integration test drops the deleted needs_approval vocabulary
[feat] Approval boundary: pause on authored intent, not session ids
|
Merge sync: #5041 (approval boundary redesign: per-tool allow/ask/deny + Notes for anyone building on big-agents:
|
…op reason The live runner emits its 'done' event with no stopReason (the engine settles paused-vs-ended after the event stream closes, onto the terminal result only), so a HITL-paused SSE stream ended with a generic finish frame while the batch envelope already carried stop_reason: paused. - agent_event_stream appends a corrective terminal 'done' when the result's stop_reason disagrees with the streamed one (the streaming analogue of agent_batch's fold(events, stop_reason=result.stop_reason)) - both vercel adapters prefer the last non-null stop reason, and the dev-only AgentStream variant now prefers the terminal result unconditionally (its old fallback-only read meant terminal-wins never applied) Display was never affected (the FE keys off the interaction_request part); this restores wire fidelity for headless/protocol consumers of the stream.
[fix] Paused live streams carry the terminal stop reason in the finish frame
…honest skip log Address review on #5047: the F1 gate now keys on sandbox !== "local" (isRemoteSandbox) so an unknown or future remote provider (E2B et al.) fails closed with the same loud error instead of silently re-opening the zero-tools drop; reword the message to say "remote sandbox". Replace the dead isPi-guarded file-relay log in mcp.ts (unreachable past the isPi early-return, per Copilot) with an honest "skipped the loopback advertisement" log that never claims a delivery that is not happening. Add a fail-closed unit test for an unknown provider (sandbox=e2b). Client-tool exemption stays; #4985 owns tightening it when client tools move onto the MCP channel. Claude-Session: https://claude.ai/code/session_01HhBEUFbXETNjYdcz71AGrT
[fix] Refuse tools on non-Pi harness x remote sandbox instead of silently dropping them
|
Merge sync: #5047 (remote-tools fail-loud gate) merged into |
Context
big-agentsis the integration branch for the agent-workflows feature. Every agent PR targetsbig-agents(directly, or by stacking on one that does). The plan is to review and merge each sub-PR intobig-agents, then mergebig-agentsintomainas a single unit.This PR is a draft tracker. It stays open until all the open sub-PRs below are merged into
big-agents. The branch started from an empty commit, so the diff fills in as sub-PRs land.Integrated PRs
Each box gets checked when that PR is merged into
big-agents. Indented items stack on the item above them.SDK and service
Runner
big-agents(the relay-bug fix, the CI job, and a superset of its tests already landed via feat(agent): runner engines, HTTP server, tracing, and docker image #4778 + chore(agent): make sandbox-agent runner first-class #4786)Frontend
Hosting
Sandbox-agent deployment
Docs
Branch-only (no PR yet)
These design-doc branches are stacked on
big-agentsbut have no PR. Open one if you want them reviewed separately, otherwise they fold in with the docs.docs/agent-model-config-and-provider-authdocs/agent-skills-configdocs/agent-code-tool-sandboxdocs/agent-harness-capabilitiesNotes
big-agents(feat(agent): runner engines, HTTP server, tracing, and docker image #4778 + chore(agent): make sandbox-agent runner first-class #4786 already carry its tests, CI job, and relay-bug fix; itsversion.tswas stale["pi","rivet"]).big-agentsas chore(railway): add sandbox-agent preview deployment #4802 / chore(kubernetes): deploy sandbox-agent sidecar #4803 / ci(agent): build and test sandbox-agent images #4804.