feat(sdk): handler-owned invoke negotiation (stream/trim/force) with batch=fold(stream) by junaway · Pull Request #5064 · Agenta-AI/agenta

junaway · 2026-07-03T23:45:36Z

Context

/invoke negotiated stream, history, and control in four different places, and each place disagreed about what the flags meant.

The agent's batch response was a hand-built single message ({"messages": [{"role": "assistant", "content": result.output}]}), while its stream response carried the full turn (tool calls, tool results, the real assistant text), so accumulating the stream did not reconstruct the batch response.

The normalizer middleware separately drained generic generators on stream=false and trimmed the result as raw events, so "last" meant the final done event on that path but the final message on the agent's own path. llm_v0 documented a full-message-history contract, but the shared normalizer silently trimmed it to one message anyway.

Two invoke surfaces existed (route()-mounted apps and the generic root POST /invoke dispatch), and only one of them did any of this negotiation, so the same request could get a batch response through one surface and an ndjson stream through the other.

Root cause: negotiation was split between routing, the normalizer, and the handler, with no single owner. Full design writeup: docs/designs/invoke-negotiations/specs.md.

Changes

The rule: boolean command flags (stream, trim, force) are resolved inside the handler and never change outside it. Routing keeps exactly two jobs: header-to-flag sugar (an explicit body flag always wins) and the one HTTP-only value negotiation, format. The running middleware keeps exactly one flag, resolve, which it consumes and strips before the handler ever sees the request. Anything a handler can't deliver is a 406, with no courtesy aggregation: batch asked of a stream-only handler 406s, stream asked of a batch-only handler 406s, force 406s until take-over semantics exist.
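The 406 rule above can be sketched as a pure decision function. This is a minimal illustration, assuming a hypothetical per-handler capability descriptor; HandlerCapabilities, negotiate, and NotAcceptable are illustrative names, not the SDK's API, and in the PR the checks are split between routing and the handlers themselves.

```python
from dataclasses import dataclass
from typing import Optional


class NotAcceptable(Exception):
    """Hypothetical stand-in for an HTTP 406 Not Acceptable response."""


@dataclass(frozen=True)
class HandlerCapabilities:
    # Hypothetical descriptor of which response shapes a handler can produce.
    can_stream: bool
    can_batch: bool
    can_force: bool = False  # nothing supports take-over semantics yet


def negotiate(caps: HandlerCapabilities, stream: Optional[bool], force: Optional[bool]) -> str:
    """Pick "stream" or "batch", or raise NotAcceptable; never aggregate as a courtesy."""
    if force and not caps.can_force:
        raise NotAcceptable("force is not supported yet")
    wants_stream = bool(stream)  # null means false
    if wants_stream and not caps.can_stream:
        raise NotAcceptable("stream asked of a batch-only handler")
    if not wants_stream and not caps.can_batch:
        raise NotAcceptable("batch asked of a stream-only handler")
    return "stream" if wants_stream else "batch"


stream_only = HandlerCapabilities(can_stream=True, can_batch=False)
print(negotiate(stream_only, stream=True, force=None))  # stream
try:
    negotiate(stream_only, stream=None, force=None)
except NotAcceptable as exc:
    print("406:", exc)  # 406: batch asked of a stream-only handler
```

The point of the shape is that there is no fallback branch: a request the handler cannot satisfy fails loudly instead of being drained and re-shaped behind the caller's back.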

Renames (no dual-support window; the only caller of the old names was the playground, and it was verified never to send them): flag history → trim, flag control → force, header x-ag-messages-history → x-ag-messages-transcript. Two new headers name their opt-in explicitly: x-ag-session-control: force and x-ag-workflow-embeds: resolve. Absent means null, and null means false for every flag except resolve. resolve's null default stays true to preserve today's hydrate-by-default behavior; the only way to turn it off is an explicit flags: {resolve: false} in the request body.
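A minimal sketch of the null semantics for the two opt-in headers, assuming lowercase header keys and treating any value other than the opt-in token as absent (that leniency is an assumption). parse_opt_in_header and effective are hypothetical helpers; x-ag-messages-transcript is left out because its accepted values are not spelled out here.

```python
from typing import Mapping, Optional


def parse_opt_in_header(headers: Mapping[str, str], name: str, token: str) -> Optional[bool]:
    """Hypothetical parser: the opt-in token means True, anything else means None (null).

    A header can only opt in; it never produces False, because turning resolve
    off is body-only.
    """
    value = headers.get(name)  # assumes lowercase keys; real frameworks are case-insensitive
    if value is None:
        return None
    return True if value.strip().lower() == token else None


def effective(flag: str, value: Optional[bool]) -> bool:
    # null means false for every flag except resolve, which defaults to true
    if value is None:
        return flag == "resolve"
    return value


headers = {"x-ag-session-control": "force"}
force = parse_opt_in_header(headers, "x-ag-session-control", "force")
resolve = parse_opt_in_header(headers, "x-ag-workflow-embeds", "resolve")
print(force, resolve)  # True None
print(effective("force", force), effective("resolve", resolve))  # True True
```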

batch = fold(stream), by construction. A new pure function, fold(events) -> {messages, stop_reason, pending_interaction} (sdks/python/agenta/sdk/agents/fold.py), turns the canonical agenta event vocabulary into the real turn: assistant text from message events, tool turns from tool_call/tool_result pairs, in order. A sibling trim_to_trailing_unit(messages) returns just the trailing unit (the last assistant message, or the full trailing tool/approval run) when a caller asks for trim=true. Both invoke shapes now consume the same live event stream, so folding it client-side reproduces the batch envelope exactly, pinned by a hard route-level contract test (stream a request, batch the same request, assert fold(streamed events) deep-equals the batch output).
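To make "batch = fold(stream)" concrete, here is a minimal sketch of a fold. The event shapes ({"type": "message" | "tool_call" | "tool_result" | "interaction_request" | "done", ...}) and the output message fields are assumptions for illustration, not the canonical agenta event vocabulary, and this is not the code in sdks/python/agenta/sdk/agents/fold.py. It treats consecutive message events as text deltas of a single assistant message.

```python
from typing import Any, Dict, Iterable, List, Optional

Event = Dict[str, Any]


def fold(events: Iterable[Event]) -> Dict[str, Any]:
    """Fold a stream of (assumed) events into the batch envelope, preserving order."""
    messages: List[Dict[str, Any]] = []
    stop_reason: Optional[str] = None
    pending_interaction: Optional[Dict[str, Any]] = None
    for event in events:
        kind = event.get("type")
        if kind == "message":
            # Consecutive text deltas extend the current assistant message.
            if messages and messages[-1]["role"] == "assistant":
                messages[-1]["content"] += event["content"]
            else:
                messages.append({"role": "assistant", "content": event["content"]})
        elif kind == "tool_call":
            messages.append({"role": "tool", "tool_name": event["name"],
                             "input": event["input"], "tool_call_id": event["id"]})
        elif kind == "tool_result":
            messages.append({"role": "tool", "content": event["content"],
                             "tool_call_id": event["id"]})
        elif kind == "interaction_request":
            pending_interaction = event.get("interaction")
        elif kind == "done":
            stop_reason = event.get("stop_reason", stop_reason)
    return {"messages": messages, "stop_reason": stop_reason,
            "pending_interaction": pending_interaction}


events = [
    {"type": "tool_call", "id": "c1", "name": "search", "input": {"q": "capital of France"}},
    {"type": "tool_result", "id": "c1", "content": "Paris is the capital of France."},
    {"type": "message", "content": "Par"},
    {"type": "message", "content": "is"},
    {"type": "done", "stop_reason": "done"},
]
print(fold(events)["messages"][-1])  # {'role': 'assistant', 'content': 'Paris'}
```

Because the batch path would run exactly this function over the same live events a streaming client receives, the route-level contract test reduces to comparing fold(streamed events) with the batch body.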

Before (agent batch response; tool calls are invisible):

{"messages": [{"role": "assistant", "content": "Paris"}]}

After (the real turn, trimmable per-call):

{
  "messages": [
    {"role": "assistant", "content": ""},
    {"role": "tool", "tool_name": "search", "input": {...}},
    {"role": "tool", "content": "...", "tool_call_id": "..."},
    {"role": "assistant", "content": "Paris"}
  ],
  "stop_reason": "done"
}
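With trim=true, only the trailing unit of that turn is returned. A minimal sketch, assuming the trailing tool/approval run is the contiguous block of trailing messages whose role is "tool" or "approval" (the role names are assumptions), and with the {...} placeholders above replaced by illustrative values:

```python
from typing import Any, Dict, List

Message = Dict[str, Any]

# Roles treated as part of a tool/approval run (assumption for illustration).
RUN_ROLES = {"tool", "approval"}


def trim_to_trailing_unit(messages: List[Message]) -> List[Message]:
    """Keep the final message, or the whole trailing tool/approval run if the turn ends in one."""
    if not messages:
        return []
    if messages[-1]["role"] not in RUN_ROLES:
        return [messages[-1]]  # e.g. the last assistant message
    start = len(messages)
    while start > 0 and messages[start - 1]["role"] in RUN_ROLES:
        start -= 1
    return messages[start:]


turn = [
    {"role": "assistant", "content": ""},
    {"role": "tool", "tool_name": "search", "input": {"q": "capital of France"}},
    {"role": "tool", "content": "Paris is the capital of France.", "tool_call_id": "call_1"},
    {"role": "assistant", "content": "Paris"},
]
print(trim_to_trailing_unit(turn))      # [{'role': 'assistant', 'content': 'Paris'}]
print(len(trim_to_trailing_unit(turn[:3])))  # 2 (the trailing tool run)
```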

agent_v0 moves into the SDK. It was the one builtin without a named SDK handler: agenta:builtin:agent:v0 had a catalog entry and an interface but no entry in HANDLER_REGISTRY, so resolving that URI in any SDK process other than the agent service returned nothing. sdks/python/agenta/sdk/agents/handler.py now owns the whole flag contract (stream/batch pre-branch, fold, trim, force → 406) behind an injectable AgentComposition seam (tool/MCP resolvers, secret provider, default template, backend selector) that defaults to env-driven behavior. services/oss/src/agent/app.py shrinks to service-specific composition and mounting; the synthetic single-message envelope and the old _agent_batch/_agent_event_stream bodies are gone from the service.
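The injectable-seam idea can be sketched as a dataclass whose fields default to env-driven behavior, so a service or a test overrides only what it needs. Every field type, the AGENT_BACKEND variable, the default template text, and the describe helper are hypothetical; only the list of seams (tool/MCP resolvers, secret provider, default template, backend selector) comes from this PR.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


def _env_secret_provider(name: str) -> Optional[str]:
    # Default: read secrets from the process environment.
    return os.environ.get(name)


def _env_backend_selector() -> str:
    # Default backend from an env var; the variable name is hypothetical.
    return os.environ.get("AGENT_BACKEND", "default")


@dataclass
class AgentComposition:
    """Hypothetical shape of the composition seam, with env-driven defaults."""
    tool_resolver: Callable[[List[str]], List[Dict[str, Any]]] = lambda names: [{"name": n} for n in names]
    mcp_resolver: Callable[[List[str]], List[Dict[str, Any]]] = lambda servers: []
    secret_provider: Callable[[str], Optional[str]] = _env_secret_provider
    default_template: str = "You are a helpful assistant."
    backend_selector: Callable[[], str] = _env_backend_selector


def describe(composition: AgentComposition, tools: List[str]) -> Dict[str, Any]:
    # The handler only talks to the seam, never to env vars directly.
    return {
        "backend": composition.backend_selector(),
        "tools": composition.tool_resolver(tools),
        "template": composition.default_template,
    }


# A test swaps in a fake backend and keeps every other default.
print(describe(AgentComposition(backend_selector=lambda: "fake"), ["search"]))
```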

llm_v0 honors its own documented contract. It now reads trim from its own request (defaulting to the full history, as its docstring states) instead of being silently trimmed by the shared normalizer, and it shares the agent's force → 406 mapping.
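A sketch of what "reads trim off its own request" means for a single-reply handler. It assumes "full" means the input messages plus the reply, and it uses a hypothetical ForceNotSupportedError in place of the PR's 406-mapped ForceNotSupportedV0Error; the model call is a placeholder.

```python
from typing import Any, Dict, List, Optional


class ForceNotSupportedError(Exception):
    """Hypothetical stand-in for the PR's 406-mapped ForceNotSupportedV0Error."""
    status_code = 406


def llm_handler_sketch(messages: List[Dict[str, Any]],
                       flags: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    flags = flags or {}
    if flags.get("force"):
        raise ForceNotSupportedError("force is not supported by this handler yet")
    reply = {"role": "assistant", "content": "..."}  # placeholder for the model call
    # trim null/false: full history (documented default); trim true: trailing reply only.
    return {"messages": [reply] if flags.get("trim") else [*messages, reply]}


history = [{"role": "user", "content": "Capital of France?"}]
print(len(llm_handler_sketch(history)["messages"]))                   # 2
print(len(llm_handler_sketch(history, {"trim": True})["messages"]))   # 1
```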

Both invoke surfaces negotiate identically. The header-to-flag fills, session-id extraction, and vercel input projection that used to live only in route()'s endpoint prelude are extracted into one shared helper, called from both the route mount and services/entrypoints/main.py's root POST /invoke dispatch. A parity test drives the same request and headers against both surfaces and asserts an identical response.
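A runnable sketch of the body-wins header fill that both surfaces share. It mirrors the shape of the apply_invoke_prelude hunk quoted in the review below but works on plain dicts; the function name, the lowercase header keys, the exact-match Accept check, and the STREAM_MEDIA_TYPES contents are assumptions, and transcript parsing is omitted because its header values are not listed here.

```python
from typing import Any, Dict, Mapping, Optional

# Hypothetical set; the SDK's STREAM_MEDIA_TYPES may differ.
STREAM_MEDIA_TYPES = {"application/x-ndjson", "text/event-stream"}


def apply_invoke_prelude_sketch(body_flags: Optional[Dict[str, Any]],
                                headers: Mapping[str, str]) -> Dict[str, Any]:
    """Fill flags from headers only where the body left them unset (body always wins)."""
    flags = dict(body_flags or {})
    if "stream" not in flags:
        flags["stream"] = headers.get("accept", "") in STREAM_MEDIA_TYPES
    if "force" not in flags and headers.get("x-ag-session-control") == "force":
        flags["force"] = True
    if "resolve" not in flags and headers.get("x-ag-workflow-embeds") == "resolve":
        flags["resolve"] = True
    return flags


headers = {"accept": "application/x-ndjson", "x-ag-session-control": "force"}
print(apply_invoke_prelude_sketch(None, headers))             # {'stream': True, 'force': True}
print(apply_invoke_prelude_sketch({"stream": False}, headers))  # {'stream': False, 'force': True}
```

Parity then holds by construction: the route mount and the root dispatch both call the same helper, so the parity test only has to catch a surface that forgets to call it.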

Removed: the normalizer's stream=false drain branch and its {messages: [...]} envelope trim. Generators always pass through as stream responses; direct returns pass through unmodified. The handler owns its output shape end to end, and a 406 at routing covers a batch request against a stream-only handler. This is a deliberate reversal of the old drain-and-aggregate courtesy for custom user workflows that take no request param.
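The remaining normalizer behavior is small enough to sketch in a few lines: classify the handler's return value and never reshape it. normalize_sketch and its {"kind", "body"} result are hypothetical; in the PR, a batch request against a generator-only handler never reaches this point because routing has already answered 406.

```python
import inspect
from typing import Any, Dict


def normalize_sketch(result: Any) -> Dict[str, Any]:
    """Hypothetical post-handler normalizer: no draining, no envelope trimming."""
    if inspect.isgenerator(result) or inspect.isasyncgen(result):
        # Generators always become stream responses.
        return {"kind": "stream", "body": result}
    # Direct returns pass through untouched; the handler owns the shape.
    return {"kind": "batch", "body": result}


def events():
    yield {"type": "done"}


envelope = {"messages": [{"role": "assistant", "content": "Paris"}]}
print(normalize_sketch(events())["kind"])          # stream
print(normalize_sketch(envelope)["body"] is envelope)  # True
```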

Tests / notes

Four test levels, matching where the contract crosses a layer boundary (specs.md "Testing contract"):

Handlers direct: 27-combo stream x trim x force cube for _agent and llm_v0, called as plain functions.
@workflow programmatic: passthrough + resolve-strip assertions, both for a request-taking handler and a flag-blind one.
@instrument invariance: the span tree and accumulated trace output are identical across every flag/header combination the handler can satisfy; negotiations change the response, never the trace.
@route: header-semantics sweep per axis with body-wins precedence, a negotiation cube against the real /agent/v0/invoke and /llm/v0/invoke mounts, the full 406 matrix, and the dispatch-parity test.
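One plausible reading of the 27-combo cube is three tri-state flags (null, false, true), giving 3³ = 27 cases. The sketch below enumerates that cube and a hypothetical expected outcome per case for a handler that supports both stream and batch; such a list could feed pytest.mark.parametrize. The outcome rule is an illustration of the contract above, not the PR's test code.

```python
import itertools
from collections import Counter
from typing import Dict, List, Optional, Union

TRI_STATE: List[Optional[bool]] = [None, False, True]  # absent (null), explicit off, explicit on


def flag_cube() -> List[Dict[str, Optional[bool]]]:
    """Every stream x trim x force combination: 3**3 = 27 cases."""
    return [{"stream": s, "trim": t, "force": f}
            for s, t, f in itertools.product(TRI_STATE, repeat=3)]


def expected_outcome(case: Dict[str, Optional[bool]]) -> Union[int, str]:
    # force is always 406 for now; otherwise null/false stream means batch.
    if case["force"]:
        return 406
    return "stream" if case["stream"] else "batch"


cube = flag_cube()
print(len(cube))                                        # 27
print(dict(Counter(expected_outcome(c) for c in cube)))  # {'batch': 12, 'stream': 6, 406: 9}
```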

Full harness green against the local EE dev stack (freshly redeployed with the branch):

py-run-tests --sdk -aiu: 1546 unit / 136 integration / 124 acceptance passed.
py-run-tests --api -aiu: 1430 unit / 775 integration+acceptance passed.
py-run-tests --services -aiu: 73 unit / 15 integration / 158 acceptance passed.
ts-run-tests --runner -aiu: 394 / 8 / 16 passed.
ts-run-tests --web -iu: passed.

Three acceptance tests initially failed because they pinned the exact pre-existing behavior this PR intentionally changes (agent_v0 unregistered, courtesy aggregation on a flag-blind generator, and a stream-vs-batch span-output cube that conflated two structurally different output shapes into one global invariant). They were fixed by updating test-side assertions only; no source behavior changed.

Web sends only x-ag-messages-format: vercel and never sent any of the renamed/removed headers; confirmed no web code assumes the old single-message batch envelope (AgentChatTransport.ts already picks the trailing assistant message out of a multi-message array, not messages[0]). No web changes needed.

🤖 Generated with Claude Code

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


vercel · 2026-07-03T23:45:42Z

The latest updates on your projects.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jul 3, 2026 11:46pm

coderabbitai · 2026-07-03T23:45:44Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a5cbcfbe-aa30-4b21-b621-0f67c714e240


Copilot

Pull request overview

This PR consolidates /invoke negotiation semantics by moving command-flag resolution (stream, trim, force) into handlers, making batch output a deterministic fold of the canonical event stream, and ensuring both invoke entrypoints (route mounts and root dispatch) apply identical pre-processing and negotiation.

Changes:

Centralizes HTTP header → flag “sugar” (body-wins) + session-id extraction + Vercel input projection into apply_invoke_prelude, used by both route mounts and the root POST /invoke dispatch.
Introduces fold(events) + trim_to_trailing_unit(messages) so batch = fold(stream) by construction; updates agent and llm handlers to honor trim and reject force with 406.
Removes normalizer-side generator draining and messages-envelope trimming; stream/batch/trim shape is now handler-owned, with unsupported combinations resulting in 406.

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.


File	Description
services/oss/tests/pytest/unit/agent/test_invoke_handler.py	Expands agent handler unit coverage to the stream/trim/force cube and updates expectations to folded-turn semantics.
services/oss/tests/pytest/unit/agent/conftest.py	Updates fake session streaming fixture to emit raw event turns (or synthetic message fallback).
services/oss/src/agent/schemas.py	Updates schema commentary to reflect `trim` naming.
services/oss/src/agent/app.py	Shrinks service handler to composition + delegation to SDK agent handler helpers; enforces `force`→406.
services/entrypoints/main.py	Applies shared invoke prelude in root dispatch surface for negotiation parity.
sdks/python/oss/tests/pytest/unit/test_workflow_request_flags_running.py	Renames request flag fields to `trim`/`force` and updates expectations.
sdks/python/oss/tests/pytest/unit/test_workflow_negotiation_cube_routing.py	Updates routing cube to enforce 406 symmetry (no courtesy aggregation) and new header names/axes.
sdks/python/oss/tests/pytest/unit/test_workflow_history_running.py	Removes obsolete history/aggregation tests tied to prior normalizer behavior.
sdks/python/oss/tests/pytest/unit/test_workflow_control_running.py	Updates documentation in tests to reference `force` terminology.
sdks/python/oss/tests/pytest/unit/test_workflow_aggregation_running.py	Reworks programmatic invoke tests to assert normalizer passthrough + `resolve` stripping.
sdks/python/oss/tests/pytest/unit/test_returned_generator_route.py	Updates returned-generator routing tests to 406 behavior for batch Accept against stream-only handlers.
sdks/python/oss/tests/pytest/unit/test_llm_v0_handler_flags_running.py	Adds direct handler cube tests for `llm_v0` + verifies chat/completion flag-blind behavior.
sdks/python/oss/tests/pytest/unit/test_invoke_route_aggregation_routing.py	Flips previous aggregation assertions to 406 symmetry and updates header axes.
sdks/python/oss/tests/pytest/unit/test_invoke_real_handlers_negotiation_routing.py	Adds route-level negotiation cube over real agent/llm mounts and 406 force matrix.
sdks/python/oss/tests/pytest/unit/test_invoke_header_semantics_routing.py	Adds per-axis header semantics sweep validating body-wins and lenient unknown values.
sdks/python/oss/tests/pytest/unit/test_invoke_dispatch_parity_routing.py	Adds parity tests ensuring route mounts and root dispatch behave identically for the same requests/headers.
sdks/python/oss/tests/pytest/unit/test_batch_fold_stream_contract_routing.py	Pins the hard contract: fold(streamed events) deep-equals batch outputs over real agent handler.
sdks/python/oss/tests/pytest/unit/agents/test_fold.py	Adds pure unit tests for fold + trailing-unit trimming across event scenarios.
sdks/python/oss/tests/pytest/integration/observability/test_workflow_instrument_programmatic.py	Extends trace invariance coverage across expanded negotiation axes.
sdks/python/oss/tests/pytest/acceptance/workflows/test_new_uri_handlers.py	Updates acceptance expectations: agent handler is now SDK-registered and resolvable by URI.
sdks/python/oss/tests/pytest/acceptance/observability/test_workflow_instrument_routed.py	Extends routed observability invariance tests to new transcript/control/embeds axes and 406 behavior.
sdks/python/agenta/sdk/models/workflows.py	Renames `history`→`trim`, `control`→`force`, and updates per-call flags docstring.
sdks/python/agenta/sdk/middlewares/running/resolver.py	Consumes `resolve` and strips it from `request.flags` before the handler sees the request.
sdks/python/agenta/sdk/middlewares/running/normalizer.py	Removes generator drain + envelope trimming; generators always stream and dict returns pass through.
sdks/python/agenta/sdk/engines/running/utils.py	Registers `agent_v0` in the builtin handler registry.
sdks/python/agenta/sdk/engines/running/handlers.py	Updates `llm_v0` to honor `trim` and reject `force` (406).
sdks/python/agenta/sdk/engines/running/errors.py	Introduces `ForceNotSupportedV0Error` (406-mapped) for force semantics not yet supported.
sdks/python/agenta/sdk/decorators/routing.py	Adds new headers, shared `apply_invoke_prelude`, and updates route invoke endpoint to use it.
sdks/python/agenta/sdk/agents/handler.py	Adds SDK-owned `agent_v0` handler with injectable composition and shared stream/batch helpers.
sdks/python/agenta/sdk/agents/fold.py	Adds pure fold + trim primitives to define batch output as a fold of the event stream.
docs/designs/invoke-negotiations/tasks.md	Captures the implementation task plan and verification harness for the negotiation redesign.
docs/designs/invoke-negotiations/specs.md	Defines the new negotiation contract, semantics, and 4-level testing strategy.


+    _accept = _parse_accept(req)
+    _flags = dict(request.flags or {})
+    if "stream" not in _flags:
+        _flags["stream"] = _accept in STREAM_MEDIA_TYPES
+    if "trim" not in _flags:
+        _trim = _parse_transcript_header(req)
+        if _trim is not None:
+            _flags["trim"] = _trim
+    if "force" not in _flags:
+        _force = _parse_session_control_header(req)
+        if _force is not None:
+            _flags["force"] = _force
+    if "resolve" not in _flags:
+        _resolve = _parse_workflow_embeds_header(req)
+        if _resolve is not None:
+            _flags["resolve"] = _resolve
+    request.flags = _flags


- handler.py: permission_default (the SDK deleted permission_policy in phase 3)
- fold(): the terminal result's stop_reason wins over the done event (the live runner emits done without stopReason, so a real pause would otherwise drop its envelope metadata), and pending_interaction carries a derived top-level tool name
- sandbox_agent.ts: fold upstream keepers lost to ours-wins conflict resolution (shared apiBase module, claude-model + resolved-model log lines)
- service pause test pinned to the realistic stream shape (done event with no stopReason)
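The stop_reason precedence described in the fold() bullet above can be sketched as follows. The event shapes are assumptions: a terminal "result" event carrying stop_reason, and a "done" event that may or may not carry stopReason, as the bullet describes for the live runner. resolve_stop_reason is a hypothetical helper, not the PR's fold().

```python
from typing import Any, Dict, Iterable, Optional


def resolve_stop_reason(events: Iterable[Dict[str, Any]]) -> Optional[str]:
    """The terminal result's stop_reason wins; the done event is only a fallback."""
    from_result: Optional[str] = None
    from_done: Optional[str] = None
    for event in events:
        if event.get("type") == "result" and event.get("stop_reason"):
            from_result = event["stop_reason"]
        elif event.get("type") == "done" and event.get("stopReason"):
            from_done = event["stopReason"]
    return from_result or from_done


# Realistic pause shape: the done event carries no stopReason.
print(resolve_stop_reason([{"type": "result", "stop_reason": "paused"}, {"type": "done"}]))  # paused
```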


jp-agenta and others added 2 commits July 4, 2026 01:33

docs(sdk): add invoke-negotiations design (specs + tasks)

5e340d8

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

feat(sdk): handler-owned invoke negotiation — fold/trim/force contrac…

a8f9a51

…t, agent_v0 SDK handler, dual-surface parity Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings July 3, 2026 23:45

dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jul 3, 2026

dosubot Bot added Backend enhancement New feature or request refactoring A code change that neither fixes a bug nor adds a feature SDK labels Jul 3, 2026

junaway mentioned this pull request Jul 3, 2026

[integration] big-agents #4791

Open

12 tasks

Copilot started reviewing on behalf of junaway July 3, 2026 23:46

vercel Bot deployed to Preview July 3, 2026 23:46

jp-agenta merged commit f337f2f into big-agents Jul 3, 2026
24 checks passed

Copilot AI reviewed Jul 3, 2026


mmabrouk added a commit that referenced this pull request Jul 4, 2026

docs(build-kit-tools-cleanup): fold the #5064 invoke negotiation cont…

b961f90

…ract into the test_run design

mmabrouk mentioned this pull request Jul 4, 2026

docs(agent-workflows): build-kit tools cleanup design (tool homes, test_run, skills port) #5060

Draft

jp-agenta merged 2 commits into big-agents from feat/invoke-negotiations

junaway commented Jul 3, 2026

Uh oh!

vercel Bot commented Jul 3, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading

Review skipped

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

junaway commented Jul 3, 2026

Context

Changes

Tests / notes

Uh oh!

vercel Bot commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vercel Bot commented Jul 3, 2026 •

edited

Loading

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading