
feat(sdk): handler-owned invoke negotiation (stream/trim/force) with batch=fold(stream) #5064

Merged: jp-agenta merged 2 commits into big-agents from feat/invoke-negotiations on Jul 3, 2026

@junaway (Contributor) commented Jul 3, 2026

Context

/invoke negotiated stream, history, and control in four different places, and each place disagreed about what the flags meant. The agent's batch response was a hand-built single message ({"messages": [{"role": "assistant", "content": result.output}]}), while its stream response carried the full turn (tool calls, tool results, the real assistant text); accumulating the stream did not reconstruct the batch response. The normalizer middleware separately drained generic generators on stream=false and trimmed the result as raw events, so "last" meant the final done event on that path and the final message on the agent's own path. llm_v0 documented a full-message-history contract but the shared normalizer silently trimmed it to one message anyway. Two invoke surfaces existed (route()-mounted apps and the generic root POST /invoke dispatch) and only one of them did any of this negotiation, so the same request could get a batch response through one surface and an ndjson stream through the other.

Root cause: negotiation was split between routing, the normalizer, and the handler, with no single owner. Full design writeup: docs/designs/invoke-negotiations/specs.md.

Changes

The rule: boolean command flags (stream, trim, force) are resolved inside the handler and never change outside it. Routing keeps exactly two jobs: header-to-flag sugar (an explicit body flag always wins) and the one HTTP-only value negotiation, format. The running middleware keeps exactly one flag, resolve, which it consumes and strips before the handler ever sees the request. Anything a handler can't deliver is a 406, with no courtesy aggregation: batch asked of a stream-only handler 406s, stream asked of a batch-only handler 406s, force 406s until take-over semantics exist.
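To make the 406 rule concrete, here is a minimal sketch. It assumes a handler advertises what it can deliver through two booleans. `Capabilities`, `NotAcceptable`, and `negotiate` are hypothetical names for illustration, not the SDK's API.

```python
from dataclasses import dataclass


class NotAcceptable(Exception):
    """Hypothetical stand-in for an error the router maps to HTTP 406."""


@dataclass(frozen=True)
class Capabilities:
    # Hypothetical description of which response modes a handler can deliver.
    can_stream: bool
    can_batch: bool


def negotiate(flags: dict, caps: Capabilities) -> str:
    """Return "stream" or "batch", or raise NotAcceptable (406).

    Absent/None flags count as False, matching the rule that null means
    false for every command flag except resolve.
    """
    stream = bool(flags.get("stream"))
    if flags.get("force"):
        # force is refused until take-over semantics exist.
        raise NotAcceptable("force is not supported yet")
    if stream and not caps.can_stream:
        raise NotAcceptable("stream asked of a batch-only handler")
    if not stream and not caps.can_batch:
        # No courtesy aggregation: a stream is never drained into a batch.
        raise NotAcceptable("batch asked of a stream-only handler")
    return "stream" if stream else "batch"
```

The symmetry is deliberate. Neither direction silently converts one shape into the other, so an unsatisfiable request always fails the same way.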

Renames (no dual-support window: the playground was the only client that could have sent the old names, and it was verified never to send them): flag history → trim, flag control → force, header x-ag-messages-history → x-ag-messages-transcript. Two new headers name their opt-in explicitly: x-ag-session-control: force and x-ag-workflow-embeds: resolve. An absent header means null. Null means false for every flag except resolve, whose null default stays true to preserve today's hydrate-by-default behavior; the only off-switch is an explicit flags: {resolve: false} in the request body.
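The null-default rules can be sketched as follows. This sketch assumes a header counts as set only when it carries exactly its opt-in token (case-insensitive), and that any other value is treated leniently as absent. Both helper names are hypothetical; the real parsers in routing.py may handle other values differently.

```python
from typing import Mapping, Optional


def parse_opt_in_header(headers: Mapping[str, str], name: str, token: str) -> Optional[bool]:
    """Return True when the header carries its opt-in token, else None (absent).

    Hypothetical helper: unknown values are treated as if the header were absent.
    """
    value = headers.get(name)
    if value is None:
        return None
    return True if value.strip().lower() == token else None


def effective(flags: dict, name: str) -> bool:
    """Resolve a null flag to its default: False for every flag except resolve."""
    value = flags.get(name)
    if value is None:
        return name == "resolve"  # resolve keeps hydrate-by-default
    return bool(value)
```

A header can therefore only ever turn a flag on. Turning resolve off requires `{"resolve": False}` in the body.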

batch = fold(stream), by construction. A new pure function, fold(events) -> {messages, stop_reason, pending_interaction} (sdks/python/agenta/sdk/agents/fold.py), turns the canonical agenta event vocabulary into the real turn: assistant text from message events, tool turns from tool_call/tool_result pairs, in order. A sibling trim_to_trailing_unit(messages) returns just the trailing unit (the last assistant message, or the full trailing tool/approval run) when a caller asks for trim=true. Both invoke shapes now consume the same live event stream, so folding it client-side reproduces the batch envelope exactly, pinned by a hard route-level contract test (stream a request, batch the same request, assert fold(streamed events) deep-equals the batch output).
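The following sketch shows what a fold and trailing-unit trim can look like. It assumes a simple event shape (documented in the docstring) and message dicts loosely modeled on the After example below. Neither assumption is confirmed to match the real vocabulary in fold.py, and the pending_interaction shape here is hypothetical.

```python
from typing import Any, Dict, List, Optional


def fold(events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold ordered events into {messages, stop_reason, pending_interaction}.

    Event shapes are assumptions for this sketch:
      {"type": "message", "text": str}
      {"type": "tool_call", "id": str, "name": str, "input": Any}
      {"type": "tool_result", "id": str, "output": Any}
      {"type": "done", "stop_reason": str}
    """
    messages: List[Dict[str, Any]] = []
    open_calls: Dict[str, str] = {}  # tool_call id -> tool name, still awaiting a result
    stop_reason: Optional[str] = None
    for event in events:
        kind = event.get("type")
        if kind == "message":
            # Consecutive text events extend the same assistant message.
            if messages and messages[-1]["role"] == "assistant":
                messages[-1]["content"] += event["text"]
            else:
                messages.append({"role": "assistant", "content": event["text"]})
        elif kind == "tool_call":
            open_calls[event["id"]] = event["name"]
            messages.append({"role": "tool", "tool_name": event["name"],
                             "input": event["input"], "tool_call_id": event["id"]})
        elif kind == "tool_result":
            open_calls.pop(event["id"], None)
            messages.append({"role": "tool", "content": event["output"],
                             "tool_call_id": event["id"]})
        elif kind == "done":
            stop_reason = event.get("stop_reason")
    # A call with no result yet (e.g. awaiting approval) becomes the pending interaction.
    pending = None
    if open_calls:
        call_id, name = list(open_calls.items())[-1]
        pending = {"tool_call_id": call_id, "tool_name": name}
    return {"messages": messages, "stop_reason": stop_reason, "pending_interaction": pending}


def trim_to_trailing_unit(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the last assistant message, or the whole trailing run of tool messages."""
    if not messages:
        return []
    if messages[-1]["role"] == "assistant":
        return [messages[-1]]
    start = len(messages)
    while start > 0 and messages[start - 1]["role"] == "tool":
        start -= 1
    return messages[start:]
```

Batch is defined as fold applied to the stream. A contract test therefore only has to compare `fold(list(streamed_events))` with the batch body.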

Before (agent batch response; tool calls are invisible):

{"messages": [{"role": "assistant", "content": "Paris"}]}

After (the real turn, trimmable per-call):

{
  "messages": [
    {"role": "assistant", "content": ""},
    {"role": "tool", "tool_name": "search", "input": {...}},
    {"role": "tool", "content": "...", "tool_call_id": "..."},
    {"role": "assistant", "content": "Paris"}
  ],
  "stop_reason": "done"
}

agent_v0 moves into the SDK. It was the one builtin without a named SDK handler: agenta:builtin:agent:v0 had a catalog entry and an interface but no entry in HANDLER_REGISTRY. As a result, any SDK process other than the agent service that tried to resolve the URI got nothing. sdks/python/agenta/sdk/agents/handler.py now owns the whole flag contract (stream/batch pre-branch, fold, trim, force to 406) behind an injectable AgentComposition seam (tool/MCP resolvers, secret provider, default template, backend selector) that defaults to env-driven behavior. services/oss/src/agent/app.py shrinks to service-specific composition and mount; the synthetic single-message envelope and the old _agent_batch/_agent_event_stream bodies are gone from the service.
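Here is a minimal sketch of an injectable composition seam. It assumes the defaults are plain callables and that the backend is read from an environment variable. The field list follows the description above (tool/MCP resolvers, secret provider, default template, backend selector). The types, the `AGENT_BACKEND_SKETCH` variable, and `run_agent` are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Callable, List, Optional


def _backend_from_env() -> str:
    # Hypothetical variable name; the PR does not specify the real env-driven default.
    return os.environ.get("AGENT_BACKEND_SKETCH", "default")


def _secret_from_env(name: str) -> Optional[str]:
    return os.environ.get(name)


@dataclass(frozen=True)
class AgentComposition:
    """Service-specific pieces that a shared agent handler is composed with."""
    tool_resolver: Callable[[List[str]], List[str]] = lambda names: list(names)
    mcp_resolver: Callable[[List[str]], List[str]] = lambda urls: list(urls)
    secret_provider: Callable[[str], Optional[str]] = _secret_from_env
    default_template: str = "You are a helpful assistant."
    backend_selector: Callable[[], str] = _backend_from_env


def run_agent(prompt: str, tools: List[str],
              composition: AgentComposition = AgentComposition()) -> dict:
    """Hypothetical handler body: every environment-specific choice comes from the seam."""
    return {
        "backend": composition.backend_selector(),
        "system": composition.default_template,
        "tools": composition.tool_resolver(tools),
        "prompt": prompt,
    }
```

A service, or a test, overrides only the pieces it cares about. The handler logic itself stays in the SDK.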

llm_v0 honors its own documented contract. It now reads trim off its own request (default full, matching its docstring) instead of getting silently trimmed by the shared normalizer, and shares the same force to 406 mapping as the agent.
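The per-call behavior described here can be sketched as follows. The sketch assumes the handler receives its flags as a dict and the prior conversation as a list. `ForceNotSupported` stands in for the PR's ForceNotSupportedV0Error, whose constructor is not shown. `llm_turn` is hypothetical, and trimming is simplified to "keep the last message".

```python
from typing import Any, Dict, List


class ForceNotSupported(Exception):
    """Stand-in for ForceNotSupportedV0Error, which the SDK maps to HTTP 406."""
    status_code = 406


def llm_turn(messages: List[Dict[str, Any]], reply: str, flags: Dict[str, Any]) -> Dict[str, Any]:
    """Return the conversation after one model reply, honoring trim and force."""
    if flags.get("force"):
        raise ForceNotSupported("force is not supported yet")
    history = messages + [{"role": "assistant", "content": reply}]
    # trim absent/None -> full history, matching llm_v0's documented default.
    if flags.get("trim"):
        history = history[-1:]
    return {"messages": history}
```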

Both invoke surfaces negotiate identically. The header-to-flag fills, session-id extraction, and vercel input projection that used to live only in route()'s endpoint prelude are extracted into one shared helper, called from both the route mount and services/entrypoints/main.py's root POST /invoke dispatch. A parity test drives the same request and headers against both surfaces and asserts an identical response.
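Parity follows from both surfaces calling one function before dispatch. The minimal sketch below assumes a request is a body dict plus a lower-cased header dict, and it assumes the contents of `STREAM_MEDIA_TYPES`. `apply_prelude`, `route_surface`, and `dispatch_surface` are hypothetical. Session-id extraction and vercel projection are omitted.

```python
from typing import Dict, Tuple

# Assumed Accept values that mean "stream"; the real STREAM_MEDIA_TYPES may differ.
STREAM_MEDIA_TYPES = {"application/x-ndjson", "text/event-stream"}

# Header -> (flag name, opt-in token) for the two explicit opt-in headers.
OPT_IN_HEADERS = {
    "x-ag-session-control": ("force", "force"),
    "x-ag-workflow-embeds": ("resolve", "resolve"),
}


def apply_prelude(body: Dict, headers: Dict[str, str]) -> Dict:
    """Fill flags from headers without ever overriding an explicit body flag."""
    flags = dict(body.get("flags") or {})
    if "stream" not in flags:
        flags["stream"] = headers.get("accept") in STREAM_MEDIA_TYPES
    for header, (flag, token) in OPT_IN_HEADERS.items():
        if flag not in flags and headers.get(header) == token:
            flags[flag] = True
    return {**body, "flags": flags}


def route_surface(body: Dict, headers: Dict[str, str]) -> Tuple[str, Dict]:
    return ("route", apply_prelude(body, headers))


def dispatch_surface(body: Dict, headers: Dict[str, str]) -> Tuple[str, Dict]:
    return ("dispatch", apply_prelude(body, headers))
```

Each surface contains no negotiation logic of its own. A parity test therefore reduces to checking that both pass the same request through the same helper.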

Removed: the normalizer's stream=false drain branch and its {messages: [...]} envelope trim. Generators always pass through as stream responses, and direct returns pass through unmodified. The handler owns its output shape end to end. A batch request to a stream-only handler now gets a 406 at routing instead; this is a conscious reversion of the old drain-and-aggregate courtesy for custom user workflows that take no request parameter.
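What remains of the normalizer amounts to classifying a return value. The following is a sketch that assumes this is its only remaining job: generators (sync or async) become stream responses, and anything else passes through untouched. `normalize` and its return tuple are hypothetical.

```python
import inspect
from typing import Any, Tuple


def normalize(result: Any) -> Tuple[str, Any]:
    """Classify a handler result without draining or reshaping it."""
    if inspect.isgenerator(result) or inspect.isasyncgen(result):
        # Always streamed; there is no stream=false drain branch any more.
        return ("stream", result)
    # Direct returns (e.g. a dict with "messages") pass through unmodified.
    return ("batch", result)
```

In this scheme, a batch request that reaches a stream-only handler is refused with 406 before this point rather than drained here.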

Tests / notes

Four test levels, matching where the contract crosses a layer boundary (specs.md "Testing contract"):

  • Handlers direct: 27-combo stream x trim x force cube for _agent and llm_v0, called as plain functions.
  • @workflow programmatic: passthrough + resolve-strip assertions, both for a request-taking handler and a flag-blind one.
  • @instrument invariance: the span tree and accumulated trace output are identical across every flag/header combination the handler can satisfy; negotiations change the response, never the trace.
  • @route: header-semantics sweep per axis with body-wins precedence, a negotiation cube against the real /agent/v0/invoke and /llm/v0/invoke mounts, the full 406 matrix, and the dispatch-parity test.
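The 27-combination handler cube is consistent with each of the three flags taking one of three values: absent (None), False, or True. Assuming that value set, the sketch below shows how such a cube can be enumerated. `flag_cube` is a hypothetical helper.

```python
import itertools
from typing import Dict, List, Optional

FLAG_VALUES: List[Optional[bool]] = [None, False, True]


def flag_cube() -> List[Dict[str, bool]]:
    """Every stream x trim x force combination; None means the flag is omitted."""
    cube = []
    for stream, trim, force in itertools.product(FLAG_VALUES, repeat=3):
        flags = {name: value
                 for name, value in (("stream", stream), ("trim", trim), ("force", force))
                 if value is not None}
        cube.append(flags)
    return cube
```

Each entry can feed `pytest.mark.parametrize`, with the expected outcome (for example, force → 406) computed from the contract rather than hard-coded per case.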

Full harness green against the local EE dev stack (freshly redeployed with the branch):

  • py-run-tests --sdk -aiu: 1546 unit / 136 integration / 124 acceptance passed.
  • py-run-tests --api -aiu: 1430 unit / 775 integration+acceptance passed.
  • py-run-tests --services -aiu: 73 unit / 15 integration / 158 acceptance passed.
  • ts-run-tests --runner -aiu: 394 / 8 / 16 passed.
  • ts-run-tests --web -iu: passed.

Three acceptance tests initially failed because they pinned exactly the pre-existing behavior this PR intentionally changes: agent_v0 being unregistered, courtesy aggregation on a flag-blind generator, and a stream-vs-batch span-output cube that conflated two structurally different output shapes into one global invariant. The fixes changed test assertions only; no source behavior changed.

Web sends only x-ag-messages-format: vercel and never sent any of the renamed/removed headers; confirmed no web code assumes the old single-message batch envelope (AgentChatTransport.ts already picks the trailing assistant message out of a multi-message array, not messages[0]). No web changes needed.

🤖 Generated with Claude Code

jp-agenta and others added 2 commits July 4, 2026 01:33
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…t, agent_v0 SDK handler, dual-surface parity

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings July 3, 2026 23:45

vercel Bot commented Jul 3, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

Project: agenta-documentation | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Jul 3, 2026 11:46pm


@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jul 3, 2026

coderabbitai Bot commented Jul 3, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a5cbcfbe-aa30-4b21-b621-0f67c714e240

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@dosubot dosubot Bot added Backend enhancement New feature or request refactoring A code change that neither fixes a bug nor adds a feature SDK labels Jul 3, 2026
@junaway junaway mentioned this pull request Jul 3, 2026
12 tasks
@jp-agenta jp-agenta merged commit f337f2f into big-agents Jul 3, 2026
24 checks passed

Copilot AI left a comment


Pull request overview

This PR consolidates /invoke negotiation semantics by moving command-flag resolution (stream, trim, force) into handlers, making batch output a deterministic fold of the canonical event stream, and ensuring both invoke entrypoints (route mounts and root dispatch) apply identical pre-processing and negotiation.

Changes:

  • Centralizes HTTP header → flag “sugar” (body-wins) + session-id extraction + Vercel input projection into apply_invoke_prelude, used by both route mounts and the root POST /invoke dispatch.
  • Introduces fold(events) + trim_to_trailing_unit(messages) so batch = fold(stream) by construction; updates agent and llm handlers to honor trim and reject force with 406.
  • Removes normalizer-side generator draining and messages-envelope trimming; stream/batch/trim shape is now handler-owned, with unsupported combinations resulting in 406.

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
services/oss/tests/pytest/unit/agent/test_invoke_handler.py Expands agent handler unit coverage to the stream/trim/force cube and updates expectations to folded-turn semantics.
services/oss/tests/pytest/unit/agent/conftest.py Updates fake session streaming fixture to emit raw event turns (or synthetic message fallback).
services/oss/src/agent/schemas.py Updates schema commentary to reflect trim naming.
services/oss/src/agent/app.py Shrinks service handler to composition + delegation to SDK agent handler helpers; enforces force→406.
services/entrypoints/main.py Applies shared invoke prelude in root dispatch surface for negotiation parity.
sdks/python/oss/tests/pytest/unit/test_workflow_request_flags_running.py Renames request flag fields to trim/force and updates expectations.
sdks/python/oss/tests/pytest/unit/test_workflow_negotiation_cube_routing.py Updates routing cube to enforce 406 symmetry (no courtesy aggregation) and new header names/axes.
sdks/python/oss/tests/pytest/unit/test_workflow_history_running.py Removes obsolete history/aggregation tests tied to prior normalizer behavior.
sdks/python/oss/tests/pytest/unit/test_workflow_control_running.py Updates documentation in tests to reference force terminology.
sdks/python/oss/tests/pytest/unit/test_workflow_aggregation_running.py Reworks programmatic invoke tests to assert normalizer passthrough + resolve stripping.
sdks/python/oss/tests/pytest/unit/test_returned_generator_route.py Updates returned-generator routing tests to 406 behavior for batch Accept against stream-only handlers.
sdks/python/oss/tests/pytest/unit/test_llm_v0_handler_flags_running.py Adds direct handler cube tests for llm_v0 + verifies chat/completion flag-blind behavior.
sdks/python/oss/tests/pytest/unit/test_invoke_route_aggregation_routing.py Flips previous aggregation assertions to 406 symmetry and updates header axes.
sdks/python/oss/tests/pytest/unit/test_invoke_real_handlers_negotiation_routing.py Adds route-level negotiation cube over real agent/llm mounts and 406 force matrix.
sdks/python/oss/tests/pytest/unit/test_invoke_header_semantics_routing.py Adds per-axis header semantics sweep validating body-wins and lenient unknown values.
sdks/python/oss/tests/pytest/unit/test_invoke_dispatch_parity_routing.py Adds parity tests ensuring route mounts and root dispatch behave identically for the same requests/headers.
sdks/python/oss/tests/pytest/unit/test_batch_fold_stream_contract_routing.py Pins the hard contract: fold(streamed events) deep-equals batch outputs over real agent handler.
sdks/python/oss/tests/pytest/unit/agents/test_fold.py Adds pure unit tests for fold + trailing-unit trimming across event scenarios.
sdks/python/oss/tests/pytest/integration/observability/test_workflow_instrument_programmatic.py Extends trace invariance coverage across expanded negotiation axes.
sdks/python/oss/tests/pytest/acceptance/workflows/test_new_uri_handlers.py Updates acceptance expectations: agent handler is now SDK-registered and resolvable by URI.
sdks/python/oss/tests/pytest/acceptance/observability/test_workflow_instrument_routed.py Extends routed observability invariance tests to new transcript/control/embeds axes and 406 behavior.
sdks/python/agenta/sdk/models/workflows.py Renames historytrim, controlforce, and updates per-call flags docstring.
sdks/python/agenta/sdk/middlewares/running/resolver.py Consumes resolve and strips it from request.flags before the handler sees the request.
sdks/python/agenta/sdk/middlewares/running/normalizer.py Removes generator drain + envelope trimming; generators always stream and dict returns pass through.
sdks/python/agenta/sdk/engines/running/utils.py Registers agent_v0 in the builtin handler registry.
sdks/python/agenta/sdk/engines/running/handlers.py Updates llm_v0 to honor trim and reject force (406).
sdks/python/agenta/sdk/engines/running/errors.py Introduces ForceNotSupportedV0Error (406-mapped) for force semantics not yet supported.
sdks/python/agenta/sdk/decorators/routing.py Adds new headers, shared apply_invoke_prelude, and updates route invoke endpoint to use it.
sdks/python/agenta/sdk/agents/handler.py Adds SDK-owned agent_v0 handler with injectable composition and shared stream/batch helpers.
sdks/python/agenta/sdk/agents/fold.py Adds pure fold + trim primitives to define batch output as a fold of the event stream.
docs/designs/invoke-negotiations/tasks.md Captures the implementation task plan and verification harness for the negotiation redesign.
docs/designs/invoke-negotiations/specs.md Defines the new negotiation contract, semantics, and 4-level testing strategy.


Comment on lines +194 to +210
_accept = _parse_accept(req)
_flags = dict(request.flags or {})
# Header sugar only fills flags the body left unset: an explicit body flag always wins.
if "stream" not in _flags:
    _flags["stream"] = _accept in STREAM_MEDIA_TYPES
if "trim" not in _flags:
    _trim = _parse_transcript_header(req)
    if _trim is not None:
        _flags["trim"] = _trim
if "force" not in _flags:
    _force = _parse_session_control_header(req)
    if _force is not None:
        _flags["force"] = _force
if "resolve" not in _flags:
    _resolve = _parse_workflow_embeds_header(req)
    if _resolve is not None:
        _flags["resolve"] = _resolve
request.flags = _flags
mmabrouk added a commit that referenced this pull request Jul 4, 2026
- handler.py: permission_default (the SDK deleted permission_policy in phase 3)
- fold(): the terminal result's stop_reason wins over the done event (the live runner
  emits done without stopReason, so a real pause would otherwise drop its envelope
  metadata) and pending_interaction carries a derived top-level tool name
- sandbox_agent.ts: fold upstream keepers lost to ours-wins conflict resolution
  (shared apiBase module, claude-model + resolved-model log lines)
- service pause test pinned to the realistic stream shape (done event with no stopReason)
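The fold() precedence fix in this commit is small enough to state as code. The sketch assumes the terminal result and the done event are dicts, and that the done event may carry the runner's camelCase stopReason or nothing at all. `resolve_stop_reason` is hypothetical.

```python
from typing import Any, Dict, Optional


def resolve_stop_reason(terminal_result: Optional[Dict[str, Any]],
                        done_event: Optional[Dict[str, Any]]) -> Optional[str]:
    """Prefer the terminal result's stop_reason; fall back to the done event's."""
    if terminal_result and terminal_result.get("stop_reason"):
        return terminal_result["stop_reason"]
    if done_event:
        return done_event.get("stopReason") or done_event.get("stop_reason")
    return None
```

This ordering means a pause reported only on the terminal result is not lost when the done event arrives without a reason.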
mmabrouk added a commit that referenced this pull request Jul 4, 2026

3 participants