feat(sdk): handler-owned invoke negotiation (stream/trim/force) with batch=fold(stream) #5064
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…t, agent_v0 SDK handler, dual-surface parity Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pull request overview
This PR consolidates /invoke negotiation semantics by moving command-flag resolution (stream, trim, force) into handlers, making batch output a deterministic fold of the canonical event stream, and ensuring both invoke entrypoints (route mounts and root dispatch) apply identical pre-processing and negotiation.
Changes:
- Centralizes HTTP header → flag “sugar” (body-wins) + session-id extraction + Vercel input projection into `apply_invoke_prelude`, used by both route mounts and the root `POST /invoke` dispatch.
- Introduces `fold(events)` + `trim_to_trailing_unit(messages)` so batch = fold(stream) by construction; updates agent and llm handlers to honor `trim` and reject `force` with 406.
- Removes normalizer-side generator draining and messages-envelope trimming; stream/batch/trim shape is now handler-owned, with unsupported combinations resulting in 406.
Reviewed changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| services/oss/tests/pytest/unit/agent/test_invoke_handler.py | Expands agent handler unit coverage to the stream/trim/force cube and updates expectations to folded-turn semantics. |
| services/oss/tests/pytest/unit/agent/conftest.py | Updates fake session streaming fixture to emit raw event turns (or synthetic message fallback). |
| services/oss/src/agent/schemas.py | Updates schema commentary to reflect trim naming. |
| services/oss/src/agent/app.py | Shrinks service handler to composition + delegation to SDK agent handler helpers; enforces force→406. |
| services/entrypoints/main.py | Applies shared invoke prelude in root dispatch surface for negotiation parity. |
| sdks/python/oss/tests/pytest/unit/test_workflow_request_flags_running.py | Renames request flag fields to trim/force and updates expectations. |
| sdks/python/oss/tests/pytest/unit/test_workflow_negotiation_cube_routing.py | Updates routing cube to enforce 406 symmetry (no courtesy aggregation) and new header names/axes. |
| sdks/python/oss/tests/pytest/unit/test_workflow_history_running.py | Removes obsolete history/aggregation tests tied to prior normalizer behavior. |
| sdks/python/oss/tests/pytest/unit/test_workflow_control_running.py | Updates documentation in tests to reference force terminology. |
| sdks/python/oss/tests/pytest/unit/test_workflow_aggregation_running.py | Reworks programmatic invoke tests to assert normalizer passthrough + resolve stripping. |
| sdks/python/oss/tests/pytest/unit/test_returned_generator_route.py | Updates returned-generator routing tests to 406 behavior for batch Accept against stream-only handlers. |
| sdks/python/oss/tests/pytest/unit/test_llm_v0_handler_flags_running.py | Adds direct handler cube tests for llm_v0 + verifies chat/completion flag-blind behavior. |
| sdks/python/oss/tests/pytest/unit/test_invoke_route_aggregation_routing.py | Flips previous aggregation assertions to 406 symmetry and updates header axes. |
| sdks/python/oss/tests/pytest/unit/test_invoke_real_handlers_negotiation_routing.py | Adds route-level negotiation cube over real agent/llm mounts and 406 force matrix. |
| sdks/python/oss/tests/pytest/unit/test_invoke_header_semantics_routing.py | Adds per-axis header semantics sweep validating body-wins and lenient unknown values. |
| sdks/python/oss/tests/pytest/unit/test_invoke_dispatch_parity_routing.py | Adds parity tests ensuring route mounts and root dispatch behave identically for the same requests/headers. |
| sdks/python/oss/tests/pytest/unit/test_batch_fold_stream_contract_routing.py | Pins the hard contract: fold(streamed events) deep-equals batch outputs over real agent handler. |
| sdks/python/oss/tests/pytest/unit/agents/test_fold.py | Adds pure unit tests for fold + trailing-unit trimming across event scenarios. |
| sdks/python/oss/tests/pytest/integration/observability/test_workflow_instrument_programmatic.py | Extends trace invariance coverage across expanded negotiation axes. |
| sdks/python/oss/tests/pytest/acceptance/workflows/test_new_uri_handlers.py | Updates acceptance expectations: agent handler is now SDK-registered and resolvable by URI. |
| sdks/python/oss/tests/pytest/acceptance/observability/test_workflow_instrument_routed.py | Extends routed observability invariance tests to new transcript/control/embeds axes and 406 behavior. |
| sdks/python/agenta/sdk/models/workflows.py | Renames history→trim, control→force, and updates per-call flags docstring. |
| sdks/python/agenta/sdk/middlewares/running/resolver.py | Consumes resolve and strips it from request.flags before the handler sees the request. |
| sdks/python/agenta/sdk/middlewares/running/normalizer.py | Removes generator drain + envelope trimming; generators always stream and dict returns pass through. |
| sdks/python/agenta/sdk/engines/running/utils.py | Registers agent_v0 in the builtin handler registry. |
| sdks/python/agenta/sdk/engines/running/handlers.py | Updates llm_v0 to honor trim and reject force (406). |
| sdks/python/agenta/sdk/engines/running/errors.py | Introduces ForceNotSupportedV0Error (406-mapped) for force semantics not yet supported. |
| sdks/python/agenta/sdk/decorators/routing.py | Adds new headers, shared apply_invoke_prelude, and updates route invoke endpoint to use it. |
| sdks/python/agenta/sdk/agents/handler.py | Adds SDK-owned agent_v0 handler with injectable composition and shared stream/batch helpers. |
| sdks/python/agenta/sdk/agents/fold.py | Adds pure fold + trim primitives to define batch output as a fold of the event stream. |
| docs/designs/invoke-negotiations/tasks.md | Captures the implementation task plan and verification harness for the negotiation redesign. |
| docs/designs/invoke-negotiations/specs.md | Defines the new negotiation contract, semantics, and 4-level testing strategy. |
```python
# Header sugar: each flag is filled from its header only when the body left it unset (body wins).
_accept = _parse_accept(req)
_flags = dict(request.flags or {})
if "stream" not in _flags:
    _flags["stream"] = _accept in STREAM_MEDIA_TYPES
if "trim" not in _flags:
    _trim = _parse_transcript_header(req)
    if _trim is not None:
        _flags["trim"] = _trim
if "force" not in _flags:
    _force = _parse_session_control_header(req)
    if _force is not None:
        _flags["force"] = _force
if "resolve" not in _flags:
    _resolve = _parse_workflow_embeds_header(req)
    if _resolve is not None:
        _flags["resolve"] = _resolve
request.flags = _flags
```
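To make the body-wins rule in this hunk concrete, here is a minimal, self-contained sketch. It is not the SDK's `apply_invoke_prelude`: `fill_flags`, `OPT_IN_HEADERS`, the exact-match `Accept` check, and the two media types are assumptions for illustration. The `x-ag-messages-transcript` axis is left out because its accepted values are not shown here.

```python
from typing import Mapping, Optional

# Assumption: the real STREAM_MEDIA_TYPES set is not shown in this PR text;
# these two values are illustrative only.
STREAM_MEDIA_TYPES = {"application/x-ndjson", "text/event-stream"}

# Hypothetical table: header name -> (flag name, header value that opts in).
# The two header/value pairs are the ones named in the PR description.
OPT_IN_HEADERS = {
    "x-ag-session-control": ("force", "force"),
    "x-ag-workflow-embeds": ("resolve", "resolve"),
}


def fill_flags(
    body_flags: Optional[Mapping[str, bool]],
    headers: Mapping[str, str],
) -> dict:
    """Fill missing flags from headers; a flag already set in the body is never overwritten."""
    flags = dict(body_flags or {})
    lowered = {name.lower(): value.strip().lower() for name, value in headers.items()}

    # `stream` always gets a value: the body's, or one derived from Accept.
    if "stream" not in flags:
        flags["stream"] = lowered.get("accept", "") in STREAM_MEDIA_TYPES

    # Opt-in headers set their flag only on a recognized value; an absent or
    # unknown value leaves the flag unset (null) rather than raising.
    for header, (flag, opt_in) in OPT_IN_HEADERS.items():
        if flag not in flags and lowered.get(header) == opt_in:
            flags[flag] = True
    return flags


# The body says force=False, so the force header is ignored.
print(fill_flags({"force": False}, {"Accept": "application/x-ndjson", "x-ag-session-control": "force"}))
```

Keeping unset flags out of the dict, instead of writing `False`, is what lets a later layer apply its own null default, such as `resolve` staying on when nothing asked for it.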
- handler.py: permission_default (the SDK deleted permission_policy in phase 3)
- fold(): the terminal result's stop_reason wins over the done event (the live runner emits done without stopReason, so a real pause would otherwise drop its envelope metadata) and pending_interaction carries a derived top-level tool name
- sandbox_agent.ts: fold upstream keepers lost to ours-wins conflict resolution (shared apiBase module, claude-model + resolved-model log lines)
- service pause test pinned to the realistic stream shape (done event with no stopReason)
…ract into the test_run design
Context
`/invoke` negotiated `stream`, `history`, and `control` in four different places, and each place disagreed about what the flags meant. The agent's batch response was a hand-built single message (`{"messages": [{"role": "assistant", "content": result.output}]}`), while its stream response carried the full turn (tool calls, tool results, the real assistant text); accumulating the stream did not reconstruct the batch response. The normalizer middleware separately drained generic generators on `stream=false` and trimmed the result as raw events, so "last" meant the final `done` event on that path and the final message on the agent's own path. `llm_v0` documented a full-message-history contract, but the shared normalizer silently trimmed it to one message anyway. Two invoke surfaces existed (`route()`-mounted apps and the generic root `POST /invoke` dispatch) and only one of them did any of this negotiation, so the same request could get a batch response through one surface and an ndjson stream through the other.

Root cause: negotiation was split between routing, the normalizer, and the handler, with no single owner. Full design writeup: `docs/designs/invoke-negotiations/specs.md`.

Changes
The rule: boolean command flags (`stream`, `trim`, `force`) are resolved inside the handler and never change outside it. Routing keeps exactly two jobs: header-to-flag sugar (an explicit body flag always wins) and the one HTTP-only value negotiation, `format`. The running middleware keeps exactly one flag, `resolve`, which it consumes and strips before the handler ever sees the request. Anything a handler can't deliver is a 406, with no courtesy aggregation: batch asked of a stream-only handler 406s, stream asked of a batch-only handler 406s, and `force` 406s until take-over semantics exist.

Renames (no dual-support window; the only caller of the old names was the playground, and it was verified never to have sent them): flag `history` to `trim`, flag `control` to `force`, header `x-ag-messages-history` to `x-ag-messages-transcript`. Two new headers name their opt-in explicitly: `x-ag-session-control: force` and `x-ag-workflow-embeds: resolve` (absent means null, which for every flag except `resolve` means false; `resolve`'s null default stays true to preserve today's hydrate-by-default behavior, with an explicit `flags: {resolve: false}` body-only off-switch).

batch = fold(stream), by construction. A new pure function, `fold(events) -> {messages, stop_reason, pending_interaction}` (`sdks/python/agenta/sdk/agents/fold.py`), turns the canonical agenta event vocabulary into the real turn: assistant text from message events, tool turns from `tool_call`/`tool_result` pairs, in order. A sibling `trim_to_trailing_unit(messages)` returns just the trailing unit (the last assistant message, or the full trailing tool/approval run) when a caller asks for `trim=true`. Both invoke shapes now consume the same live event stream, so folding it client-side reproduces the batch envelope exactly, pinned by a hard route-level contract test (stream a request, batch the same request, assert `fold(streamed events)` deep-equals the batch output).

Before (agent batch response, any tool call is invisible): `{"messages": [{"role": "assistant", "content": result.output}]}`
After (the real turn, trimmable per-call):
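As a rough illustration of the new shape, the sketch below folds a small hypothetical event stream into a turn and then trims it. It is not the code in `sdks/python/agenta/sdk/agents/fold.py`: the event field names (`type`, `text`, `id`, `name`, `args`, `output`, `stop_reason`) and the message layout are assumptions. `pending_interaction` is always `None` here, and the terminal-result precedence for `stop_reason` is not modelled.

```python
from typing import Any, Dict, Iterable, List, Optional

Event = Dict[str, Any]
Message = Dict[str, Any]


def fold(events: Iterable[Event]) -> Dict[str, Any]:
    """Fold an event stream into {messages, stop_reason, pending_interaction}."""
    messages: List[Message] = []
    stop_reason: Optional[str] = None
    for event in events:
        kind = event["type"]
        if kind == "message":
            messages.append({"role": "assistant", "content": event["text"]})
        elif kind == "tool_call":
            call = {"id": event["id"], "name": event["name"], "args": event["args"]}
            messages.append({"role": "assistant", "content": None, "tool_calls": [call]})
        elif kind == "tool_result":
            messages.append({"role": "tool", "tool_call_id": event["id"], "content": event["output"]})
        elif kind == "done":
            # A done event without a stop_reason keeps whatever was seen before.
            stop_reason = event.get("stop_reason", stop_reason)
    return {"messages": messages, "stop_reason": stop_reason, "pending_interaction": None}


def _is_tool_turn(message: Message) -> bool:
    """True for tool results and for assistant messages that carry tool calls."""
    return message["role"] == "tool" or bool(message.get("tool_calls"))


def trim_to_trailing_unit(messages: List[Message]) -> List[Message]:
    """Return the last assistant message, or the whole trailing run of tool turns."""
    if not messages:
        return []
    if not _is_tool_turn(messages[-1]):
        return messages[-1:]
    start = len(messages) - 1
    while start > 0 and _is_tool_turn(messages[start - 1]):
        start -= 1
    return messages[start:]


# Hypothetical stream: one tool round-trip followed by the assistant's answer.
events = [
    {"type": "tool_call", "id": "c1", "name": "search", "args": {"q": "weather"}},
    {"type": "tool_result", "id": "c1", "output": "sunny"},
    {"type": "message", "text": "It is sunny."},
    {"type": "done", "stop_reason": "end_turn"},
]
batch = fold(events)
print(len(batch["messages"]))                    # 3: tool call, tool result, answer
print(trim_to_trailing_unit(batch["messages"]))  # only the final assistant message
```

Because `fold` is pure, a client that accumulates the streamed events and folds them itself gets the same envelope the batch path returns, which is the property the route-level contract test pins.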
`agent_v0` moves into the SDK. It was the one builtin without a named SDK handler: `agenta:builtin:agent:v0` had a catalog entry and an interface but no entry in `HANDLER_REGISTRY`, so any SDK process other than the agent service resolving the URI got nothing. `sdks/python/agenta/sdk/agents/handler.py` now owns the whole flag contract (stream/batch pre-branch, fold, trim, `force` to 406) behind an injectable `AgentComposition` seam (tool/MCP resolvers, secret provider, default template, backend selector) that defaults to env-driven behavior. `services/oss/src/agent/app.py` shrinks to service-specific composition and mount; the synthetic single-message envelope and the old `_agent_batch` / `_agent_event_stream` bodies are gone from the service.

`llm_v0` honors its own documented contract. It now reads `trim` off its own request (default full, matching its docstring) instead of getting silently trimmed by the shared normalizer, and shares the same `force` to 406 mapping as the agent.

Both invoke surfaces negotiate identically. The header-to-flag fills, session-id extraction, and Vercel input projection that used to live only in `route()`'s endpoint prelude are extracted into one shared helper, called from both the route mount and `services/entrypoints/main.py`'s root `POST /invoke` dispatch. A parity test drives the same request and headers against both surfaces and asserts an identical response.

Removed: the normalizer's `stream=false` drain branch and its `{messages: [...]}` envelope trim. Generators always pass through as stream responses; direct returns pass through unmodified. The handler owns its output shape end to end; a 406 at routing covers a batch-asked stream-only handler (a conscious reversion of the old drain-and-aggregate courtesy for custom user workflows with no `request` param).

Tests / notes
Four test levels, matching where the contract crosses a layer boundary (specs.md "Testing contract"):

- `stream x trim x force` cube for `_agent` and `llm_v0`, called as plain functions.
- `@workflow` programmatic: passthrough + `resolve`-strip assertions, both for a request-taking handler and a flag-blind one.
- `@instrument` invariance: the span tree and accumulated trace output are identical across every flag/header combination the handler can satisfy; negotiations change the response, never the trace.
- `@route`: header-semantics sweep per axis with body-wins precedence, a negotiation cube against the real `/agent/v0/invoke` and `/llm/v0/invoke` mounts, the full 406 matrix, and the dispatch-parity test.

Full harness green against the local EE dev stack (freshly redeployed with the branch):

- `py-run-tests --sdk -aiu`: 1546 unit / 136 integration / 124 acceptance passed.
- `py-run-tests --api -aiu`: 1430 unit / 775 integration+acceptance passed.
- `py-run-tests --services -aiu`: 73 unit / 15 integration / 158 acceptance passed.
- `ts-run-tests --runner -aiu`: 394 / 8 / 16 passed.
- `ts-run-tests --web -iu`: passed.

Three acceptance tests initially failed because they pinned the exact pre-existing behavior this PR intentionally changes (agent_v0 unregistered, courtesy aggregation on a flag-blind generator, a stream-vs-batch span-output cube that conflated two structurally different output shapes into one global invariant). Fixed as test-side assertions only, no source behavior changed.
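For readers unfamiliar with the "cube" framing, a handler-level sweep over `stream x trim x force` can be sketched as below. Everything in it is hypothetical: `NotAcceptable`, `handler`, and `run_cube` are stand-ins, not the SDK's `ForceNotSupportedV0Error` or its real handlers. In this sketch `trim` only shapes the batch envelope; how the real handlers treat `trim` on a stream is not shown here.

```python
from itertools import product
from typing import Dict, Iterator, Tuple, Union


class NotAcceptable(Exception):
    """Hypothetical stand-in for an error the routing layer would map to HTTP 406."""

    status_code = 406


def handler(flags: Dict[str, bool]) -> Union[dict, Iterator[dict]]:
    """Toy handler that resolves stream/trim/force itself."""
    if flags.get("force"):
        raise NotAcceptable("force is not supported")
    events = [
        {"type": "message", "text": "first"},
        {"type": "message", "text": "second"},
    ]
    if flags.get("stream"):
        return iter(events)  # stream shape: the raw events
    messages = [{"role": "assistant", "content": e["text"]} for e in events]
    if flags.get("trim"):
        messages = messages[-1:]  # trailing unit only
    return {"messages": messages}  # batch shape: the folded envelope


def run_cube() -> Dict[Tuple[bool, bool, bool], str]:
    """Call the handler once per flag combination and record the outcome."""
    outcomes: Dict[Tuple[bool, bool, bool], str] = {}
    for stream, trim, force in product([False, True], repeat=3):
        key = (stream, trim, force)
        try:
            result = handler({"stream": stream, "trim": trim, "force": force})
        except NotAcceptable as exc:
            outcomes[key] = str(exc.status_code)
            continue
        if isinstance(result, dict):
            outcomes[key] = f"batch:{len(result['messages'])}"
        else:
            outcomes[key] = "stream"
    return outcomes


for key, outcome in sorted(run_cube().items()):
    print(key, outcome)
```

Sweeping all eight combinations, instead of hand-picking a few, is what makes a regression in any single axis show up as one changed cell.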
Web sends only `x-ag-messages-format: vercel` and never sent any of the renamed/removed headers; confirmed no web code assumes the old single-message batch envelope (`AgentChatTransport.ts` already picks the trailing assistant message out of a multi-message array, not `messages[0]`). No web changes needed.

🤖 Generated with Claude Code