fix(streaming): preserve tool arguments on content block transition for Ollama#25608
kiyeonjeon21 wants to merge 2 commits into BerriAI:main
Conversation
When AnthropicStreamWrapper detects a content block type change
(e.g. text -> tool_use), it queues content_block_stop and
content_block_start but was discarding the trigger chunk's delta.
For providers like Ollama that send the complete tool call in a
single chunk, this caused all tool arguments to be lost, resulting
in empty input: {} in tool_use blocks.
Now the processed_chunk is appended to the queue when it is a
content_block_delta, preserving the input_json_delta payload.
Closes BerriAI#25605
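The transition handling described above can be sketched as follows. This is a simplified, illustrative model, not LiteLLM's actual `AnthropicStreamWrapper` implementation; the class and method names here are assumptions:

```python
# Simplified model of the fix (illustrative only; not LiteLLM's actual
# AnthropicStreamWrapper -- class and method names are assumptions).
class StreamWrapperSketch:
    def __init__(self):
        self.chunk_queue = []
        self.current_index = 0

    def on_block_type_change(self, processed_chunk, new_block_type, tool_name=None):
        # Close the current content block and open the next one.
        self.chunk_queue.append(
            {"type": "content_block_stop", "index": self.current_index}
        )
        self.current_index += 1
        content_block = {"type": new_block_type}
        if tool_name is not None:
            content_block["name"] = tool_name
        self.chunk_queue.append(
            {
                "type": "content_block_start",
                "index": self.current_index,
                "content_block": content_block,
            }
        )
        # The fix: re-queue the trigger chunk's delta so that single-chunk
        # providers (e.g. Ollama) do not lose their tool arguments.
        if processed_chunk.get("type") == "content_block_delta":
            self.chunk_queue.append(processed_chunk)
```

Before the fix, the trigger chunk's delta was dropped after the stop/start pair was queued, so a provider that packed the entire tool call into that one chunk never got its `input_json_delta` emitted.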
Greptile Summary
This PR fixes a bug in AnthropicStreamWrapper that dropped tool-call arguments on content block transitions. Confidence Score: 5/5 — safe to merge; the fix is backward-compatible, and all remaining findings are P2 suggestions. The core logic is correct: partial_json="" (OpenAI's first tool chunk) is falsy and skipped; partial_json="{...}" (Ollama's complete args) is truthy and queued. The change is applied symmetrically in both sync and async paths. The only open item is missing async-path tests, which is a P2 quality observation and does not affect correctness. No files require special attention.
| Filename | Overview |
|---|---|
| litellm/llms/anthropic/experimental_pass_through/adapters/streaming_iterator.py | Adds conditional delta-preservation logic in both sync and async content-block-transition paths; logic is correct and backward-compatible with multi-chunk providers like OpenAI. |
| tests/test_litellm/llms/anthropic/experimental_pass_through/adapters/test_streaming_iterator_tool_args.py | New mock-only test class covering the Ollama single-chunk tool-call regression; tests argument preservation, tool name in content_block_start, and SSE event ordering — but only exercises the sync (next) path. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Ollama as Ollama (single-chunk)
    participant Wrapper as AnthropicStreamWrapper
    participant Client as Anthropic Client
    Ollama->>Wrapper: text chunk
    Wrapper->>Client: message_start
    Wrapper->>Client: content_block_start {type: text, index: 0}
    Wrapper->>Client: content_block_delta {text_delta, index: 0}
    Ollama->>Wrapper: tool_call chunk (name + full arguments in ONE chunk)
    Note over Wrapper: _should_start_new_content_block=True, index→1
    Wrapper->>Client: content_block_stop {index: 0}
    Wrapper->>Client: content_block_start {type: tool_use, name, index: 1}
    Wrapper->>Client: content_block_delta {input_json_delta, partial_json, index: 1} (NEW)
    Ollama->>Wrapper: finish chunk
    Wrapper->>Client: content_block_stop {index: 1}
    Wrapper->>Client: message_delta {stop_reason: tool_use}
    Wrapper->>Client: message_stop
```
Reviews (2) — last reviewed commit: "fix(streaming): skip empty partial_json ..."
```python
# Also emit the trigger chunk's delta so that providers like
# Ollama that send the complete tool call in a single chunk
# do not lose their arguments.
if processed_chunk.get("type") == "content_block_delta":
    self.chunk_queue.append(processed_chunk)
```
Spurious empty content_block_delta for standard multi-chunk providers
For providers that stream tool calls across multiple chunks (e.g., standard OpenAI), the first transition chunk typically has arguments = "" (not None) — the function name arrives first, the JSON body follows in later chunks. Because tool_calls is not None, _translate_streaming_openai_chunk_to_anthropic sets partial_json = "", meaning translate_streaming_openai_response_to_anthropic returns a content_block_delta with partial_json = "". The new condition processed_chunk.get("type") == "content_block_delta" is then True, so an extra empty delta is appended to the queue.
This is harmless for clients that just concatenate partial_json values ("" + actual_json = actual_json), but it does alter the emitted event stream for all providers, not just Ollama. A tighter guard would limit the change to chunks that actually carry non-empty arguments:
```python
if (
    processed_chunk.get("type") == "content_block_delta"
    and processed_chunk.get("delta", {}).get("partial_json")
):
    self.chunk_queue.append(processed_chunk)
```
The same applies to the __anext__ path at line 332.
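To see why the extra empty delta is harmless to argument reconstruction but still changes the emitted stream, consider a minimal client-side accumulator. This is a sketch; the function name and event shapes are assumptions based on the Anthropic-style events described in this review:

```python
def accumulate_tool_input(events):
    """Concatenate partial_json fragments from content_block_delta events."""
    buf = ""
    for ev in events:
        if ev.get("type") == "content_block_delta":
            buf += ev.get("delta", {}).get("partial_json") or ""
    return buf

# OpenAI-style stream without the tighter guard: an extra empty delta
# is emitted on the transition chunk (arguments = "").
with_empty_delta = [
    {"type": "content_block_delta", "delta": {"partial_json": ""}},
    {"type": "content_block_delta", "delta": {"partial_json": '{"a": 1}'}},
]
# With the tighter guard, the empty delta is never queued.
without_empty_delta = [
    {"type": "content_block_delta", "delta": {"partial_json": '{"a": 1}'}},
]

# Reconstructed arguments are identical either way ("" + json == json)...
assert accumulate_tool_input(with_empty_delta) == accumulate_tool_input(without_empty_delta)
# ...but the event counts differ, which can break strict stream expectations.
assert len(with_empty_delta) != len(without_empty_delta)
```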
Only emit the trigger chunk's input_json_delta when partial_json is non-empty. OpenAI-style providers send arguments="" in the first tool chunk, which was producing a spurious empty content_block_delta event and breaking existing test expectations.
Summary
- When AnthropicStreamWrapper detects a content block type change (text → tool_use), it was discarding the trigger chunk's delta data
- For providers like Ollama that send the complete tool call in a single chunk, this resulted in empty input: {} in tool_use blocks
- Now the processed_chunk is appended to the queue when it is a content_block_delta, preserving the input_json_delta payload
- Applied in both the sync (__next__) and async (__anext__) paths

Test plan
- Added TestOllamaStreamingToolArgs with 3 tests covering:
  - tool arguments are preserved across the content block transition
  - content_block_start carries the correct tool name
  - SSE event ordering

Closes #25605
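The event-ordering check can be sketched as a small validator over the event sequence shown in the diagram above. This is illustrative only, not the actual test code from the PR:

```python
def validate_block_ordering(events):
    """Assert that deltas occur only inside an open content block and that
    blocks are closed before the next one opens (illustrative validator)."""
    open_index = None
    for ev in events:
        t = ev["type"]
        if t == "content_block_start":
            assert open_index is None, "previous block not closed"
            open_index = ev["index"]
        elif t == "content_block_delta":
            assert ev["index"] == open_index, "delta outside its block"
        elif t == "content_block_stop":
            assert ev["index"] == open_index, "stop for a block that is not open"
            open_index = None
    assert open_index is None, "stream ended with an open block"
    return True

# Event sequence from the diagram (message-level events omitted),
# including the preserved tool-args delta at index 1.
events = [
    {"type": "content_block_start", "index": 0},
    {"type": "content_block_delta", "index": 0},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1},
    {"type": "content_block_delta", "index": 1},  # the preserved tool-args delta
    {"type": "content_block_stop", "index": 1},
]
assert validate_block_ordering(events)
```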