fix(streaming): preserve tool arguments on content block transition for Ollama #25608

Open
kiyeonjeon21 wants to merge 2 commits into BerriAI:main from kiyeonjeon21:fix/ollama-streaming-tool-params-dropped

Conversation

@kiyeonjeon21

Summary

  • When AnthropicStreamWrapper detects a content block type change (text → tool_use), it was discarding the trigger chunk's delta data
  • Ollama sends the complete tool call (name + full arguments) in a single chunk, so all arguments were lost → input: {} in tool_use blocks
  • Now the processed_chunk is appended to the queue when it is a content_block_delta, preserving input_json_delta payload
  • Fix applied to both sync (__next__) and async (__anext__) paths

Test plan

  • Added TestOllamaStreamingToolArgs with 3 tests covering:
    • Tool arguments preserved when sent in a single chunk
    • content_block_start carries correct tool name
    • Event ordering follows Anthropic SSE protocol
  • All tests pass locally
  • Black formatting applied

Closes #25605

When AnthropicStreamWrapper detects a content block type change
(e.g. text -> tool_use), it queues content_block_stop and
content_block_start but was discarding the trigger chunk's delta.

For providers like Ollama that send the complete tool call in a
single chunk, this caused all tool arguments to be lost, resulting
in empty input: {} in tool_use blocks.

Now the processed_chunk is appended to the queue when it is a
content_block_delta, preserving the input_json_delta payload.

Closes BerriAI#25605
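The queueing behavior described in the commit message can be sketched as follows. This is a standalone illustration under assumed names: handle_block_transition is a hypothetical helper, not the actual AnthropicStreamWrapper code, and the chunk dicts are simplified.

```python
# Minimal sketch of the described fix; hypothetical names, simplified chunks.
from collections import deque


def handle_block_transition(chunk_queue, processed_chunk, old_index, new_index):
    """On a text -> tool_use transition, close the old block, open the
    new one, and (the fix) re-queue the trigger chunk's delta so a
    provider that sends the whole tool call in one chunk keeps its args."""
    chunk_queue.append({"type": "content_block_stop", "index": old_index})
    chunk_queue.append(
        {"type": "content_block_start", "index": new_index,
         "content_block": {"type": "tool_use"}}
    )
    # Before the fix this trigger chunk was discarded; now it is preserved.
    if processed_chunk.get("type") == "content_block_delta":
        chunk_queue.append(processed_chunk)
    return chunk_queue


queue = handle_block_transition(
    deque(),
    {"type": "content_block_delta",
     "delta": {"type": "input_json_delta", "partial_json": '{"city": "Paris"}'}},
    old_index=0,
    new_index=1,
)
print([c["type"] for c in queue])
# -> ['content_block_stop', 'content_block_start', 'content_block_delta']
```

With the old behavior, the queue would end after content_block_start and the partial_json payload would never reach the client.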
@vercel

vercel bot commented Apr 12, 2026

The latest updates on your projects.

Project: litellm | Deployment: Ready | Updated (UTC): Apr 12, 2026 8:04pm

@greptile-apps
Contributor

greptile-apps bot commented Apr 12, 2026

Greptile Summary

This PR fixes a bug in AnthropicStreamWrapper where Ollama's single-chunk tool calls (name + full arguments in one chunk) had their input_json_delta payload silently discarded during the text → tool_use content block transition. The fix appends the trigger chunk's delta to the queue when its partial_json is non-empty, and skips it when empty (correctly handling OpenAI's arguments="" first-chunk pattern). The change is applied symmetrically to both __next__ and __anext__.

Confidence Score: 5/5

Safe to merge — the fix is backward-compatible, all remaining findings are P2 suggestions.

The core logic is correct: partial_json="" (OpenAI's first tool chunk) is falsy and skipped; partial_json="{...}" (Ollama's complete args) is truthy and queued. The change is applied symmetrically in both sync and async paths. The only open item is missing async-path tests, which is a P2 quality observation and does not affect correctness.
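The truthiness behavior described above can be checked in isolation with a small sketch. The guard below mirrors the reviewer's suggested condition; the chunk dicts are simplified illustrations, not litellm internals.

```python
def should_requeue_delta(processed_chunk: dict) -> bool:
    """Guard from the review suggestion: only re-queue the trigger
    chunk when it actually carries non-empty tool arguments."""
    return bool(
        processed_chunk.get("type") == "content_block_delta"
        and processed_chunk.get("delta", {}).get("partial_json")
    )


# Ollama-style: full arguments arrive in a single chunk -> re-queued.
ollama_chunk = {"type": "content_block_delta",
                "delta": {"type": "input_json_delta",
                          "partial_json": '{"location": "Seoul"}'}}

# OpenAI-style first tool chunk: name only, arguments="" -> skipped.
openai_chunk = {"type": "content_block_delta",
                "delta": {"type": "input_json_delta", "partial_json": ""}}

print(should_requeue_delta(ollama_chunk))  # -> True
print(should_requeue_delta(openai_chunk))  # -> False
```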

No files require special attention.

Important Files Changed

Filename | Overview
litellm/llms/anthropic/experimental_pass_through/adapters/streaming_iterator.py | Adds conditional delta-preservation logic in both sync and async content-block-transition paths; logic is correct and backward-compatible with multi-chunk providers like OpenAI.
tests/test_litellm/llms/anthropic/experimental_pass_through/adapters/test_streaming_iterator_tool_args.py | New mock-only test class covering the Ollama single-chunk tool-call regression; tests argument preservation, tool name in content_block_start, and SSE event ordering, but only exercises the sync (__next__) path.

Sequence Diagram

sequenceDiagram
    participant Ollama as Ollama (single-chunk)
    participant Wrapper as AnthropicStreamWrapper
    participant Client as Anthropic Client

    Ollama->>Wrapper: text chunk
    Wrapper->>Client: message_start
    Wrapper->>Client: content_block_start {type: text, index: 0}
    Wrapper->>Client: content_block_delta {text_delta, index: 0}

    Ollama->>Wrapper: tool_call chunk (name + full arguments in ONE chunk)
    Note over Wrapper: _should_start_new_content_block=True, index→1
    Wrapper->>Client: content_block_stop {index: 0}
    Wrapper->>Client: content_block_start {type: tool_use, name, index: 1}
    Wrapper->>Client: content_block_delta {input_json_delta, partial_json, index: 1} NEW

    Ollama->>Wrapper: finish chunk
    Wrapper->>Client: content_block_stop {index: 1}
    Wrapper->>Client: message_delta {stop_reason: tool_use}
    Wrapper->>Client: message_stop

Reviews (2): Last reviewed commit: "fix(streaming): skip empty partial_json ..."

@codspeed-hq
Contributor

codspeed-hq bot commented Apr 12, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing kiyeonjeon21:fix/ollama-streaming-tool-params-dropped (47a8d4b) with main (5544803)

Comment on lines +145 to +149
    # Also emit the trigger chunk's delta so that providers like
    # Ollama that send the complete tool call in a single chunk
    # do not lose their arguments.
    if processed_chunk.get("type") == "content_block_delta":
        self.chunk_queue.append(processed_chunk)

P2 Spurious empty content_block_delta for standard multi-chunk providers

For providers that stream tool calls across multiple chunks (e.g., standard OpenAI), the first transition chunk typically has arguments = "" (not None) — the function name arrives first, the JSON body follows in later chunks. Because tool_calls is not None, _translate_streaming_openai_chunk_to_anthropic sets partial_json = "", meaning translate_streaming_openai_response_to_anthropic returns a content_block_delta with partial_json = "". The new condition processed_chunk.get("type") == "content_block_delta" is then True, so an extra empty delta is appended to the queue.

This is harmless for clients that just concatenate partial_json values ("" + actual_json = actual_json), but it does alter the emitted event stream for all providers, not just Ollama. A tighter guard would limit the change to chunks that actually carry non-empty arguments:

if (
    processed_chunk.get("type") == "content_block_delta"
    and processed_chunk.get("delta", {}).get("partial_json")
):
    self.chunk_queue.append(processed_chunk)

The same applies to the __anext__ path at line 332.

@codecov

codecov bot commented Apr 12, 2026

Codecov Report

❌ Patch coverage is 40.00000% with 6 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
...mental_pass_through/adapters/streaming_iterator.py | 40.00% | 6 Missing ⚠️

Only emit the trigger chunk's input_json_delta when partial_json is
non-empty.  OpenAI-style providers send arguments="" in the first
tool chunk, which was producing a spurious empty content_block_delta
event and breaking existing test expectations.
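Both commits rely on the client-side invariant mentioned in the review: concatenating partial_json fragments yields the same final tool input whether or not an empty first fragment is present. A sketch of that reconstruction, using simplified hypothetical event dicts rather than the Anthropic SDK:

```python
import json


def assemble_tool_input(events):
    """Concatenate input_json_delta fragments, as an Anthropic-style
    streaming client would, then parse the result."""
    buf = "".join(
        e["delta"]["partial_json"]
        for e in events
        if e.get("type") == "content_block_delta"
        and e.get("delta", {}).get("type") == "input_json_delta"
    )
    return json.loads(buf) if buf else {}


# OpenAI-style multi-chunk stream: the empty first fragment is harmless,
# since "" + actual_json == actual_json.
events = [
    {"type": "content_block_start"},
    {"type": "content_block_delta",
     "delta": {"type": "input_json_delta", "partial_json": ""}},
    {"type": "content_block_delta",
     "delta": {"type": "input_json_delta", "partial_json": '{"unit": '}},
    {"type": "content_block_delta",
     "delta": {"type": "input_json_delta", "partial_json": '"celsius"}'}},
    {"type": "content_block_stop"},
]
print(assemble_tool_input(events))  # -> {'unit': 'celsius'}
```

Dropping the empty delta (the second commit) changes only the event count, not this reconstructed input, which is why the tighter guard is backward-compatible.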

Development

Successfully merging this pull request may close these issues.

Bug: Streaming mode drops tool parameters for Ollama provider (Anthropic Messages API)

1 participant