fix(streaming): preserve tool arguments on content block transition for Ollama #25608

Open
kiyeonjeon21 wants to merge 2 commits into BerriAI:main from kiyeonjeon21:fix/ollama-streaming-tool-params-dropped

Conversation

@kiyeonjeon21

Summary

  • When AnthropicStreamWrapper detects a content block type change (text → tool_use), it was discarding the trigger chunk's delta data
  • Ollama sends the complete tool call (name + full arguments) in a single chunk, so all arguments were lost → input: {} in tool_use blocks
  • Now the processed_chunk is appended to the queue when it is a content_block_delta, preserving input_json_delta payload
  • Fix applied to both sync (__next__) and async (__anext__) paths

Test plan

  • Added TestOllamaStreamingToolArgs with 3 tests covering:
    • Tool arguments preserved when sent in a single chunk
    • content_block_start carries correct tool name
    • Event ordering follows Anthropic SSE protocol
  • All tests pass locally
  • Black formatting applied

Closes #25605

When AnthropicStreamWrapper detects a content block type change
(e.g. text -> tool_use), it queues content_block_stop and
content_block_start but was discarding the trigger chunk's delta.

For providers like Ollama that send the complete tool call in a
single chunk, this caused all tool arguments to be lost, resulting
in empty input: {} in tool_use blocks.

Now the processed_chunk is appended to the queue when it is a
content_block_delta, preserving the input_json_delta payload.

Closes BerriAI#25605
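The queueing behavior described in the commit message can be sketched as follows. This is a standalone illustration under assumed names: handle_block_transition is a hypothetical helper, not the actual AnthropicStreamWrapper code, and the chunk dicts are simplified.

```python
# Minimal sketch of the described fix; hypothetical names, simplified chunks.
from collections import deque


def handle_block_transition(chunk_queue, processed_chunk, old_index, new_index):
    """On a text -> tool_use transition, close the old block, open the
    new one, and (the fix) re-queue the trigger chunk's delta so a
    provider that sends the whole tool call in one chunk keeps its args."""
    chunk_queue.append({"type": "content_block_stop", "index": old_index})
    chunk_queue.append(
        {"type": "content_block_start", "index": new_index,
         "content_block": {"type": "tool_use"}}
    )
    # Before the fix this trigger chunk was discarded; now it is preserved.
    if processed_chunk.get("type") == "content_block_delta":
        chunk_queue.append(processed_chunk)
    return chunk_queue


queue = handle_block_transition(
    deque(),
    {"type": "content_block_delta",
     "delta": {"type": "input_json_delta", "partial_json": '{"city": "Paris"}'}},
    old_index=0,
    new_index=1,
)
print([c["type"] for c in queue])
# -> ['content_block_stop', 'content_block_start', 'content_block_delta']
```

With the old behavior, the queue would end after content_block_start and the partial_json payload would never reach the client.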
@vercel

vercel bot commented Apr 12, 2026

The latest updates on your projects.

Project: litellm | Deployment: Ready | Updated (UTC): Apr 12, 2026 8:04pm

@greptile-apps
Contributor

greptile-apps bot commented Apr 12, 2026

Greptile Summary

This PR fixes a bug in AnthropicStreamWrapper where Ollama's single-chunk tool calls (name + full arguments in one chunk) had their input_json_delta payload silently discarded during the text → tool_use content block transition. The fix appends the trigger chunk's delta to the queue when its partial_json is non-empty, and skips it when empty (correctly handling OpenAI's arguments="" first-chunk pattern). The change is applied symmetrically to both __next__ and __anext__.

Confidence Score: 5/5

Safe to merge — the fix is backward-compatible, all remaining findings are P2 suggestions.

The core logic is correct: partial_json="" (OpenAI's first tool chunk) is falsy and skipped; partial_json="{...}" (Ollama's complete args) is truthy and queued. The change is applied symmetrically in both sync and async paths. The only open item is missing async-path tests, which is a P2 quality observation and does not affect correctness.
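The truthiness behavior described above can be checked in isolation with a small sketch. The guard below mirrors the reviewer's suggested condition; the chunk dicts are simplified illustrations, not litellm internals.

```python
def should_requeue_delta(processed_chunk: dict) -> bool:
    """Guard from the review suggestion: only re-queue the trigger
    chunk when it actually carries non-empty tool arguments."""
    return bool(
        processed_chunk.get("type") == "content_block_delta"
        and processed_chunk.get("delta", {}).get("partial_json")
    )


# Ollama-style: full arguments arrive in a single chunk -> re-queued.
ollama_chunk = {"type": "content_block_delta",
                "delta": {"type": "input_json_delta",
                          "partial_json": '{"location": "Seoul"}'}}

# OpenAI-style first tool chunk: name only, arguments="" -> skipped.
openai_chunk = {"type": "content_block_delta",
                "delta": {"type": "input_json_delta", "partial_json": ""}}

print(should_requeue_delta(ollama_chunk))  # -> True
print(should_requeue_delta(openai_chunk))  # -> False
```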

No files require special attention.

Important Files Changed

Filename | Overview
litellm/llms/anthropic/experimental_pass_through/adapters/streaming_iterator.py | Adds conditional delta-preservation logic in both sync and async content-block-transition paths; logic is correct and backward-compatible with multi-chunk providers like OpenAI.
tests/test_litellm/llms/anthropic/experimental_pass_through/adapters/test_streaming_iterator_tool_args.py | New mock-only test class covering the Ollama single-chunk tool-call regression; tests argument preservation, tool name in content_block_start, and SSE event ordering, but only exercises the sync (__next__) path.

Sequence Diagram

sequenceDiagram
    participant Ollama as Ollama (single-chunk)
    participant Wrapper as AnthropicStreamWrapper
    participant Client as Anthropic Client

    Ollama->>Wrapper: text chunk
    Wrapper->>Client: message_start
    Wrapper->>Client: content_block_start {type: text, index: 0}
    Wrapper->>Client: content_block_delta {text_delta, index: 0}

    Ollama->>Wrapper: tool_call chunk (name + full arguments in ONE chunk)
    Note over Wrapper: _should_start_new_content_block=True, index→1
    Wrapper->>Client: content_block_stop {index: 0}
    Wrapper->>Client: content_block_start {type: tool_use, name, index: 1}
    Wrapper->>Client: content_block_delta {input_json_delta, partial_json, index: 1} NEW

    Ollama->>Wrapper: finish chunk
    Wrapper->>Client: content_block_stop {index: 1}
    Wrapper->>Client: message_delta {stop_reason: tool_use}
    Wrapper->>Client: message_stop

Reviews (2): Last reviewed commit: "fix(streaming): skip empty partial_json ..."

@codspeed-hq
Contributor

codspeed-hq bot commented Apr 12, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing kiyeonjeon21:fix/ollama-streaming-tool-params-dropped (47a8d4b) with main (5544803)

Comment on lines +145 to +149
    # Also emit the trigger chunk's delta so that providers like
    # Ollama that send the complete tool call in a single chunk
    # do not lose their arguments.
    if processed_chunk.get("type") == "content_block_delta":
        self.chunk_queue.append(processed_chunk)

P2 Spurious empty content_block_delta for standard multi-chunk providers

For providers that stream tool calls across multiple chunks (e.g., standard OpenAI), the first transition chunk typically has arguments = "" (not None) — the function name arrives first, the JSON body follows in later chunks. Because tool_calls is not None, _translate_streaming_openai_chunk_to_anthropic sets partial_json = "", meaning translate_streaming_openai_response_to_anthropic returns a content_block_delta with partial_json = "". The new condition processed_chunk.get("type") == "content_block_delta" is then True, so an extra empty delta is appended to the queue.

This is harmless for clients that just concatenate partial_json values ("" + actual_json = actual_json), but it does alter the emitted event stream for all providers, not just Ollama. A tighter guard would limit the change to chunks that actually carry non-empty arguments:

if (
    processed_chunk.get("type") == "content_block_delta"
    and processed_chunk.get("delta", {}).get("partial_json")
):
    self.chunk_queue.append(processed_chunk)

The same applies to the __anext__ path at line 332.

@codecov

codecov bot commented Apr 12, 2026

Codecov Report

❌ Patch coverage is 40.00000% with 6 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
...mental_pass_through/adapters/streaming_iterator.py | 40.00% | 6 Missing ⚠️

Only emit the trigger chunk's input_json_delta when partial_json is
non-empty.  OpenAI-style providers send arguments="" in the first
tool chunk, which was producing a spurious empty content_block_delta
event and breaking existing test expectations.
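Both commits rely on the client-side invariant mentioned in the review: concatenating partial_json fragments yields the same final tool input whether or not an empty first fragment is present. A sketch of that reconstruction, using simplified hypothetical event dicts rather than the Anthropic SDK:

```python
import json


def assemble_tool_input(events):
    """Concatenate input_json_delta fragments, as an Anthropic-style
    streaming client would, then parse the result."""
    buf = "".join(
        e["delta"]["partial_json"]
        for e in events
        if e.get("type") == "content_block_delta"
        and e.get("delta", {}).get("type") == "input_json_delta"
    )
    return json.loads(buf) if buf else {}


# OpenAI-style multi-chunk stream: the empty first fragment is harmless,
# since "" + actual_json == actual_json.
events = [
    {"type": "content_block_start"},
    {"type": "content_block_delta",
     "delta": {"type": "input_json_delta", "partial_json": ""}},
    {"type": "content_block_delta",
     "delta": {"type": "input_json_delta", "partial_json": '{"unit": '}},
    {"type": "content_block_delta",
     "delta": {"type": "input_json_delta", "partial_json": '"celsius"}'}},
    {"type": "content_block_stop"},
]
print(assemble_tool_input(events))  # -> {'unit': 'celsius'}
```

Dropping the empty delta (the second commit) changes only the event count, not this reconstructed input, which is why the tighter guard is backward-compatible.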

Development

Successfully merging this pull request may close these issues.

Bug: Streaming mode drops tool parameters for Ollama provider (Anthropic Messages API)

1 participant