fix: merge Gemini parallel tool call responses#4103
fix: merge Gemini parallel tool call responses#4103omChauhanDev wants to merge 2 commits intopipecat-ai:mainfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests.
🚀 New features to boost your workflow:
|
|
Tagging @kompfner to take a look. |
Huh, wonder if this is a new Gemini requirement? This was almost certainly not the case when we first introduced thinking support for Gemini... |
| def _merge_parallel_tool_calls_for_thinking( | ||
| self, thought_signature_dicts: List[dict], messages: List[Content] | ||
| ) -> List[Content]: | ||
| """Merge parallel tool calls into single Content objects when thinking is enabled. |
There was a problem hiding this comment.
This docstring is a bit out of date now. Let's update it throughout to discuss the parallel-tool-call-responses-must-be-grouped requirement as well, rather than only mentioning it at the end.
| parallel group. Any tool call messages after it without a | ||
| thought_signature get merged into that group, regardless of what | ||
| messages appear in between. | ||
| parallel group. Subsequent tool call messages without a |
There was a problem hiding this comment.
We had built this algorithm initially trying to make the fewest assumptions about how our Pipecat context was structured. That meant allowing for the possibility that other kinds of messages were interleaved between any tool call messages (even though Gemini, I believe, doesn't allow that).
I think we can aim to keep that flexibility, even with the new requirement of bundling tool results together into a single message.
I think we can (and maybe have to) assume that we'll get all the tool results for tool request group A before a group B starts. But we can limit ourselves to just that assumption.
Maybe we could tweak your algorithm just a bit:
- Loop through the messages (no change)
- If we hit a tool call with a thought signature, start a group (no change)
- Scan forward (no change), and...
- If you encounter a tool call with no thought signature, add it to
merged_parts(no change) - If you hit a tool response, add it to
merged_response_parts(no change) - If you hit any other non-tool-call or non-tool-response message, add it to
other_messagesand keep scanning (changed; similar to previous code)
- If you encounter a tool call with no thought signature, add it to
- At the end of the group, output
merged_parts, thenmerged_response_parts, then other messages (changed; similar to previous code)
This is mostly academic/defensive—it's sort of "just in case" our context has other stuff interleaved between tool calls and responses (even though, again, I don't think Gemini supports that).
Please describe the changes in your PR. If it is addressing an issue, please reference that as well.
Fixes #3992
Issue :
When thinking is enabled,
_merge_parallel_tool_calls_for_thinking()merges modelfunction_callmessages into a single turn but left the corresponding userfunction_responsemessages split - Gemini rejects this with a 400Change :
The inner scan loop now also collects & merges
function_responseparts, & stops at any non-tool message boundary instead of scanning past it. Also, added tests covering parallel, sequential, text boundary, batch ordering, & mixed parallel+sequential cases