
Conversation

@DZunke
Contributor

@DZunke DZunke commented Feb 16, 2025

Please do not merge! The changed code was just a wild guess ... read more below.

I am currently experimenting a bit with streamed responses from OpenAI GPT. At first it worked like a charm, but then I wanted to utilize my set of functions, and every request that should utilize those functions no longer works. I remembered that PR #126 should have fixed this already, but it does not seem to.

So I built the example examples/stream-tools-gpt-openai.php and found out it is not only broken for me but also broken in the library itself. Do you remember anything about this topic, @chr-hertel?

When I request a streamed response without a function call, the first chunk looks like this:

array:7 [
  // ...
  "choices" => array:1 [
    0 => array:4 [
      "index" => 0
      "delta" => array:3 [
        "role" => "assistant"
        "content" => ""
        "refusal" => null
      ]
      "logprobs" => null
      "finish_reason" => null
    ]
  ]
]

When I make a request for a streamed response that should include a function call, the OpenAI API delivers the following first chunk:

array:7 [
  // ...
  "choices" => array:1 [
    0 => array:4 [
      "index" => 0
      "delta" => array:2 [
        "role" => "assistant"
        "content" => null
      ]
      "logprobs" => null
      "finish_reason" => null
    ]
  ]
]

Within src/Bridge/OpenAI/GPT/ResponseConverter.php, the check for function calls currently depends on a tool_calls key being present in the first chunk's delta, but it is not there. As a wild guess I found that it works fine to check for a null content node in the first chunk instead, since the following chunks contain all the information about the tool calls. But this feels pretty unstable and wrong.
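For illustration, a minimal sketch of that heuristic (hypothetical function name, not the library's actual streamIsToolCall() implementation):

<?php

// Hypothetical sketch of the heuristic described above. It inspects the
// first chunk's delta: a "tool_calls" key is the reliable signal, but as
// observed, OpenAI may omit it in the first chunk, so the fallback checks
// for a null "content" value.
function firstChunkLooksLikeToolCall(array $chunk): bool
{
    $delta = $chunk['choices'][0]['delta'] ?? [];

    // Reliable case: the API announces the tool call directly.
    if (isset($delta['tool_calls'])) {
        return true;
    }

    // Wild guess from the observation above: plain content streams start
    // with content === '' while tool-call streams start with content === null.
    return array_key_exists('content', $delta) && null === $delta['content'];
}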

Google is currently not much help, as a lot of people seem to have problems with streaming function calls at OpenAI. So, while searching for more information, I thought I would throw this into the room: is anyone out there having the same struggle, or has anyone already found a solution?

@chr-hertel
Member

Oh, good one, looks like that happened with #140.

So I think your basic idea is right. In streamIsToolCall we need to loop through the generator until we can tell whether content or tool_calls is coming, before we decide whether to buffer for converting into a tool call or to return the stream as a generator.

If you're not happy with this solution, you could try to loop through the iterator until content or tool_calls is set and rewind afterwards if needed? 🤔

haven't tried it though
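A rough sketch of that idea, with hypothetical names. Since PHP generators cannot be rewound once advanced, the consumed chunks are buffered and replayed in front of the untouched remainder of the stream:

<?php

// Peek ahead on the chunk stream: buffer chunks until we can tell whether
// this is a tool call or plain content, then replay the buffer so the
// consumer still sees the full stream.
function splitStream(\Generator $chunks): array
{
    $buffer = [];
    $isToolCall = false;

    while ($chunks->valid()) {
        $chunk = $chunks->current();
        $chunks->next();
        $buffer[] = $chunk;

        $delta = $chunk['choices'][0]['delta'] ?? [];
        if (isset($delta['tool_calls'])) {
            $isToolCall = true;
            break;
        }
        if (isset($delta['content']) && '' !== $delta['content']) {
            break; // real text arrived, so this is a plain content stream
        }
    }

    // Replay the buffered chunks, then continue with the rest of the stream.
    $replay = (static function () use ($buffer, $chunks): \Generator {
        yield from $buffer;
        yield from $chunks;
    })();

    return [$isToolCall, $replay];
}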

@chr-hertel
Member

Not sure if there is a way to move that into ResponseConverter::streamResponse, since both conversion methods already deal with those !isset situations themselves, in ResponseConverter::convertStreamToToolCalls:109 and ResponseConverter::convertStreamContent:134.

@DZunke
Contributor Author

DZunke commented Feb 16, 2025

Yeah, thanks for your input. I am currently digging into some Python code posted in the OpenAI discussion forums. There was another example suggesting that the response can contain text like "Hey, I am going to search through some functions ... wait a moment" before the function calls start, but I currently do not get anything like that with the same prompts.

Besides the documentation definitely looking wrong here, streaming with functions seems in general a bit more complicated, if what people ping-pong in the community is right.

But it seems not uncommon to have a combined streaming function that falls back to a function call when one appears in the response, instead of checking for it separately at the beginning. So integrating it into ResponseConverter::streamResponse currently seems like a step towards solving this that does not feel wrong and is more stable.

@DZunke DZunke force-pushed the gpt-function-calls-defect-in-stream branch from b475dd3 to 6de8d9e Compare February 17, 2025 16:19
@DZunke
Contributor Author

DZunke commented Feb 17, 2025

Hey @chr-hertel ... could you have a look at the changes to see if they fit the idea? And maybe you also have the possibility to execute the associated examples; the ones not utilizing the OpenAI API are running as expected.

I have rewritten the new example so that the function call is executed not at the beginning of the stream but in the middle of it, as I found that to be a real use case. This is why I added a tool-chain-specific stream response that catches the tool calls from the response converter: the initial stream ends when a function call appears and is replaced with the function call response stream afterwards.

To be able to catch the function call in the middle of the stream, I implemented the replacement of the default stream response with this toolbox-specific one. I put it into the toolbox namespace, as the chain processor which replaces it lives there too.
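Conceptually, the handoff looks roughly like this (hypothetical names, not this library's API; note that in real OpenAI streams the tool call arguments arrive fragmented over several chunks and would need to be buffered first):

<?php

// Stream text to the consumer until a tool call shows up, then end the
// initial stream and continue with the stream of the follow-up request
// that carries the tool result.
function streamWithToolHandoff(\Generator $chunks, callable $runToolAndRestream): \Generator
{
    foreach ($chunks as $chunk) {
        $delta = $chunk['choices'][0]['delta'] ?? [];

        if (isset($delta['tool_calls'])) {
            // The initial stream ends here; it is replaced by the response
            // stream of the follow-up request that includes the tool result.
            yield from $runToolAndRestream($delta['tool_calls']);

            return;
        }

        if (isset($delta['content'])) {
            yield $delta['content'];
        }
    }
}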

It currently looks pretty stable, and better than the previous solution that only looked at the initial chunk of the stream. What do you think? Would this be a way to move on?

@DZunke
Contributor Author

DZunke commented Feb 17, 2025

OK, one problem I have to think about: the stream repeats itself after the function call ... sometimes a bit differently, which is why I did not detect it at first. It seems not to remember the information already given before the function call 🤔

@chr-hertel
Member

oh wow, wasn't aware of that mixed response of first text and then tool call - pretty nice - increases the complexity quite brutally, but looks like you covered it 👍

the only thing from a functional point of view that I'm catching is that examples/toolbox-weather-event.php dies in a loop - too many function calls, until we get a 429 Too Many Requests from open-meteo.com

@DZunke
Contributor Author

DZunke commented Feb 18, 2025

> oh wow, wasn't aware of that mixed response of first text and then tool call - pretty nice - increases the complexity quite brutally, but looks like you covered it 👍
>
> the only thing from a functional point of view that I'm catching is that examples/toolbox-weather-event.php dies in a loop - too many function calls, until we get a 429 Too Many Requests from open-meteo.com

Thanks! I think this is the same issue I have with my example question about unicorns being answered twice. I am still trying to solve the problem that the tool call response sent to the LLM answers the contained question again, even though it was already answered. Sounds like the same issue here: the weather question gets answered again and again and again, even though it is already answered. So the model does not detect that it already answered this before the tool call appeared.

It could have something to do with the copy of the initial message bag for the tool call response. Maybe it needs the already generated text as context for the answer.

But if it generally also works with other GPT providers and the changed code is not too far from your thoughts, I am already happy and will move on with solving this 👍🏼

@DZunke DZunke force-pushed the gpt-function-calls-defect-in-stream branch from 7ab1766 to 3088c47 Compare February 18, 2025 15:05
@DZunke
Contributor Author

DZunke commented Feb 18, 2025

Hm! OK, I have fixed the weather loop; that was my error when building the closure for the tool calls. An easy one.

Now to fix the repeating messages after the tool call. The call to the tool is not aware of the already given answer, as mentioned before. After some research this turns out to be "normal", as the model tries to answer the user question with the tool call. With non-streamed answers this is not a problem, because the full answer is built at once after the tool call is done. But in a streamed context it is possible, as in my example, that the LLM has already emitted content before it finds out that it has to call a tool. So we pay for the perceived "performance" boost of streaming with the model not knowing in detail what comes next.

This is what I interpreted from some OpenAI community discussions. Through them I also found out that it is recommended to send the assistant message received before the tool call along with the tool answer request.
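A sketch of that recommendation with plain OpenAI-style message arrays rather than this library's MessageBag API; all values are made up:

<?php

// The key point is the assistant message: it carries the text that was
// already streamed before the tool call, so the model does not answer
// that part of the question again.
$streamedTextSoFar = 'Unicorns are mythical creatures ...'; // emitted before the tool call
$toolCall = [
    'id' => 'call_abc123',
    'type' => 'function',
    'function' => ['name' => 'get_weather', 'arguments' => '{"city":"Berlin"}'],
];

$messages = [
    ['role' => 'user', 'content' => 'Tell me about unicorns and the weather in Berlin.'],
    // Without this message the model re-answers the unicorn part of the question.
    ['role' => 'assistant', 'content' => $streamedTextSoFar, 'tool_calls' => [$toolCall]],
    ['role' => 'tool', 'tool_call_id' => $toolCall['id'], 'content' => '{"temperature": 8}'],
];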

It works for now, but I think that in the long run, with more complicated prompts, there could be a problem, and I did not want to introduce a BC break now. Why a BC break? Each tool call only knows the streamed content since the last stream started, because the messages from the initial call are cloned in src/Chain/ToolBox/ChainProcessor.php::71. This clone was there before and results in the tool calls not ending up in the message bag. Removing it would be a BC break, as the tool calls would then be in the bag after each chain call; developers using this library would have the tool calls in their message bags and would have to filter them out manually if they do not want them.
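A tiny, simplified illustration of that cloning behavior (a hypothetical stand-in, not the actual ChainProcessor code):

<?php

// Because the processor works on a clone, tool call messages added during
// processing never end up in the caller's original message bag.
$original = new \ArrayObject([
    ['role' => 'user', 'content' => 'How is the weather?'],
]);

$clone = clone $original; // the processor's working copy
$clone->append(['role' => 'assistant', 'tool_calls' => ['...']]);
$clone->append(['role' => 'tool', 'content' => '{"temperature": 8}']);

var_dump(count($original)); // int(1) -- the caller never sees the tool calls
var_dump(count($clone));    // int(3)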

In the long run I would suggest having the tool calls in there, at least for streams. You already mentioned the raised complexity; maybe in the long run, instead of a stream option, we could profit from splitting the non-stream and stream behavior into their own structures, to lower the complexity again? I would not do it now if you are fine with the current state of the PR.

WDYT, @chr-hertel ?

From my side ... I would say: done here, as a first step, with a lot of learnings about streams? And sorry for the tapestry of text 🙈

@chr-hertel
Member

> Maybe in the long run, instead of a stream option, we could profit from splitting the non-stream and stream behavior into their own structures?

yea, from time to time I struggle a bit with the approach I chose here of bringing all capabilities into one almighty chain - but on the other hand, it's called llm-chain, not llm-chains :D

Member

@chr-hertel chr-hertel left a comment

Looks correct to me, and the functional tests are fine. I also feel the urge to decouple/refactor/split, but let's move that out of the scope of this PR.

@chr-hertel chr-hertel merged commit 706497c into php-llm:main Feb 18, 2025
7 checks passed
@chr-hertel
Member

Thanks @DZunke!

@chr-hertel chr-hertel added the bug Something isn't working label Mar 14, 2025