[Bug]: handle tool calls in analysis channel #28139
base: main
Conversation
Code Review
This pull request addresses a bug where tool calls in the analysis channel were not being handled by the gpt-oss streaming parser. The fix correctly extends the tool call handling logic to include the analysis channel, in addition to the commentary channel. This is a robust solution that aligns with the OpenAI Harmony documentation. The addition of a complex streaming test case that reproduces the issue is a great way to ensure the fix is effective. However, I have one concern regarding the resource requirements of the new test, which I've detailed in a specific comment.
| "--enforce-eager", | ||
| "--max-model-len", | ||
| "4096", | ||
| "40960", |
Increasing max-model-len to 40960 is a significant 10x increase. As you noted in the PR description, this large context window might not be acceptable for CI environments as it could lead to out-of-memory errors or significantly slow down test execution, potentially impacting CI stability. Have you considered mocking the model response or creating a more minimal, targeted unit test to verify this logic without requiring such a large context? This would make the test more robust and less resource-intensive.
Code Review
This pull request addresses a bug where tool calls in the analysis channel for gpt-oss models were not being handled correctly by the streaming parser. The proposed solution correctly extends the existing tool call parsing logic for the commentary channel to also include the analysis channel. The change in vllm/entrypoints/openai/serving_chat.py is well-targeted and correctly re-prioritizes the logic to check for tool calls in the analysis channel before treating it as reasoning content. A new complex streaming test case has been added in tests/entrypoints/openai/test_serving_chat.py, which effectively validates the fix. The changes are sound and I have not identified any issues of high or critical severity.
💡 Codex Review
Here are some automated review suggestions for this pull request.
    @pytest.mark.asyncio
    async def test_gpt_oss_chat_tool_call_streaming_complex(gptoss_client: OpenAI):
        file_path = os.path.join(os.path.dirname(__file__), "tools_gpt-oss.json")
        with open(file_path) as f:
            request_body = json.load(f)

        request_body["model"] = GPT_OSS_MODEL_NAME
        request_body["extra_body"] = {"reasoning_effort": "low"}

        stream = await gptoss_client.chat.completions.create(**request_body)

        name = None
        args_buf = ""
        content_buf = ""
        async for chunk in stream:
            delta = chunk.choices[0].delta
            if delta.tool_calls:
                tc = delta.tool_calls[0]
                if tc.function and tc.function.name:
                    name = tc.function.name
                if tc.function and tc.function.arguments:
                    args_buf += tc.function.arguments
            if getattr(delta, "content", None):
                content_buf += delta.content

        assert name is not None
        assert len(args_buf) > 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Complex streaming test ignores tool-parser parametrization
The newly added test_gpt_oss_chat_tool_call_streaming_complex uses the gptoss_client fixture, which is parametrized over with_tool_parser and therefore runs both with and without the --tool-call-parser flag. The test, however, always sends a request containing tools and unconditionally asserts that a tool name and arguments are returned. When the fixture runs with with_tool_parser=False, the server is started without tool parsing support, so the request will either fail or stream plain text instead of tool deltas, causing this test to fail. The test should either accept the with_tool_parser fixture and skip/assert appropriately when it is False, or drop the parametrization for this case.
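One possible shape of the first option (a sketch, assuming the existing with_tool_parser parametrization is exposed to the test as a boolean fixture; the actual fixture wiring in the test module may differ):

```python
import pytest
from openai import OpenAI


@pytest.mark.asyncio
async def test_gpt_oss_chat_tool_call_streaming_complex(
    gptoss_client: OpenAI, with_tool_parser: bool
):
    # Skip the parametrized run whose server was started without --tool-call-parser,
    # since no tool-call deltas can be expected from it.
    if not with_tool_parser:
        pytest.skip("server started without --tool-call-parser")
    ...  # rest of the test body unchanged
```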
Hrm I feel like the issue here is actually that we are putting these tool calls on the analysis channel in the first place. Do you know why that happens, vs. it always being on the commentary channel? I couldn't figure it out from a quick look at the code.
See the OpenAI docs and the limited docs and comments in the harmony repo. The problem is that you cannot force where the model puts the recipient, unless you use some guided decoding. For that reason, harmony allows for both. While it usually appears on the commentary channel, rendering previous tool calls to the analysis channel (which is the choice the harmony lib made; see the comments) causes the model to switch as well. But it wouldn't be a solution for every case anyway, as internal tools seem to be emitted on the analysis channel (see the OpenAI docs). As long as you don't get a different answer from OpenAI on the respective Harmony PR, this fix seems to be the right solution to me. And I guess other frameworks are doing it the same way, but I didn't check.
BTW I completely agree this is a problem in general, I just want to make sure we solve it the right way.
Hrm I'm a bit confused: on the input path, the channel a tool call is rendered onto is decided by our code, which should always be commentary ATM. The more nefarious thing here is that any message to the …

I think for now the most reasonable way to do it is that on the input path, if the tool name starts with …

I think if recipient is set on the harmony message on the output path, it is fair to treat it as a tool call no matter what; it doesn't matter which channel it is on, for sure.
You are right, and my description was not really correct in mixing the rendering with the output. The rendering "issue" (not sure if it's really an issue) is that harmony renders the recipient before the channel token. So when we use a commentary message, it will first render the recipient and then the channel token. This seems to be what leads to the model switching to outputting tool calls in the analysis channel (well, I am speculating here, so maybe that's not right) and what this PR is trying to solve.

However, even if we could change that, tools may still appear in any channel and we would want to handle them in any case. So I would say it doesn't affect this PR; I am only attempting to explain what might be happening here.

Also, I don't know if such a change to the rendering is a good idea, as it may affect model performance (assuming harmony was used during training; the impact may or may not be negligible) and prefix caching (if the model constantly outputs x and harmony renders y, then the cache is gone from that location). So a decision on whether to change it is more complex and should take that into account.
Didn't know that, but I think
Sounds reasonable to me.
Makes sense! Do you mean changing the solution here or keeping it as is? Also see the code for non-streaming in OpenAIToolParser, which is what you describe.
Based on my observations from various tests (although I'm not certain), I believe that built-in tool calls and function tool calls are distinguished by whether the tool invocation is written before or after the channel token. For example, the structure '<|start|>assistant to=' calls built-in tools, while the structure '<|channel|>commentary to=' proceeds with function tool calls. It appears that confusion occurs when function tools are provided to the gpt-oss model using the built-in tool call structure.
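To make the two structures easier to compare, here is a rough side-by-side; the token text is paraphrased from the observations in this thread (not produced by the harmony library) and the tool names are invented:

```python
# Illustrative only: the recipient placement is what distinguishes the two layouts.
builtin_style = (  # recipient appears before the channel token
    "<|start|>assistant to=browser.search<|channel|>analysis"
    '<|message|>{"query": "weather Berlin"}<|call|>'
)
function_style = (  # recipient appears after the channel token
    "<|start|>assistant<|channel|>commentary to=functions.get_weather"
    '<|message|>{"city": "Berlin"}<|call|>'
)
```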
Ah, thanks @levunet, that might explain things more: the built-in tools are in the analysis channel (as per the docs) and usually invoked with the recipient before the channel token, as you found out. If harmony now renders all tool calls that way without distinguishing, then the model may also switch to placing the recipient before the channel and, because of that, also switch to emitting tool calls in the analysis channel.

That would mean the ideal tool rendering (for the current model) would also distinguish between the channels and place the recipient accordingly, and that way keep the output from the model as is (avoiding the prefix-cache cut-off for that message) and avoid the model confusion. Probably not a very important fix, but it could be useful. Given that it only affects tool calls, the impact on model performance might be very small or nonexistent. In vLLM we might then also make this distinction and generate analysis messages for the internal tools (which are all commentary atm, I assume).
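If vLLM were to make that input-path distinction, a minimal sketch could look like this; the built-in namespaces listed are assumptions based on the gpt-oss built-in tools (browser, python), and the function name is invented:

```python
# Hypothetical sketch: pick the channel for rendering a historical tool call from its
# recipient, so built-in tools land on analysis and function tools stay on commentary.
BUILTIN_RECIPIENT_PREFIXES = ("browser.", "python")  # assumed built-in namespaces


def channel_for_tool_call(recipient: str) -> str:
    if recipient.startswith(BUILTIN_RECIPIENT_PREFIXES):
        return "analysis"
    return "commentary"


assert channel_for_tool_call("browser.search") == "analysis"
assert channel_for_tool_call("functions.get_weather") == "commentary"
```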
Purpose
The gpt-oss streaming parser does not handle tool calls in the analysis channel. However, those calls can happen as explained here.
This especially happens when harmony renders the recipient in previous tool calls before the channel token, "confusing" the model, which then returns tool calls in the analysis channel. However, the harmony parser is built to handle both cases, such that the vLLM parser should also support both. An earlier version of the fix proposed in this PR aimed at changing the message rendering in harmony (for converting the history to tokens) to avoid confusing the model; while that fixes the issue, it seems to not address the root cause: even with such a change, the model may emit tool calls in the analysis channel, as described in the OpenAI docs.
To solve this, I propose to also handle tool calls in the analysis channel.
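As a rough illustration of that rule (a minimal sketch, not vLLM's actual serving code; the message type and field names are stand-ins): a parsed message that carries a recipient is treated as a tool call regardless of whether it arrived on the commentary or the analysis channel, and only recipient-less analysis messages remain reasoning content.

```python
from dataclasses import dataclass


@dataclass
class ParsedMessage:        # stand-in for the harmony parser's message object
    channel: str            # "analysis", "commentary", or "final"
    recipient: str | None   # e.g. "functions.get_weather" when the model calls a tool
    content: str


def classify(msg: ParsedMessage) -> str:
    # A recipient means a tool call, no matter which channel it was emitted on.
    if msg.recipient is not None and msg.channel in ("commentary", "analysis"):
        return "tool_call"
    if msg.channel == "analysis":
        return "reasoning"  # plain analysis text stays reasoning content
    return "content"        # final-channel text goes to the user


assert classify(ParsedMessage("analysis", "functions.get_weather", "{}")) == "tool_call"
assert classify(ParsedMessage("analysis", None, "thinking...")) == "reasoning"
```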
Test Plan
TODO: I added an automated test using an actual request that reliably fails without the fix, but it requires a large context window, and I assume that might not be acceptable for running these tests. Not sure how to handle this; reducing the request size doesn't work. Ideally this would be unit tested, but that seems like quite some effort given the current structure of the code.