Conversation

@levunet
Contributor

@levunet levunet commented Sep 16, 2025

Purpose

This PR requires the following PRs to be merged first: #24768 and the harmony lib PR (openai/harmony#76).

The changes add `with_recipient` and the assistant's 'analysis' content.
Without this content, the gpt-oss model had a higher probability of outputting abnormal tokens when calling tools.

Test Plan

gpt-oss_test.py
messages.txt

Run python3 gpt-oss_test.py about 10 times.

Test Result

(before)
[screenshot]

(after applying the harmony lib changes)
[screenshot]

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to fix a bug in gpt-oss model tool calls by adding 'analysis' content and with_recipient. The changes are logical and align with the stated purpose. However, I've identified a high-severity issue where the handling of 'analysis' content is not robust for multimodal inputs, which could lead to a runtime crash. I have provided a code suggestion to address this.

Severity: high

The logic to extract content is not robust. The content of an assistant message can be a list of parts (e.g., for multimodal inputs), not just a string. The current implementation `content = chat_msg.get("content") or ""` will cause a runtime error if `content` is a non-empty list, because `Message.from_role_and_content` expects a string. The code should handle the case where `content` is a list by extracting the text parts, similar to how this is handled for other message types in this file.

        content = chat_msg.get("content")
        if isinstance(content, list):
            # Extract text from multimodal content
            content = "\n".join(
                p.get("text", "") for p in content
                if isinstance(p, dict) and p.get("type") == "text")
        elif not isinstance(content, str):
            content = ""

        analysis_msg = Message.from_role_and_content(Role.ASSISTANT, content)
        analysis_msg = analysis_msg.with_channel("analysis")
        msgs.append(analysis_msg)
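For illustration, the suggested extraction logic can be exercised standalone (the helper name below is mine, not part of the PR):

```python
def extract_text(content):
    """Collapse an OpenAI-style message `content` field to plain text.

    `content` may be a plain string or a list of typed parts, e.g.
    [{"type": "text", "text": "..."}, {"type": "image_url", ...}].
    Anything else (None, unexpected types) becomes an empty string.
    """
    if isinstance(content, list):
        # Keep only the text parts and join them with newlines.
        return "\n".join(
            p.get("text", "") for p in content
            if isinstance(p, dict) and p.get("type") == "text")
    return content if isinstance(content, str) else ""


print(extract_text("hello"))  # -> hello
print(extract_text([
    {"type": "text", "text": "first"},
    {"type": "image_url", "image_url": {"url": "http://example.com/x.png"}},
    {"type": "text", "text": "second"},
]))  # -> first\nsecond (text parts joined, image part skipped)
```

With the plain `content or ""` approach, the multimodal list would be passed through unchanged and crash inside `Message.from_role_and_content`.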

@levunet levunet force-pushed the feat/gptoss-tool-fix branch 2 times, most recently from c3780bf to eed8815 on September 16, 2025 09:19
@levunet
Contributor Author

levunet commented Sep 16, 2025

@alecsolder
In the harmony library, the string ' to' should have been encoded as token id 316, but occasionally it was encoded as the id pair 220+935 instead, which caused the model to output incorrect tokens.

@alecsolder
Contributor

alecsolder commented Sep 18, 2025

In your test script, I see you're using the streaming completions endpoint, which I don't think uses the harmony_utils method you modified? I just want to double check I'm reading it right

@alecsolder
Contributor

Also, thanks for mentioning that huggingface tokenizer change, I hadn't seen it and my snapshot was out of date!

@levunet
Contributor Author

levunet commented Sep 19, 2025

Thank you for checking. I've double-checked the part you mentioned.
I can confirm that the `parse_chat_input` I modified in `harmony_utils.py` is used without any issues to parse `request.messages` through the flow `/v1/chat/completions` -> `create_chat_completion` -> `_make_request_with_harmony` -> `parse_chat_input`.
The streaming endpoint also goes through this modified method.

@fouadbakkour

When will this PR be merged, guys?
We need this fix, please.

@youkaichao
Member

cc @chaunceyjiang @aarnphm

@levunet levunet force-pushed the feat/gptoss-tool-fix branch from 39d6e8e to e8e7579 on October 10, 2025 13:12
The changes include adding 'with_recipient' and the Assistant's 'analysis' content.
Without adding this content, there was an issue where the gpt-oss model had a higher probability of outputting abnormal tokens when calling tools.

Signed-off-by: kyt <eluban4532@gmail.com>
@levunet levunet force-pushed the feat/gptoss-tool-fix branch from e8e7579 to f13f45b on October 10, 2025 13:19
@levunet
Contributor Author

levunet commented Oct 10, 2025

I was delayed today by an issue I hit while resolving git conflicts: FlashInfer 0.4.0, added in recent commits, causes responses to repeat infinitely with the gpt-oss model. It took quite some time to find a workaround.

VLLM_USE_FLASHINFER_SAMPLER=0

The problem occurred in the sampling process, and it works normally when the FlashInfer sampler is disabled.
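
A minimal sketch of the workaround (only the environment variable comes from this thread; the serve command is illustrative):

```shell
# Disable the FlashInfer sampler (sampling-side workaround for the
# infinite-repetition issue mentioned above).
export VLLM_USE_FLASHINFER_SAMPLER=0

# Illustrative launch; substitute your own model and flags.
# vllm serve openai/gpt-oss-20b

echo "VLLM_USE_FLASHINFER_SAMPLER=$VLLM_USE_FLASHINFER_SAMPLER"
```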

@epark001

> I was delayed today due to some issues while resolving git conflicts. The issue was caused by FlashInfer 0.4.0, which was added in recent commits, causing responses to repeat infinitely in the gpt-oss model. It took quite some time to find a solution.
>
> VLLM_USE_FLASHINFER_SAMPLER=0
>
> The problem occurred in the sampling process, and it works normally when the FlashInfer sampler is disabled.

Will the fix need both `VLLM_USE_FLASHINFER_SAMPLER=0` and this PR, or just the FlashInfer setting?

@levunet
Contributor Author

levunet commented Oct 10, 2025

> VLLM_USE_FLASHINFER_SAMPLER=0

This setting resolves an issue that occurs in a completely different place from the current PR. After testing multiple times, the same problem appears even without this PR, so the setting can be applied independently.

@bbrowning bbrowning mentioned this pull request Oct 10, 2025
@ashgold

ashgold commented Oct 21, 2025

Any progress here?
Many people in my department want to use the tool calling feature with gpt-oss, but it seems that feature isn't fully implemented in vLLM yet.

@levunet
Contributor Author

levunet commented Oct 21, 2025

Unfortunately, the current changes alone are insufficient to fully resolve the bug; the changes from the harmony repo are essential. Since OpenAI hasn't reviewed that PR yet, should I propose pip-installing from my fork branch in the meantime?

@levunet
Contributor Author

levunet commented Oct 21, 2025

@aarnphm @chaunceyjiang

This PR fixes a bug that occurred when using tool calling with gpt-oss models. However, it requires additional changes from the openai/harmony repo, which unfortunately hasn't been reviewed for a month now. Would it be acceptable to temporarily modify the requirements to install from my forked harmony branch until the upstream PR is merged?

@ashgold

ashgold commented Oct 23, 2025

> @aarnphm @chaunceyjiang
>
> This PR fixes a bug that occurred when using tool calling with gpt-oss models. However, it requires additional changes from the openai/harmony repo, which unfortunately hasn't been reviewed for a month now. Would it be acceptable to temporarily modify the requirements to install from my forked harmony branch until the upstream PR is merged?

@levunet
Could you provide guidance on how to build using the modified version of the openai-harmony module?
I'd like to build the vLLM image with this PR and the modified openai-harmony module, then test the tool calling functionality.
I have experience building the vLLM v0.11.0 image in my environment.
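
One possible approach, sketched with placeholder fork and branch names (not the actual ones; harmony is a Rust project with Python bindings, so installing from source needs a Rust toolchain):

```
# requirements override (placeholder fork/branch — substitute the real ones)
openai-harmony @ git+https://github.com/<fork>/harmony.git@<branch>
```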

@dr75
Contributor

dr75 commented Oct 31, 2025

@levunet, could you give an example where tool calling does not work in v0.11.1? I couldn't find any so I wonder if this fix and the harmony fix is actually needed.

Basically #24768 makes tool calling work for me for chat completions in streaming mode.

However, there are two remaining issues:

  • using --tool-call-parser openai breaks normal non-streaming chat completions; so this flag must not be used right now (tool calling works without it)
  • tool parsing for non-streaming requests doesn't work in v0.11.0. This also doesn't work in v0.11.1 nor with the fix here and the harmony fix you mentioned.

@dr75
Contributor

dr75 commented Oct 31, 2025

OK, I could reproduce the cases with your script above and can confirm that the issue still happens on v0.11.1 and is fixed with this PR plus the harmony change.


Labels

frontend gpt-oss Related to GPT-OSS models

Projects

Status: To Triage
