[Bug] Fix gpt-oss missing tool content #24954
base: main
Conversation
Code Review
This pull request aims to fix a bug in gpt-oss model tool calls by adding the assistant's 'analysis' content and a with_recipient call. The changes are logical and align with the stated purpose. However, I've identified a high-severity issue: the handling of 'analysis' content is not robust for multimodal inputs and could lead to a runtime crash. I have provided a code suggestion to address this.
vllm/entrypoints/harmony_utils.py
Outdated
The logic to extract content is not robust. The content of an assistant message can be a list of parts (e.g., for multimodal inputs), not just a string. The current implementation content = chat_msg.get("content") or "" will cause a runtime error if content is a non-empty list, as Message.from_role_and_content expects a string. The code should handle the case where content is a list by extracting the text parts, similar to how it's handled for other message types in this file.
content = chat_msg.get("content")
if isinstance(content, list):
    # Extract text from multimodal content
    content = "\n".join(
        p.get("text", "") for p in content
        if isinstance(p, dict) and p.get("type") == "text")
elif not isinstance(content, str):
    content = ""
analysis_msg = Message.from_role_and_content(Role.ASSISTANT, content)
analysis_msg = analysis_msg.with_channel("analysis")
msgs.append(analysis_msg)
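As a standalone check, the extraction branch in that suggestion can be exercised on its own in plain Python (the helper name extract_text is just for illustration; it is not part of the diff):

```python
def extract_text(content):
    """Collapse a chat-message 'content' field to a plain string.

    Mirrors the suggested logic: a string passes through unchanged,
    a multimodal list keeps only its text parts, anything else
    (None, a number, ...) becomes "".
    """
    if isinstance(content, list):
        return "\n".join(
            p.get("text", "") for p in content
            if isinstance(p, dict) and p.get("type") == "text")
    return content if isinstance(content, str) else ""

print(extract_text("hello"))
print(extract_text([{"type": "text", "text": "a"},
                    {"type": "image_url", "image_url": {"url": "x"}},
                    {"type": "text", "text": "b"}]))
print(repr(extract_text(None)))
```

Feeding the multimodal list yields the joined text parts only ("a\nb"), which is exactly the case the original `content or ""` expression would have crashed on.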
@alecsolder
In your test script, I see you're using the streaming completions endpoint, which I don't think uses the harmony_utils method you modified? I just want to double-check that I'm reading it right.
Also, thanks for mentioning that huggingface tokenizer change, I hadn't seen it and my snapshot was out of date!
Thank you for checking. I've double-checked the part you mentioned.
When will this PR be merged?
The changes include adding 'with_recipient' and the assistant's 'analysis' content. Without this content, the gpt-oss model had a higher probability of outputting abnormal tokens when calling tools.

Signed-off-by: kyt <eluban4532@gmail.com>
I was delayed today by some issues while resolving git conflicts. The issue was caused by FlashInfer 0.4.0, added in recent commits, which made responses repeat infinitely in the gpt-oss model. It took quite some time to find a solution. The problem occurred in the sampling process, and everything works normally when the FlashInfer sampler is disabled with VLLM_USE_FLASHINFER_SAMPLER=0.
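The workaround is just an environment variable, so it can also be set programmatically before vLLM initializes (flag name taken from the comment above; its effect on the sampler is as described there, not verified here):

```python
import os

# Disable the FlashInfer sampler to avoid the infinite-repetition issue
# described above. Must be set before vLLM is imported/started so the
# sampler selection sees it.
os.environ["VLLM_USE_FLASHINFER_SAMPLER"] = "0"
print(os.environ["VLLM_USE_FLASHINFER_SAMPLER"])
```

Equivalently, prefix the serve command with `VLLM_USE_FLASHINFER_SAMPLER=0` in the shell.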
Will the fix need both VLLM_USE_FLASHINFER_SAMPLER=0 and this PR, or just the FlashInfer setting?
I think this setting resolves an issue that occurs in a completely different place from the current PR. After testing multiple times, the same problem seems to exist even without this PR, so the setting can be used separately.
Any progress here?
Unfortunately, the current changes alone are insufficient to fully resolve the bug; the changes from the harmony repo are essential. Since OpenAI hasn't reviewed that PR yet, should I propose pip-installing from my fork branch in the meantime?
This PR fixes a bug that occurred when using tool calling with gpt-oss models. However, it requires additional changes from the openai/harmony repo, which unfortunately hasn't been reviewed for a month now. Would it be acceptable to temporarily modify the requirements to install from my forked harmony branch until the upstream PR is merged?
@levunet
@levunet, could you give an example where tool calling does not work in v0.11.1? I couldn't find any, so I wonder if this fix and the harmony fix are actually needed. Basically, #24768 makes tool calling work for me for chat completions in streaming mode. However, there are two remaining issues:
OK, I could reproduce the cases now with your script above and confirm that it still happens on v0.11.1 and is fixed by this PR together with the harmony change.
Purpose
This PR requires the following PRs to be merged first: #24768 and the harmony lib PR (openai/harmony#76).
The changes include adding 'with_recipient' and the Assistant's 'analysis' content.
Without this content, the gpt-oss model had a higher probability of outputting abnormal tokens when calling tools.
Test Plan
gpt-oss_test.py
messages.txt
Run python3 gpt-oss_test.py about 10 times.
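The attached gpt-oss_test.py is not reproduced here, but a minimal streaming tool-calling request body of the kind such a script would send looks roughly like this (model name, tool schema, and message text are assumptions for illustration, not the script's actual contents):

```python
import json

# Illustrative request body for POST /v1/chat/completions on a vLLM
# server running a gpt-oss model; all specifics below are made up.
payload = {
    "model": "openai/gpt-oss-20b",
    "stream": True,
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
print(json.dumps(payload, indent=2))
```

The bug being fixed is probabilistic, which is why the test plan runs the script about 10 times rather than once.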
Test Result
(before) [screenshot]
(after applying the harmony lib changes) [screenshot]