[Bug]: While serving GPT-OSS, Streaming function calls output only reasoning_text, without function tool call #24076

@gsu2017

Description
Your current environment

The output of python collect_env.py
==============================
      Python Environment
==============================
Python version               : 3.12.11 (main, Jun  4 2025, 08:56:18) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-3.10.0-1160.92.1.el7.x86_64-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.8.93
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration :
GPU 0: NVIDIA L20
GPU 1: NVIDIA L20
GPU 2: NVIDIA L20
GPU 3: NVIDIA L20
GPU 4: NVIDIA L20
GPU 5: NVIDIA L20
GPU 6: NVIDIA L20
GPU 7: NVIDIA L20

Nvidia driver version        : 550.90.07
cuDNN version                : Could not collect
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
Neuron SDK Version           : N/A
vLLM Version                 : 0.10.1.1
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
  	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	PIX	NODE	NODE	SYS	SYS	SYS	SYS	0-31,64-95	0		N/A
GPU1	PIX	 X 	NODE	NODE	SYS	SYS	SYS	SYS	0-31,64-95	0		N/A
GPU2	NODE	NODE	 X 	PIX	SYS	SYS	SYS	SYS	0-31,64-95	0		N/A
GPU3	NODE	NODE	PIX	 X 	SYS	SYS	SYS	SYS	0-31,64-95	0		N/A
GPU4	SYS	SYS	SYS	SYS	 X 	PIX	NODE	NODE	32-63,96-127	1		N/A
GPU5	SYS	SYS	SYS	SYS	PIX	 X 	NODE	NODE	32-63,96-127	1		N/A
GPU6	SYS	SYS	SYS	SYS	NODE	NODE	 X 	PIX	32-63,96-127	1		N/A
GPU7	SYS	SYS	SYS	SYS	NODE	NODE	PIX	 X 	32-63,96-127	1		N/A

🐛 Describe the bug

After serving gpt-oss-120b with vLLM, I tried the streaming function call examples from the OpenAI cookbook (streaming function calls).

  • If stream=True is set in client.responses.create(...), the output events contain reasoning_text, but no function tool call is included. For example:
ResponseCreatedEvent(response=Response(id='resp_f76035824b624b3da83ce3cb6eefdf8f', created_at=1756783814.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='gpt-oss-120b', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'City and country e.g. Bogotá, Colombia'}}, 'required': ['location'], 'additionalProperties': False}, strict=None, type='function', description='Get current temperature for a given location.')], top_p=1.0, background=False, max_output_tokens=130933, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=0, truncation='disabled', usage=None, user=None), sequence_number=0, type='response.created')
ResponseInProgressEvent(response=Response(id='resp_f76035824b624b3da83ce3cb6eefdf8f', created_at=1756783814.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='gpt-oss-120b', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'City and country e.g. Bogotá, Colombia'}}, 'required': ['location'], 'additionalProperties': False}, strict=None, type='function', description='Get current temperature for a given location.')], top_p=1.0, background=False, max_output_tokens=130933, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=0, truncation='disabled', usage=None, user=None), sequence_number=1, type='response.in_progress')
ResponseOutputItemDoneEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=[Content(text='The user asks: "What\'s the weather like in Paris today?" Need to fetch weather via function get_weather with location "Paris, France". Use function.', type='reasoning_text')], encrypted_content=None, status='completed'), output_index=1, sequence_number=36, type='response.output_item.done')
  • If stream=False is set in client.responses.create(...), the model outputs both the reasoning_text (ResponseReasoningItem) and the function tool call (ResponseFunctionToolCall). For example:
Response(id='resp_d51ea05de80048f39ce97fa56f88d9c4', created_at=1756783924.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='gpt-oss-120b', object='response', output=[ResponseReasoningItem(id='rs_88edd5e0ff8143068093e9eb2bd3fdf1', summary=[], type='reasoning', content=[Content(text='We need to get weather. Use function get_weather with location "Paris, France".', type='reasoning_text')], encrypted_content=None, status=None), ResponseFunctionToolCall(arguments='{\n  "location": "Paris, France"\n}', call_id='call_bc1a92fab0b44fcf8874ec261e5b06f2', name='get_weather', type='function_call', id='ft_bc1a92fab0b44fcf8874ec261e5b06f2', status=None)], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'City and country e.g. Bogotá, Colombia'}}, 'required': ['location'], 'additionalProperties': False}, strict=None, type='function', description='Get current temperature for a given location.')], top_p=1.0, background=False, max_output_tokens=130933, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='completed', text=None, top_logprobs=0, truncation='disabled', usage=ResponseUsage(input_tokens=0, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=0, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=0), user=None)

Here is the Python code:

from openai import OpenAI

client = OpenAI(
    base_url='',
    api_key=''
)

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and country e.g. Bogotá, Colombia"
            }
        },
        "required": [
            "location"
        ],
        "additionalProperties": False
    }
}]

# when stream=True
stream = client.responses.create(
    model="gpt-oss-120b",
    input=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools,
    stream=True
)

for event in stream:
    print(event)

# when stream=False
responses = client.responses.create(
    model="gpt-oss-120b",
    input=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools,
)
print(responses)
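For comparison, here is a minimal sketch of how the streamed function tool call would be consumed if vLLM emitted it. The event type names (response.output_item.added, response.function_call_arguments.delta) follow the OpenAI Responses streaming event spec; the events below are mocked dicts rather than output from a live server, so this only illustrates the expected shape of the stream:

```python
# Sketch: accumulate a streamed function tool call from Responses API events.
# The events here are hand-written mock dicts standing in for the objects a
# working vLLM server would stream back.

def accumulate_tool_call(events):
    """Collect the function name and argument deltas into one call dict."""
    call = {"name": None, "arguments": ""}
    for event in events:
        if (event["type"] == "response.output_item.added"
                and event["item"].get("type") == "function_call"):
            call["name"] = event["item"]["name"]
        elif event["type"] == "response.function_call_arguments.delta":
            call["arguments"] += event["delta"]
    return call

# Mock event stream for the get_weather example above.
mock_events = [
    {"type": "response.output_item.added",
     "item": {"type": "function_call", "name": "get_weather"}},
    {"type": "response.function_call_arguments.delta", "delta": '{"location": '},
    {"type": "response.function_call_arguments.delta", "delta": '"Paris, France"}'},
]

print(accumulate_tool_call(mock_events))
# {'name': 'get_weather', 'arguments': '{"location": "Paris, France"}'}
```

In the buggy case described above, no function_call output item or arguments-delta events ever arrive in the stream, so the accumulator would stay empty.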

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), gpt-oss (Related to GPT-OSS models)
Projects status: In progress
Milestone: No milestone
