### Your current environment
The output of `python collect_env.py`:

```text
==============================
      Python Environment
==============================
Python version               : 3.12.11 (main, Jun  4 2025, 08:56:18) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-3.10.0-1160.92.1.el7.x86_64-x86_64-with-glibc2.35
==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.8.93
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration :
GPU 0: NVIDIA L20
GPU 1: NVIDIA L20
GPU 2: NVIDIA L20
GPU 3: NVIDIA L20
GPU 4: NVIDIA L20
GPU 5: NVIDIA L20
GPU 6: NVIDIA L20
GPU 7: NVIDIA L20
Nvidia driver version        : 550.90.07
cuDNN version                : Could not collect
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True
==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
Neuron SDK Version           : N/A
vLLM Version                 : 0.10.1.1
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
  	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	PIX	NODE	NODE	SYS	SYS	SYS	SYS	0-31,64-95	0		N/A
GPU1	PIX	 X 	NODE	NODE	SYS	SYS	SYS	SYS	0-31,64-95	0		N/A
GPU2	NODE	NODE	 X 	PIX	SYS	SYS	SYS	SYS	0-31,64-95	0		N/A
GPU3	NODE	NODE	PIX	 X 	SYS	SYS	SYS	SYS	0-31,64-95	0		N/A
GPU4	SYS	SYS	SYS	SYS	 X 	PIX	NODE	NODE	32-63,96-127	1		N/A
GPU5	SYS	SYS	SYS	SYS	PIX	 X 	NODE	NODE	32-63,96-127	1		N/A
GPU6	SYS	SYS	SYS	SYS	NODE	NODE	 X 	PIX	32-63,96-127	1		N/A
GPU7	SYS	SYS	SYS	SYS	NODE	NODE	PIX	 X 	32-63,96-127	1		N/A
```
### 🐛 Describe the bug
After serving gpt-oss-120b with vLLM, I tried the streaming function-call example from the OpenAI cookbook ("streaming function call").
- If `stream=True` is set in `client.responses.create(...)`, the output events contain the `reasoning_text`, but no function tool call is ever emitted (a sketch of the events I expected is included after the second example below). The stream looks like this:

```text
ResponseCreatedEvent(response=Response(id='resp_f76035824b624b3da83ce3cb6eefdf8f', created_at=1756783814.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='gpt-oss-120b', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'City and country e.g. Bogotá, Colombia'}}, 'required': ['location'], 'additionalProperties': False}, strict=None, type='function', description='Get current temperature for a given location.')], top_p=1.0, background=False, max_output_tokens=130933, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=0, truncation='disabled', usage=None, user=None), sequence_number=0, type='response.created')
ResponseInProgressEvent(response=Response(id='resp_f76035824b624b3da83ce3cb6eefdf8f', created_at=1756783814.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='gpt-oss-120b', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'City and country e.g. Bogotá, Colombia'}}, 'required': ['location'], 'additionalProperties': False}, strict=None, type='function', description='Get current temperature for a given location.')], top_p=1.0, background=False, max_output_tokens=130933, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=0, truncation='disabled', usage=None, user=None), sequence_number=1, type='response.in_progress')
ResponseOutputItemDoneEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=[Content(text='The user asks: "What\'s the weather like in Paris today?" Need to fetch weather via function get_weather with location "Paris, France". Use function.', type='reasoning_text')], encrypted_content=None, status='completed'), output_index=1, sequence_number=36, type='response.output_item.done')
```
- If `stream=False` is set in `client.responses.create(...)`, the response contains both the reasoning text (`ResponseReasoningItem`) and the function tool call (`ResponseFunctionToolCall`):

```text
Response(id='resp_d51ea05de80048f39ce97fa56f88d9c4', created_at=1756783924.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='gpt-oss-120b', object='response', output=[ResponseReasoningItem(id='rs_88edd5e0ff8143068093e9eb2bd3fdf1', summary=[], type='reasoning', content=[Content(text='We need to get weather. Use function get_weather with location "Paris, France".', type='reasoning_text')], encrypted_content=None, status=None), ResponseFunctionToolCall(arguments='{\n  "location": "Paris, France"\n}', call_id='call_bc1a92fab0b44fcf8874ec261e5b06f2', name='get_weather', type='function_call', id='ft_bc1a92fab0b44fcf8874ec261e5b06f2', status=None)], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'City and country e.g. Bogotá, Colombia'}}, 'required': ['location'], 'additionalProperties': False}, strict=None, type='function', description='Get current temperature for a given location.')], top_p=1.0, background=False, max_output_tokens=130933, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='completed', text=None, top_logprobs=0, truncation='disabled', usage=ResponseUsage(input_tokens=0, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=0, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=0), user=None)
```
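For clarity, this is roughly the event handling I expected to work in the streaming case, following the cookbook. The event type strings are the documented Responses streaming events; the accumulation logic is my own sketch, and `stream` is the iterator created in the repro code below:

```python
# Sketch of the cookbook-style streaming handler I expected to work.
final_args = ""
for event in stream:
    # A function tool call should open with an output_item.added event...
    if event.type == "response.output_item.added" and event.item.type == "function_call":
        print("tool call started:", event.item.name)
    # ...its JSON arguments should stream in as deltas...
    elif event.type == "response.function_call_arguments.delta":
        final_args += event.delta
    # ...and finish with an arguments.done event.
    elif event.type == "response.function_call_arguments.done":
        print("tool call arguments:", final_args)
# With vLLM 0.10.1.1 none of these function_call events ever arrive;
# only the reasoning events shown above are streamed.
```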
Here is the Python code that reproduces both cases:

```python
from openai import OpenAI
client = OpenAI(
    base_url='',  # vLLM server URL (redacted)
    api_key=''    # (redacted)
)
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and country e.g. Bogotá, Colombia"
            }
        },
        "required": [
            "location"
        ],
        "additionalProperties": False
    }
}]
# when stream=True
stream = client.responses.create(
    model="gpt-oss-120b",
    input=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools,
    stream=True
)
for event in stream:
    print(event)
# when stream=False
responses = client.responses.create(
    model="gpt-oss-120b",
    input=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools,
)
print(responses)
```
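In the non-streaming case the tool call can be pulled straight out of `responses.output`, which makes the asymmetry obvious. A minimal extraction sketch (my own code; the `json.loads` call assumes the arguments are valid JSON, as in the dump above):

```python
import json

# Works with stream=False: the function call is present in the output list.
for item in responses.output:
    if item.type == "function_call":
        args = json.loads(item.arguments)  # e.g. {"location": "Paris, France"}
        print("model wants:", item.name, args)
    elif item.type == "reasoning":
        # The reasoning content arrives alongside the tool call.
        print("reasoning:", item.content[0].text)
```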
### Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.