Description
Confirm this is an issue with the Python library and not an underlying OpenAI API
- This is an issue with the Python library
Describe the bug
When calling client.beta.chat.completions.parse() for structured output on an Azure-hosted GPT-5 deployment, the API returns a 400 BadRequestError:
openai.BadRequestError: Error code: 400 - {
'error': {
'message': 'Could not finish the message because max_tokens or model output limit was reached. Please try again with higher max_tokens.',
'type': 'invalid_request_error',
'param': None,
'code': None
}
}
This occurs even though neither max_tokens nor max_completion_tokens is set anywhere in the request.
This is related to #2046, which was closed with the suggestion to use `max_completion_tokens` instead of `max_tokens`. That resolution does not apply here: this issue occurs when no token limit parameter is passed at all. The error message is therefore misleading — it implies the caller set a limit that was too low, when in fact no limit was set.
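As a stopgap, the #2046 workaround can still be applied defensively: pass an explicit, generous `max_completion_tokens` so that reasoning tokens (which GPT-5 reasoning models count against the output budget) do not exhaust whatever default cap the backend enforces. A minimal sketch of the kwargs (the helper name and the 16,000 value are illustrative, not part of the SDK):

```python
# Stopgap sketch. Assumption: the Azure backend applies a default output cap
# that reasoning tokens exhaust; passing an explicit, generous
# max_completion_tokens sidesteps it. Helper name and value are illustrative.
def parse_kwargs_with_cap(model, messages, response_format,
                          reasoning_effort="minimal",
                          max_completion_tokens=16_000):
    """Build kwargs for client.beta.chat.completions.parse() with an
    explicit completion-token cap set."""
    return {
        "model": model,
        "messages": messages,
        "response_format": response_format,
        "reasoning_effort": reasoning_effort,
        "max_completion_tokens": max_completion_tokens,
    }
```

The point of this issue stands regardless: the caller should not have to set this at all, since `chat.completions.create()` works without it.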
Expected Behavior
No token limit is enforced when neither max_tokens nor max_completion_tokens is provided, consistent with how chat.completions.create() behaves.
Actual Behavior
A 400 BadRequestError is raised claiming the model output limit was reached, despite no limit being set by the caller.
To Reproduce
- Create an `AsyncAzureOpenAI` client pointing to a GPT-5 Azure deployment
- Call `client.beta.chat.completions.parse()` with a Pydantic model as `response_format`
- Pass `reasoning_effort` but do not pass `max_tokens` or `max_completion_tokens`
- Use a moderately complex Pydantic schema (e.g. nested models)
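For concreteness, a hypothetical schema of the "moderately complex, nested" kind described above (the real model is application-specific; the names below are placeholders):

```python
# Hypothetical stand-in for the response_format model; the real schema is
# application-specific. The nesting is what matters: nested models produce
# a larger JSON schema for structured output.
from typing import List

from pydantic import BaseModel


class LineItem(BaseModel):
    description: str
    quantity: int


class Invoice(BaseModel):
    vendor: str
    items: List[LineItem]


class MyPydanticOutputClass(BaseModel):
    invoices: List[Invoice]
    summary: str
```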
Code snippets
from openai import AsyncAzureOpenAI

client = AsyncAzureOpenAI(
    azure_endpoint="<AZURE_GPT5_ENDPOINT>",
    azure_deployment="<AZURE_GPT5_DEPLOYMENT>",
    api_version="<API_VERSION>",
    api_key="<API_KEY>",
)

completion = await client.beta.chat.completions.parse(
    model="<model_name>",
    messages=[
        {"role": "developer", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    response_format=MyPydanticOutputClass,  # Pydantic model for structured output
    reasoning_effort="minimal",
    # max_tokens and max_completion_tokens are intentionally NOT set
)

result = completion.choices[0].message.parsed

OS
Ubuntu 24.04.2 LTS
Python version
3.11.13
Library version
openai 1.75.0