Apply more fixes for Pydantic schema incompatibilities with OpenAI structured outputs #1659

Open
@mcantrell

Confirm this is a feature request for the Python library and not the underlying OpenAI API.

  • This is a feature request for the Python library

Describe the feature or improvement you're requesting

I noticed that you guys are doing some manipulation of Pydantic's generated schema to ensure compatibility with the API's schema validation. I found a few more instances that can be addressed:

Issues:

  • optional fields with Pydantic defaults generate a 'default' keyword in the schema, which is not supported
  • datetime fields generate a format='date-time' keyword in the schema, which is not supported
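To make the two problems concrete, here is a minimal sketch that lists where these keywords occur in a schema. The sample schema is hand-written to resemble what Pydantic v2 typically emits for an Optional[datetime] field with a None default (an assumption, not captured output), and find_keywords and NAME_MAPS are hypothetical names, not part of the openai library.

```python
from typing import Any, Iterator

# Hand-written fragment resembling what Pydantic v2 typically emits for
#   published: Optional[datetime] = Field(None, ...)
# This is an illustrative assumption, not captured output.
SAMPLE_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "published": {
            "anyOf": [
                {"type": "string", "format": "date-time"},
                {"type": "null"},
            ],
            "default": None,
        },
    },
    "required": ["title"],
    "additionalProperties": False,
}

# Keys directly under these entries are user-chosen names, not schema keywords.
NAME_MAPS = {"properties", "$defs", "definitions"}


def find_keywords(
    node: Any, keywords: set[str], path: str = "", keys_are_names: bool = False
) -> Iterator[str]:
    """Yield a slash-separated path for every occurrence of the given keywords."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if not keys_are_names and key in keywords:
                yield child
            # One level below "properties"/"$defs" the keys are field or model names.
            next_are_names = not keys_are_names and key in NAME_MAPS
            yield from find_keywords(value, keywords, child, next_are_names)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_keywords(item, keywords, f"{path}/{index}")


offending = sorted(find_keywords(SAMPLE_SCHEMA, {"default", "format"}))
print(offending)
```

Running a check like this on the output of to_strict_json_schema before sending a request could show whether any unsupported keywords remain.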

The test case below builds on your to_strict_json_schema function and strips these problematic fields with the remove_property_from_schema helper:

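import json
import logging
from datetime import datetime
from typing import List, Optional

from openai import OpenAI
# internal helper; this import path is an assumption and may change between releases
from openai.lib._pydantic import to_strict_json_schema
from pydantic import BaseModel, Field

logger = logging.getLogger(__name__)

class Publisher(BaseModel):
    name: str = Field(description="The name of the publisher")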
    url: Optional[str] = Field(None, description="The URL of the publisher's website")
    class Config:
        json_schema_extra = {
            "additionalProperties": False
        }

class Article(BaseModel):
    title: str = Field(description="The title of the news article")
    published: Optional[datetime] = Field(None, description="The date the article was published. Use ISO 8601 to format this value.")
    publisher: Optional[Publisher] = Field(None, description="The publisher of the article")
    class Config:
        json_schema_extra = {
            "additionalProperties": False
        }
        
class NewsArticles(BaseModel):
    query: str = Field(description="The query used to search for news articles")
    articles: List[Article] = Field(description="The list of news articles returned by the query")
    class Config:
        json_schema_extra = {
            "additionalProperties": False
        }
    

def test_schema_compatible():
    client = OpenAI()
    
    # build on the internals that the openai client uses to clean up the pydantic schema for the openai API
    schema = to_strict_json_schema(NewsArticles)
    
    # optional fields with pydantic defaults generate an unsupported 'default' field in the schema
    remove_property_from_schema(schema, "default")
    # date fields generate a format='date-time' field in the schema which is not supported
    remove_property_from_schema(schema, "format")
        
    logger.info("Generated Schema: %s", json.dumps(schema, indent=2))
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        temperature=0,
        messages=[
            {
                "role": "user",
                "content": "What were the top headlines in the US for January 6th, 2021?",
            }
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "schema": schema,
                "name": "NewsArticles",
                "strict": True,
            }
        }
    )
    result = NewsArticles.model_validate_json(completion.choices[0].message.content)
    assert result is not None



def remove_property_from_schema(schema: dict, property_name: str) -> None:
    """Remove a schema keyword (e.g. 'default') from each field of the schema, in place."""
    if 'properties' in schema:
        for field in schema['properties'].values():
            # recurse into inline nested objects
            if 'properties' in field:
                remove_property_from_schema(field, property_name)
            # Optional[...] fields are emitted as anyOf: [<type>, null]
            if 'anyOf' in field:
                for any_of in field['anyOf']:
                    any_of.pop(property_name, None)
            field.pop(property_name, None)
    # nested models live under $defs and are referenced via $ref
    if '$defs' in schema:
        for definition in schema['$defs'].values():
            remove_property_from_schema(definition, property_name)
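The helper above only looks at direct fields and one level of anyOf, so as far as I can tell it would miss keywords under array items (for example a List[datetime] field). A more general variant could walk every dict and list in the schema. The following is a rough sketch under that assumption. strip_keywords and NAME_MAPS are hypothetical names, and the function returns a cleaned copy rather than mutating in place. It also leaves alone model fields that happen to be named "default" or "format".

```python
from typing import Any

# Keys directly under these entries are user-chosen names, not schema keywords.
NAME_MAPS = {"properties", "$defs", "definitions"}


def strip_keywords(node: Any, keywords: set[str], keys_are_names: bool = False) -> Any:
    """Return a copy of a JSON schema with the given keywords removed at every depth."""
    if isinstance(node, list):
        return [strip_keywords(item, keywords) for item in node]
    if not isinstance(node, dict):
        return node
    cleaned = {}
    for key, value in node.items():
        # Drop the keyword unless this key is a field or definition name.
        if not keys_are_names and key in keywords:
            continue
        child_keys_are_names = not keys_are_names and key in NAME_MAPS
        cleaned[key] = strip_keywords(value, keywords, child_keys_are_names)
    return cleaned


# Hand-written example schema (not generated output): a field literally named
# "format", plus an array whose items carry a date-time format.
schema = {
    "type": "object",
    "properties": {
        "format": {"type": "string", "default": "html"},
        "dates": {
            "type": "array",
            "items": {
                "anyOf": [
                    {"type": "string", "format": "date-time"},
                    {"type": "null"},
                ]
            },
        },
    },
}

cleaned = strip_keywords(schema, {"default", "format"})
print(cleaned)
```

Returning a copy keeps the original schema available for logging or comparison. Whether the library would prefer in-place mutation is a design choice I have not checked.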

Additional context

No response
