Generalize the JSON schema transformations #1481

Merged
6 commits merged into main from dmontagu/generalize-json-schema-transformations
Apr 15, 2025

Contributor

@dmontagu dmontagu commented Apr 15, 2025

This moves the logic for modifying JSON schema into a shared WalkJsonSchema class to make it easier to reuse across different models, and to make it easier to implement JSON schema modifications in different vendor-specific models.
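To illustrate the general idea of a shared schema walker, here is a minimal sketch. It is not the `WalkJsonSchema` implementation from this PR: the names `SchemaWalker` and `DropTitles`, the set of keywords it recurses into, and the post-order traversal are all assumptions made for the example.

```python
from __future__ import annotations

import copy
from typing import Any

JsonSchema = dict[str, Any]


class SchemaWalker:
    """Hypothetical base class: visits every subschema and applies `transform`."""

    def __init__(self, schema: JsonSchema) -> None:
        # Work on a copy so the caller's schema is never mutated.
        self.schema = copy.deepcopy(schema)

    def walk(self) -> JsonSchema:
        return self._visit(self.schema)

    def transform(self, schema: JsonSchema) -> JsonSchema:
        """Override in subclasses; the default leaves the node unchanged."""
        return schema

    def _visit(self, schema: JsonSchema) -> JsonSchema:
        # Keywords that map names to subschemas.
        for key in ('properties', '$defs'):
            if isinstance(schema.get(key), dict):
                schema[key] = {k: self._visit(v) for k, v in schema[key].items()}
        # Keywords that hold a list of subschemas.
        for key in ('anyOf', 'oneOf', 'allOf', 'prefixItems'):
            if isinstance(schema.get(key), list):
                schema[key] = [self._visit(s) for s in schema[key]]
        # Keywords that hold a single subschema.
        for key in ('items', 'additionalProperties'):
            if isinstance(schema.get(key), dict):
                schema[key] = self._visit(schema[key])
        # Transform the node after its children (post-order).
        return self.transform(schema)


class DropTitles(SchemaWalker):
    """Example vendor-specific tweak (assumed): remove `title` keys everywhere."""

    def transform(self, schema: JsonSchema) -> JsonSchema:
        schema.pop('title', None)
        return schema


example = {
    'title': 'Output',
    'type': 'object',
    'properties': {
        'x': {'title': 'X', 'anyOf': [{'title': 'A', 'type': 'string'}, {'type': 'integer'}]},
    },
}
cleaned = DropTitles(example).walk()
```

A vendor-specific model would then only need to override `transform`, while the traversal logic stays in one place.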

I suspect we may want to make it easier for users to customize and/or override this; in particular, we probably want a way for a user to hard-code the specific schema they want to use for a given tool (possibly even on a per-model basis, in the event of fallback/manually switching providers?). For example, they might want to control how much duplication there is in the generated JSON schema when a given type is reused in a few places. But I think let's not worry about that for now.

This PR already makes a number of fixes to the handling of JSON schemas with OpenAI and Gemini. I used the following very useful script, tweaked to test different schemas and models. (Note: turning on verbose console logging has the extremely nice benefit of printing the tool schema to the console, so you can see the result of the transformation.)

from typing import Literal, Annotated

import logfire
from pydantic import BaseModel, Discriminator

from pydantic_ai import Agent

logfire.instrument_pydantic_ai()
logfire.configure(send_to_logfire=False, console=logfire.ConsoleOptions(verbose=True))


class MyA(BaseModel):
    kind: Literal['a']


class MyB(BaseModel):
    kind: Literal['b']


class Output(BaseModel):
    x: Annotated[MyA | MyB, Discriminator('kind')]


agent = Agent(
    'google-gla:gemini-2.5-pro-exp-03-25',
    output_type=Output,
    system_prompt='You are a helpful assistant.',
)

result = agent.run_sync(
    user_prompt='Use the final_result tool to generate a response. I just want you to generate some example data compatible with the schema.'
)
print(result.output)

@Kludex it might be nice if we had some system for testing that the transformations we're using are actually required. I guess we could use VCR or something, but what I really want to test is that I get errors for certain schemas if I don't transform them, which is kind of annoying. I guess we could monkeypatch the transformation function to be a no-op ... 😩. I think for now we can skip it, but I wouldn't mind if you did want to add it 😄.
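The monkeypatching idea could look roughly like the sketch below. Everything here is hypothetical: `VendorSchemaTransformer`, `fake_vendor_validate`, and `send_tool_schema` are stand-ins invented for the example, and a real test would hit (or replay a recording of) the provider's API rather than a local validator.

```python
from typing import Any
from unittest.mock import patch


class VendorSchemaTransformer:
    """Hypothetical stand-in for a vendor-specific schema transformer."""

    def transform(self, schema: dict[str, Any]) -> dict[str, Any]:
        # Assume the vendor rejects `title`, so the transformation strips it.
        return {k: v for k, v in schema.items() if k != 'title'}


def fake_vendor_validate(schema: dict[str, Any]) -> None:
    """Hypothetical stand-in for the provider rejecting an unsupported schema."""
    if 'title' in schema:
        raise ValueError('unsupported keyword: title')


def send_tool_schema(schema: dict[str, Any]) -> dict[str, Any]:
    transformed = VendorSchemaTransformer().transform(schema)
    fake_vendor_validate(transformed)
    return transformed


schema = {'title': 'Output', 'type': 'object'}

# With the real transformation in place, the schema is accepted.
accepted = send_tool_schema(schema)

# With the transformation patched to a no-op, the provider rejects the schema,
# which shows that the transformation is actually required.
with patch.object(VendorSchemaTransformer, 'transform', lambda self, s: s):
    try:
        send_tool_schema(schema)
    except ValueError as exc:
        rejection = str(exc)
    else:
        rejection = None
```

The patch is scoped to the `with` block, so other tests still see the real transformation.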

github-actions bot commented Apr 15, 2025

Docs Preview

commit: fc581c6
Preview URL: https://15449ba3-pydantic-ai-previews.pydantic.workers.dev

Member

Kludex commented Apr 15, 2025

Yeah, I agree we need the RecordModel.

min_length = schema.pop('minLength', None)
max_length = schema.pop('maxLength', None)
if description is not None:
    notes = list[str]()
Member

Can you do this in 3.9?

Contributor Author

❯ UV_PROJECT_ENVIRONMENT=.venv39 uv run --python 3.9 --all-extras --all-packages python    
warning: `VIRTUAL_ENV=.venv` does not match the project environment path `.venv39` and will be ignored; use `--active` to target the active environment instead
Python 3.9.20 (main, Sep  6 2024, 19:03:56) 
[Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> list[str]()
[]

Yes, but I guess I should probably use a type hint for better perf. Not that it will matter in this code path.
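For reference, the two spellings can be compared with a small sketch like the one below. The function names are made up for the example. Both forms produce an empty list on Python 3.9+; the difference is that `list[str]()` builds a generic alias object at runtime and then calls it, whereas the annotation on a local variable is not evaluated. The measured timings depend on the interpreter and machine, so none are quoted here.

```python
import timeit


def with_generic_alias() -> list[str]:
    # Subscripts `list` at runtime, then calls the resulting alias.
    notes = list[str]()
    return notes


def with_annotation() -> list[str]:
    # Local variable annotations are not evaluated; only `[]` runs.
    notes: list[str] = []
    return notes


# Both spellings yield the same value.
assert with_generic_alias() == with_annotation() == []

alias_time = timeit.timeit(with_generic_alias, number=100_000)
annotation_time = timeit.timeit(with_annotation, number=100_000)
print(f'list[str]():          {alias_time:.4f}s')
print(f'notes: list[str] = []: {annotation_time:.4f}s')
```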

@dmontagu dmontagu merged commit cc18937 into main Apr 15, 2025
17 checks passed
@dmontagu dmontagu deleted the dmontagu/generalize-json-schema-transformations branch April 15, 2025 19:40