genai: fix pydantic structured_output with array #469

nobu007 · 2024-08-28T18:50:31Z

PR Description

Fix pydantic structured_output with array

Relevant issues

#24225
langchain-ai/langchain#24225

Type

🐛 Bug Fix

Testing(optional)

With this PR, a model like the following works:

from typing import List

from pydantic import BaseModel, Field


class TestClass(BaseModel):
    """Test model with one scalar field and one list field."""

    test_val: str = Field(default="", description="value for test(works).")
    test_val_list: List[str] = Field(
        default_factory=list,
        description="values for test(not works).",
    )

Test result

jinno@jinno-desktop:~/git/drill/gamebook/codeinterpreter_api_agent/langchain-google/libs/genai$ poetry run pytest --extended tests/integration_tests/test_chat_models.py::test_chat_vertexai_gemini_function_calling
============================================================== test session starts ===============================================================
platform linux -- Python 3.10.10, pytest-7.4.4, pluggy-1.5.0
rootdir: /home/jinno/git/drill/gamebook/codeinterpreter_api_agent/langchain-google/libs/genai
configfile: pyproject.toml
plugins: anyio-4.4.0
collected 1 item                                                                                                                                 

tests/integration_tests/test_chat_models.py .                                                                                              [100%]

================================================================ warnings summary ================================================================
../../../../../../../.cache/pypoetry/virtualenvs/langchain-google-genai-ov3b6nnP-py3.10/lib/python3.10/site-packages/_pytest/config/__init__.py:1373
  /home/jinno/.cache/pypoetry/virtualenvs/langchain-google-genai-ov3b6nnP-py3.10/lib/python3.10/site-packages/_pytest/config/__init__.py:1373: PytestConfigWarning: Unknown config option: asyncio_mode
  
    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

Note1

This code should be updated later.
_set_schema_items() can't create a correct function-calling schema.
https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/function-calling?hl#schema
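
As an illustration of the point in Note1, here is a minimal sketch of how an array property could be mapped to a function-calling schema with an `items` entry. It is not the code in `_set_schema_items()`; the helper `property_to_fc_schema` and the mapping `JSON_TO_FC_TYPE` are hypothetical names, and the upper-case type names are assumed from the schema reference linked above.

```python
from typing import Any, Dict

# Hypothetical mapping from JSON Schema type names to the upper-case
# type names assumed for the function-calling schema.
JSON_TO_FC_TYPE: Dict[str, str] = {
    "string": "STRING",
    "number": "NUMBER",
    "integer": "INTEGER",
    "boolean": "BOOLEAN",
    "array": "ARRAY",
    "object": "OBJECT",
}


def property_to_fc_schema(prop: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one flat JSON Schema property (hypothetical helper)."""
    fc_type = JSON_TO_FC_TYPE.get(prop.get("type", "string"), "STRING")
    out: Dict[str, Any] = {"type": fc_type}
    if "description" in prop:
        out["description"] = prop["description"]
    if fc_type == "ARRAY":
        # An array is only useful to the model if `items` says what it holds.
        item_type = prop.get("items", {}).get("type", "string")
        out["items"] = {"type": JSON_TO_FC_TYPE.get(item_type, "STRING")}
    return out


# JSON Schema fragment for a field such as `test_val_list: List[str]`.
test_val_list = {
    "type": "array",
    "items": {"type": "string"},
    "description": "values for test(not works).",
}
print(property_to_fc_schema(test_val_list))
```

This sketch only covers arrays of primitive values; arrays of nested objects would need a recursive conversion.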

Note2

This PR may fix #492.
The integration test failed because of that issue.

lkuligin · 2024-08-29T07:31:54Z

could you add a unit test too, please?

nobu007 · 2024-08-30T14:28:44Z

@lkuligin
I added cases for this to the unit tests and the integration tests.
Please take a look.

poetry run pytest --extended tests/integration_tests/test_chat_models.py::test_chat_vertexai_gemini_function_calling

lkuligin · 2024-09-05T06:12:42Z

libs/genai/langchain_google_genai/_function_utils.py

-            return TYPE_ENUM[stype]
-        else:
-            pass
+        return _get_type_from_str(stype)


can we maybe just have a one-liner here instead of creating a separate function?
TYPE_ENUM.get(stype, "str")

@lkuligin
I changed it to a one-liner.
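
For readers following the review thread, the one-liner pattern discussed above can be sketched as follows. The contents of `TYPE_ENUM` and the `SchemaType` enum are hypothetical stand-ins, not the library's definitions. The fallback here is an enum member rather than the string `"str"` from the review comment, on the assumption that the mapping's values are enum members.

```python
from enum import Enum
from typing import Dict


class SchemaType(Enum):
    """Hypothetical stand-in for the schema type enum."""

    STRING = 1
    NUMBER = 2
    INTEGER = 3
    BOOLEAN = 4
    ARRAY = 5
    OBJECT = 6


# Hypothetical contents; only the name TYPE_ENUM comes from the diff.
TYPE_ENUM: Dict[str, SchemaType] = {
    "string": SchemaType.STRING,
    "number": SchemaType.NUMBER,
    "integer": SchemaType.INTEGER,
    "boolean": SchemaType.BOOLEAN,
    "array": SchemaType.ARRAY,
    "object": SchemaType.OBJECT,
}


def get_type(stype: str) -> SchemaType:
    # dict.get returns the fallback when the key is missing, which
    # replaces the explicit if/else branch with a single expression.
    return TYPE_ENUM.get(stype, SchemaType.STRING)


print(get_type("array"))
print(get_type("unknown"))
```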

TannerW · 2024-09-16T19:29:17Z

Any progress on this?

nobu007 · 2024-09-20T16:43:15Z

@lkuligin
#506 will fix the integration test, so I think #469 can be merged.
The test appears to fail randomly. I retried, but it failed again.
If a passing integration test is required, please merge #506 first.

nobu007 · 2024-09-22T17:10:20Z

@lkuligin
The integration test failure was caused by #469, but it is fixed now.

tudoanh · 2024-09-30T02:30:46Z

Encountered this issue too

lkuligin · 2024-10-03T18:08:37Z

libs/genai/tests/integration_tests/test_chat_models.py

    model = ChatGoogleGenerativeAI(model=_MODEL, safety_settings=safety).bind_tools(
        [MyModel]
    )
    response = model.invoke([message])
+    print("response=", response)


nits: we probably don't need it anymore, do we?

RafaelMCarvalho · 2024-10-16T19:57:39Z

Thanks for the contribution.

I've noticed we still have issues when there's an array like this:

from typing import List

from pydantic import BaseModel, Field


class Person(BaseModel):
    name: str = Field(..., description="The name of the person")
    height_in_meters: float = Field(
        ..., description="The height of the person expressed in meters."
    )


class People(BaseModel):
    people: List[Person]

The following code raises several "not supported in schema, ignoring" warnings, and the model's output then fails Pydantic validation:

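from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI

# `settings` is the commenter's own configuration object holding the API key.
llm = ChatGoogleGenerativeAI(model='gemini-1.5-pro', api_key=settings.google_api_key, temperature=0.2)
query = "Anna is 23 years old and she is 6 feet tall; John is 34 and about 171cm"
prompt = ChatPromptTemplate.from_messages([("system", "Answer the user query."), ("human", "{query}")])
chain = prompt | llm.with_structured_output(People, include_raw=True)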
# WARNING OUTPUT:
# Value 'Information about a person.' is not supported in schema, ignoring v=Information about a person.
# Value '['name', 'height_in_meters']' is not supported in schema, ignoring v=['name', 'height_in_meters']
# Value 'Person' is not supported in schema, ignoring v=Person
# Value 'object' is not supported in schema, ignoring v=object
# Value 'Information about a person.' is not supported in schema, ignoring v=Information about a person.
# Value '['name', 'height_in_meters']' is not supported in schema, ignoring v=['name', 'height_in_meters']
# Value 'Person' is not supported in schema, ignoring v=Person
# Value 'object' is not supported in schema, ignoring v=object

chain.invoke({"query": query})
# OUTPUT:
# {'name': 'People', 'parameters': {'type_': 6, 'properties': {'people': {'type_': 5, 'items': {'type_': 1, 'format_': '', 'description': '', 'nullable': False, 'enum': [], 'max_items': '0', 'min_items': '0', 'properties': {}, 'required': []}, 'format_': '', 'description': '', 'nullable': False, 'enum': [], 'max_items': '0', 'min_items': '0', 'properties': {}, 'required': []}}, 'required': ['people'], 'format_': '', 'description': '', 'nullable': False, 'enum': [], 'max_items': '0', 'min_items': '0'}, 'description': ''}
# {'raw': AIMessage(content='', additional_kwargs={'function_call': {'name': 'People', 'arguments': '{"people": ["Anna", "John"]}'}}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': [{'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}]}, id='run-d4f260cb-1ec2-49a9-befc-e99b8ba235de-0', tool_calls=[{'name': 'People', 'args': {'people': ['Anna', 'John']}, 'id': 'df592d4a-a4a7-48b3-a212-983a97e100d1', 'type': 'tool_call'}], usage_metadata={'input_tokens': 65, 'output_tokens': 16, 'total_tokens': 81}), 'parsing_error': 2 validation errors for People
# people.0
#   Input should be a valid dictionary or instance of Person [type=model_type, input_value='Anna', input_type=str]
#     For further information visit https://errors.pydantic.dev/2.8/v/model_type
# people.1
#   Input should be a valid dictionary or instance of Person [type=model_type, input_value='John', input_type=str]
#     For further information visit https://errors.pydantic.dev/2.8/v/model_type, 'parsed': None}

Is it a Gemini limitation that I'm missing?
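
In the parameters printed above, the `items` entry for `people` has `type_: 1` and empty `properties`, which suggests the nested `Person` object was dropped during conversion and the model was asked for a list of strings. A minimal sketch of a recursive conversion that keeps nested objects inside arrays is shown below. It is not the library's implementation; `to_fc_schema` and `_TYPES` are hypothetical names, and the input is a hand-written, already dereferenced JSON Schema for `People`.

```python
from typing import Any, Dict

# Hypothetical mapping from JSON Schema type names to upper-case type names.
_TYPES: Dict[str, str] = {
    "string": "STRING",
    "number": "NUMBER",
    "integer": "INTEGER",
    "boolean": "BOOLEAN",
    "array": "ARRAY",
    "object": "OBJECT",
}


def to_fc_schema(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively convert a dereferenced JSON Schema (hypothetical helper)."""
    fc_type = _TYPES.get(schema.get("type", "string"), "STRING")
    out: Dict[str, Any] = {"type": fc_type}
    if schema.get("description"):
        out["description"] = schema["description"]
    if fc_type == "OBJECT":
        # Recurse into each property so nested models are preserved.
        out["properties"] = {
            name: to_fc_schema(sub)
            for name, sub in schema.get("properties", {}).items()
        }
        if schema.get("required"):
            out["required"] = list(schema["required"])
    elif fc_type == "ARRAY":
        # Recurse into `items` instead of assuming a primitive element type.
        out["items"] = to_fc_schema(schema.get("items", {}))
    return out


# Hand-written schema for `People`, with the `Person` reference already inlined.
people_schema = {
    "type": "object",
    "properties": {
        "people": {
            "type": "array",
            "items": {
                "type": "object",
                "description": "Information about a person.",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "The name of the person",
                    },
                    "height_in_meters": {
                        "type": "number",
                        "description": "The height of the person expressed in meters.",
                    },
                },
                "required": ["name", "height_in_meters"],
            },
        }
    },
    "required": ["people"],
}

converted = to_fc_schema(people_schema)
items = converted["properties"]["people"]["items"]
print(items["type"], sorted(items["properties"]))
```

Whether the Gemini API accepts such a nested schema in every case is not confirmed in this thread; the sketch only shows how the element schema could be carried through.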

nobu007 force-pushed the fix_with_structured_output branch 4 times, most recently from 8408eb7 to e0d26db on August 28, 2024 19:25

nobu007 force-pushed the fix_with_structured_output branch from e0d26db to 41c1a2e on August 29, 2024 15:24

nobu007 mentioned this pull request Aug 29, 2024

genai: fix make integration test errors #407

Closed

nobu007 force-pushed the fix_with_structured_output branch 2 times, most recently from 2f7e53d to 2ecdfa3 on August 30, 2024 14:25

nobu007 force-pushed the fix_with_structured_output branch 2 times, most recently from 259c9d3 to d25395f on September 1, 2024 05:23

lkuligin reviewed Sep 5, 2024

nobu007 force-pushed the fix_with_structured_output branch 2 times, most recently from 8ec641c to 056cfdf on September 20, 2024 16:09

nobu007 force-pushed the fix_with_structured_output branch 3 times, most recently from 8d08df9 to 9212f29 on September 22, 2024 12:21

genai: fix pydantic structured_output with array

9451da9

nobu007 force-pushed the fix_with_structured_output branch from 030ee52 to 9451da9 on September 22, 2024 16:59

nobu007 requested a review from lkuligin September 22, 2024 17:10

Merge branch 'main' into fix_with_structured_output

a923414

lkuligin reviewed Oct 3, 2024

remove print

ec2f451

lkuligin approved these changes Oct 3, 2024

jzaldi mentioned this pull request Oct 4, 2024

vertexai: Allow json_mode in with_structured_output #533

Merged

lkuligin merged commit 5594bc1 into langchain-ai:main Oct 7, 2024
15 checks passed

alexminza mentioned this pull request Oct 28, 2024

[Google Generative AI] Structured Output doesn't work with advanced schema langchain-ai/langchain#24225

Open

