Evals does not handle the BinaryContent type: to_jsonable_python(case.inputs) raises UnicodeDecodeError #1357

Closed
@aidiss

Description

Gemini models accept BinaryContent as input.
However, when I put a BinaryContent into a Case, evaluation fails with a UnicodeDecodeError.

Here is the full code to reproduce the issue. It only needs a sample image saved as image.png.

Error

  report = dataset.evaluate_sync(guess)
  + Exception Group Traceback (most recent call last):
  |   File "/home/ads/sndbx/pydantic-evals-sndb/failure.py", line 23, in <module>
  |     report = dataset.evaluate_sync(guess)
  |   File "/home/ads/sndbx/pydantic-evals-sndb/.venv/lib/python3.13/site-packages/pydantic_evals/dataset.py", line 315, in evaluate_sync
  |     return get_event_loop().run_until_complete(self.evaluate(task, name=name, max_concurrency=max_concurrency))
  |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/home/ads/.local/share/uv/python/cpython-3.13.0-linux-x86_64-gnu/lib/python3.13/asyncio/base_events.py", line 721, in run_until_complete
  |     return future.result()
  |            ~~~~~~~~~~~~~^^
  |   File "/home/ads/sndbx/pydantic-evals-sndb/.venv/lib/python3.13/site-packages/pydantic_evals/dataset.py", line 283, in evaluate
  |     cases=await task_group_gather(
  |           ^^^^^^^^^^^^^^^^^^^^^^^^
  |     ...<4 lines>...
  |     ),
  |     ^
  |   File "/home/ads/sndbx/pydantic-evals-sndb/.venv/lib/python3.13/site-packages/pydantic_evals/_utils.py", line 99, in task_group_gather
  |     async with anyio.create_task_group() as tg:
  |                ~~~~~~~~~~~~~~~~~~~~~~~^^
  |   File "/home/ads/sndbx/pydantic-evals-sndb/.venv/lib/python3.13/site-packages/anyio/_backends/_asyncio.py", line 772, in __aexit__
  |     raise BaseExceptionGroup(
  |         "unhandled errors in a TaskGroup", self._exceptions
  |     ) from None
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/ads/sndbx/pydantic-evals-sndb/.venv/lib/python3.13/site-packages/pydantic_evals/_utils.py", line 97, in _run_task
    |     results[index] = await tsk()
    |                      ^^^^^^^^^^^
    |   File "/home/ads/sndbx/pydantic-evals-sndb/.venv/lib/python3.13/site-packages/pydantic_evals/dataset.py", line 279, in _handle_case
    |     return await _run_task_and_evaluators(task, case, report_case_name, self.evaluators)
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/ads/sndbx/pydantic-evals-sndb/.venv/lib/python3.13/site-packages/pydantic_evals/dataset.py", line 910, in _run_task_and_evaluators
    |     report_inputs = to_jsonable_python(case.inputs)
    | UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid utf-8
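For context, byte 0x89 at position 0 is the first byte of the PNG file signature, which is not valid UTF-8. The snippet below is a minimal stdlib-only sketch of that failure mode; it does not go through pydantic at all, and the helper name `try_utf8` is made up for illustration.

```python
# First eight bytes of every PNG file (the PNG signature).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def try_utf8(raw: bytes) -> str:
    """Return the decoded text, or a short description of the decoding failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return f"{type(exc).__name__} at byte {exc.start}: {hex(raw[exc.start])}"


print(try_utf8(b"plain ASCII"))  # decodes fine
print(try_utf8(PNG_SIGNATURE))   # fails on the very first byte
```

Any serializer that tries to turn raw image bytes into a UTF-8 string will hit the same problem, which matches the position and byte value reported in the traceback above.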

Example Code

from pydantic import BaseModel
from pydantic_ai import BinaryContent
from pydantic_evals import Case, Dataset


class Input(BaseModel):
    data: BinaryContent


with open("image.png", "rb") as f:
    image_data1 = f.read()


data = BinaryContent(data=image_data1, media_type="image/png")
dataset = Dataset(cases=[Case(inputs=Input(data=data))])


async def guess(question: Input) -> str:
    # Call a Gemini model that accepts BinaryContent here
    return ""


report = dataset.evaluate_sync(guess)
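
One possible workaround, sketched here under assumptions and not tested against pydantic_evals, is to keep the case inputs JSON-safe by storing the image as a base64 string and decoding it inside the task. The `EncodedInput` class and its methods are hypothetical names introduced for illustration; only the standard library is used.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class EncodedInput:
    """Hypothetical JSON-safe stand-in for an input that carries raw bytes."""

    data_b64: str
    media_type: str

    @classmethod
    def from_bytes(cls, raw: bytes, media_type: str) -> "EncodedInput":
        # base64 output is pure ASCII, so it is always a valid UTF-8 string.
        return cls(data_b64=base64.b64encode(raw).decode("ascii"), media_type=media_type)

    def to_bytes(self) -> bytes:
        # Recover the original bytes when the task actually needs them.
        return base64.b64decode(self.data_b64)


raw = b"\x89PNG\r\n\x1a\n"  # PNG signature, standing in for a real image
encoded = EncodedInput.from_bytes(raw, "image/png")

# The encoded form serializes to JSON without a decode error...
serialized = json.dumps(asdict(encoded))
# ...and the original bytes survive the round trip.
restored = EncodedInput(**json.loads(serialized)).to_bytes()
print(restored == raw)
```

Inside the task, the bytes from `to_bytes()` could then be wrapped as `BinaryContent(data=..., media_type=...)` before calling the model, at the cost of the report showing a base64 string instead of the image.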

Python, Pydantic AI & LLM client version

requires-python = ">=3.13"
dependencies = [
    "pydantic-ai-slim[openai]>=0.0.49",
    "pydantic-evals[logfire]>=0.0.49",
    "pyyaml>=6.0.2",
]
