Add pydantic-evals package #935

Merged
merged 124 commits into from
Mar 28, 2025
53368dd
Add initial work on evals
dmontagu Mar 2, 2025
3835268
Do some refactoring
dmontagu Mar 3, 2025
1e0e7f5
More cleanup refactors
dmontagu Mar 3, 2025
48589c5
Improve ability to send data to logfire
dmontagu Mar 3, 2025
e4f720a
Set the scope properly
dmontagu Mar 3, 2025
6129060
Add token usage metrics
dmontagu Mar 4, 2025
3578f9c
Add some work on assertions
dmontagu Mar 4, 2025
9920555
Add a score to the eval output
dmontagu Mar 4, 2025
bfdc762
Merge branch 'main' into dmontagu/evals
dmontagu Mar 4, 2025
1888395
instrument models
dmontagu Mar 5, 2025
a640fd3
Merge branch 'main' into dmontagu/evals
dmontagu Mar 6, 2025
141576c
Improve support for assertions
dmontagu Mar 6, 2025
ef31023
Merge branch 'main' into dmontagu/evals
dmontagu Mar 10, 2025
c6478d6
Upstream demo-related changes
dmontagu Mar 10, 2025
baba00e
Merge main
dmontagu Mar 13, 2025
c993cd3
A bit of cleanup
dmontagu Mar 13, 2025
37330df
Add SpanTree to scoring context
dmontagu Mar 13, 2025
0d42ab1
Merge branch 'main' into dmontagu/evals
dmontagu Mar 13, 2025
2265bf4
Merge main
dmontagu Mar 14, 2025
9b05975
Merge branch 'main' into dmontagu/evals
dmontagu Mar 15, 2025
ae97cf9
Clean up various handling of assessments
dmontagu Mar 15, 2025
22c1ca5
Reorganize code a bit
dmontagu Mar 16, 2025
3970a8c
WIP
dmontagu Mar 17, 2025
876a683
Various minor fixes
dmontagu Mar 18, 2025
9c82baf
Delete unnecessary scoring.py
dmontagu Mar 18, 2025
ca1c257
Update TODO comments
dmontagu Mar 18, 2025
b79b57e
Merge main
dmontagu Mar 18, 2025
979d4f0
Update pyprojects
dmontagu Mar 18, 2025
b3351f5
Merge main
dmontagu Mar 18, 2025
d4827db
Address some feedback
dmontagu Mar 18, 2025
f5ae40e
Merge main
dmontagu Mar 19, 2025
c6d4d71
Merge branch 'main' into dmontagu/evals
dmontagu Mar 19, 2025
198fa96
Address Alex's feedback
dmontagu Mar 19, 2025
2579b8d
Default to empty tuple instead of None
alexmojaki Mar 19, 2025
f2080c7
use get_unwrapped_function_name more
alexmojaki Mar 19, 2025
5934aeb
data -> cases
alexmojaki Mar 19, 2025
e19a476
Refactor span tree stuff a bit
dmontagu Mar 19, 2025
277bac2
include default_assessments in json schema
alexmojaki Mar 19, 2025
f77fb6a
Merge branch 'alex/evals-review2' into dmontagu/evals
dmontagu Mar 19, 2025
0690fe7
Merge branch 'alex/evals-review2' into dmontagu/evals
dmontagu Mar 19, 2025
7d2729d
Reorganize a bit
dmontagu Mar 20, 2025
50e49b1
Merge main
dmontagu Mar 20, 2025
47d14e4
Allow either Assessment or BoundAssessmentFunction in add_case and Da…
alexmojaki Mar 20, 2025
ed90711
fix generate_dataset_files
alexmojaki Mar 20, 2025
6551c19
name types
alexmojaki Mar 20, 2025
964268b
fix json schema name to match _DatasetModel
alexmojaki Mar 20, 2025
9ca6b78
fix json schema
alexmojaki Mar 20, 2025
6d71374
Merge branch 'main' into dmontagu/evals
dmontagu Mar 21, 2025
467d243
Get rid of Evaluation class
dmontagu Mar 21, 2025
067d0d8
Do various refactoring
dmontagu Mar 21, 2025
f24b4c5
Improve some APIs
dmontagu Mar 22, 2025
2767024
Clean up serialization
dmontagu Mar 22, 2025
2c7adfe
Add some initial work on docs and tests
dmontagu Mar 23, 2025
c252b5b
Add some more tests and functionality
dmontagu Mar 23, 2025
caf7acd
Add some more functionality
dmontagu Mar 23, 2025
9daabbe
Add some tests of span queries
dmontagu Mar 23, 2025
0fd2a5c
Add some more tests
dmontagu Mar 23, 2025
803ebf0
docs improvements
samuelcolvin Mar 23, 2025
8679d65
Merge branch 'main' into dmontagu/evals
samuelcolvin Mar 23, 2025
fafbbf3
Add a couple TODO comments
dmontagu Mar 23, 2025
2c24dfd
improving API docs
samuelcolvin Mar 23, 2025
340f631
working on report api docs
samuelcolvin Mar 23, 2025
f771958
fix pytest-examples
samuelcolvin Mar 23, 2025
2a5e51c
Tweak the data for the spans / evals panel
dmontagu Mar 24, 2025
b6fa750
Make it so logfire and otel are not required dependencies
dmontagu Mar 24, 2025
3cfb2de
Update a couple more comments etc.
dmontagu Mar 24, 2025
42dd521
Improve handling of Evaluators with invalid return types
dmontagu Mar 24, 2025
485716d
Some minor improvements
dmontagu Mar 24, 2025
8c556c3
Some renaming and refactoring
dmontagu Mar 24, 2025
827ab46
Change to using dataclass subclasses for custom evaluators
dmontagu Mar 25, 2025
4cdadc7
Add various docstrings
dmontagu Mar 25, 2025
5963117
WIP
dmontagu Mar 25, 2025
f626392
WIP
dmontagu Mar 25, 2025
773494c
Update docs
dmontagu Mar 25, 2025
2edd829
Fix various minor issues
dmontagu Mar 25, 2025
c98839b
Get the evals panel working
dmontagu Mar 25, 2025
fff21dd
Merge main
dmontagu Mar 25, 2025
4c6c595
Uncomment reports tests
dmontagu Mar 25, 2025
4fd9861
Handle NonRecording spans better
alexmojaki Mar 25, 2025
82effce
Move into try_import
alexmojaki Mar 25, 2025
3a7d199
Add note about undetected examples tests
dmontagu Mar 25, 2025
8096ba5
Move into try_import
alexmojaki Mar 25, 2025
f740db2
Merge branch 'dmontagu/evals' of github.com:pydantic/pydantic-ai into…
alexmojaki Mar 25, 2025
971ae71
Move into try_import
alexmojaki Mar 25, 2025
49150e5
test_live doesn't need evals
alexmojaki Mar 25, 2025
aa31788
fix typing imports
alexmojaki Mar 25, 2025
4688ce6
fix types for 3.9
alexmojaki Mar 25, 2025
bdc97fc
use anyio
alexmojaki Mar 25, 2025
1e8a873
use anyio
alexmojaki Mar 25, 2025
e35ad41
docs improvements
samuelcolvin Mar 25, 2025
d0cdc5e
Fix order of dataset stuff using new task_group_gather
alexmojaki Mar 25, 2025
a59013b
Merge branch 'dmontagu/evals' of github.com:pydantic/pydantic-ai into…
alexmojaki Mar 25, 2025
13c265c
Fix some issue
dmontagu Mar 25, 2025
a5d5983
Rework evaluators to just define evaluate
dmontagu Mar 25, 2025
3bd9498
3.9
alexmojaki Mar 25, 2025
0b0c3bd
Force specific width for wide printed tables in examples
dmontagu Mar 25, 2025
9c1a648
Merge branch 'dmontagu/evals' of github.com:pydantic/pydantic-ai into…
alexmojaki Mar 25, 2025
312ed5e
Fix console width stuff
dmontagu Mar 25, 2025
103b7ad
Add pydantic_evals.examples to docs
alexmojaki Mar 25, 2025
5ca7f03
Merge branch 'dmontagu/evals' of github.com:pydantic/pydantic-ai into…
alexmojaki Mar 25, 2025
88b54bd
Add lots of tests
dmontagu Mar 25, 2025
3f13a80
Try again
dmontagu Mar 25, 2025
3537d4e
Try fixing imports
dmontagu Mar 25, 2025
0764e9c
Try fixing imports
dmontagu Mar 25, 2025
83b255b
JSON JSON schema and docs work
samuelcolvin Mar 25, 2025
fb4b44a
Add some coverage
dmontagu Mar 25, 2025
8fe29a9
Merge branch 'main' into dmontagu/evals
dmontagu Mar 25, 2025
3d20220
Fix 3.9 tests
dmontagu Mar 25, 2025
1f571f1
Add lots of test coverage
dmontagu Mar 26, 2025
9ea6b71
Merge main
dmontagu Mar 26, 2025
aa139b9
Try fixing CI
dmontagu Mar 26, 2025
9d7d92f
Try fixing CI
dmontagu Mar 26, 2025
01f4eef
Fix CI
dmontagu Mar 26, 2025
06e4abc
Some clean-up
dmontagu Mar 27, 2025
f170e34
Fix docs example
dmontagu Mar 27, 2025
5ebcfb1
Rename the examples module to generation
dmontagu Mar 27, 2025
71f4731
more docs
samuelcolvin Mar 27, 2025
47881c2
tweak examples
samuelcolvin Mar 27, 2025
f30b014
Upgrade to pydantic 2.11
dmontagu Mar 27, 2025
909dbe2
Merge branch 'main' into dmontagu/evals
dmontagu Mar 27, 2025
23eef2f
Revert upgrade to pydantic 2.11
dmontagu Mar 28, 2025
68b6536
Merge main
dmontagu Mar 28, 2025
e603ee6
Fix CI
dmontagu Mar 28, 2025
0782afc
Merge branch 'main' into dmontagu/evals
samuelcolvin Mar 28, 2025
8 changes: 8 additions & 0 deletions Makefile
@@ -13,6 +13,14 @@ install: .uv .pre-commit ## Install the package, dependencies, and pre-commit fo
 	uv sync --frozen --all-extras --all-packages --group lint --group docs
 	pre-commit install --install-hooks

+.PHONY: install-all-python
+install-all-python: ## Install and synchronize an interpreter for every python version
+	UV_PROJECT_ENVIRONMENT=.venv39 uv sync --python 3.9 --frozen --all-extras --all-packages --group lint --group docs
+	UV_PROJECT_ENVIRONMENT=.venv310 uv sync --python 3.10 --frozen --all-extras --all-packages --group lint --group docs
+	UV_PROJECT_ENVIRONMENT=.venv311 uv sync --python 3.11 --frozen --all-extras --all-packages --group lint --group docs
+	UV_PROJECT_ENVIRONMENT=.venv312 uv sync --python 3.12 --frozen --all-extras --all-packages --group lint --group docs
+	UV_PROJECT_ENVIRONMENT=.venv313 uv sync --python 3.13 --frozen --all-extras --all-packages --group lint --group docs
+
 .PHONY: sync
 sync: .uv ## Update local packages and uv.lock
 	uv sync --all-extras --all-packages --group lint --group docs
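The `install-all-python` target repeats one `uv sync` invocation per interpreter, changing only the Python version and the environment directory. A minimal sketch of that naming pattern follows. The helpers `env_dir` and `sync_commands` are hypothetical and not part of the PR; they only build the command strings and never invoke uv.

```python
# Hypothetical helpers, not part of the PR: they rebuild the command lines used
# by the `install-all-python` Makefile target so the naming pattern is explicit.
PYTHON_VERSIONS = ("3.9", "3.10", "3.11", "3.12", "3.13")

# Flags copied from the Makefile recipe in the diff.
SYNC_FLAGS = "--frozen --all-extras --all-packages --group lint --group docs"


def env_dir(version: str) -> str:
    """Map a version such as '3.10' to its environment directory '.venv310'."""
    return ".venv" + version.replace(".", "")


def sync_commands(versions=PYTHON_VERSIONS):
    """Return one `uv sync` command line per Python version."""
    return [
        f"UV_PROJECT_ENVIRONMENT={env_dir(v)} uv sync --python {v} {SYNC_FLAGS}"
        for v in versions
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; executing them requires uv.
    for command in sync_commands():
        print(command)
```

Each interpreter gets its own directory, so the five environments can coexist in one checkout without overwriting the default `.venv`.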
3 changes: 3 additions & 0 deletions docs/api/pydantic_evals/dataset.md
@@ -0,0 +1,3 @@
# `pydantic_evals.dataset`

::: pydantic_evals.dataset
3 changes: 3 additions & 0 deletions docs/api/pydantic_evals/evaluators.md
@@ -0,0 +1,3 @@
# `pydantic_evals.evaluators`

::: pydantic_evals.evaluators
3 changes: 3 additions & 0 deletions docs/api/pydantic_evals/generation.md
@@ -0,0 +1,3 @@
# `pydantic_evals.generation`

::: pydantic_evals.generation
3 changes: 3 additions & 0 deletions docs/api/pydantic_evals/otel.md
@@ -0,0 +1,3 @@
# `pydantic_evals.otel`

::: pydantic_evals.otel
3 changes: 3 additions & 0 deletions docs/api/pydantic_evals/reporting.md
@@ -0,0 +1,3 @@
# `pydantic_evals.reporting`

::: pydantic_evals.reporting
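Each of the five API pages added here is a three-line stub: a heading with the dotted module name, a blank line, and a `:::` directive naming the same module. The directive is presumably expanded into the module's API reference by the docs tooling (likely mkdocstrings, though the diff does not say so). As a rough sketch of the shared layout, with hypothetical helpers `stub_page` and `stub_pages` that are not part of the PR:

```python
# Hypothetical helpers, not part of the PR: they render the three-line API stub
# pages added under docs/api/pydantic_evals/.
MODULES = ("dataset", "evaluators", "generation", "otel", "reporting")


def stub_page(module: str, package: str = "pydantic_evals") -> str:
    """Return the Markdown stub for one module: heading, blank line, directive."""
    dotted = f"{package}.{module}"
    return f"# `{dotted}`\n\n::: {dotted}\n"


def stub_pages(modules=MODULES) -> dict:
    """Map each docs path to the stub content for that module."""
    return {f"docs/api/pydantic_evals/{m}.md": stub_page(m) for m in modules}


if __name__ == "__main__":
    # Print the pages rather than writing files, so nothing on disk changes.
    for path, content in stub_pages().items():
        print(path)
        print(content)
```

Keeping the pages this thin means the reference text lives in the source docstrings, and a new module only needs one more stub file.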