(Or Evils as I'm coming to think of them)
We want to build an open-source, sane way to score the performance of LLM calls that is:
- local first - so you don't need to use a service
- flexible enough to work with whatever best practice emerges; ideally usable for any code that is stochastic enough to require scoring beyond pass/fail (that includes LLM SDKs used directly, or even other agent frameworks)
- usable both for "offline evals" (unit-test-style checks on performance; see the sketch after this list) and "online evals" measuring performance in production or an equivalent environment (presumably using an observability platform like Pydantic Logfire)
- usable with Pydantic Logfire when and where that actually helps
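
To make the "offline evals" bullet concrete, here is a minimal sketch of what a unit-test-style check with a graded score could look like. Everything in it (`EvalResult`, `score_keyword_coverage`, the hypothetical answer) is an illustration of the idea, not a proposed API:

```python
# Hypothetical sketch of an "offline eval": a unit-test-style check that scores
# an LLM answer on a 0.0-1.0 scale instead of a plain pass/fail.
from dataclasses import dataclass


@dataclass
class EvalResult:
    score: float  # 0.0 (worst) to 1.0 (best)
    reason: str


def score_keyword_coverage(answer: str, expected_keywords: list[str]) -> EvalResult:
    """Score an answer by the fraction of expected keywords it mentions."""
    found = [kw for kw in expected_keywords if kw.lower() in answer.lower()]
    score = len(found) / len(expected_keywords) if expected_keywords else 1.0
    return EvalResult(score=score, reason=f"matched {found} of {expected_keywords}")


def test_capital_question() -> None:
    # In a real eval, `answer` would come from the model or agent under test.
    answer = "The capital of France is Paris, on the Seine."
    result = score_keyword_coverage(answer, ["Paris", "France"])
    assert result.score >= 0.5, result.reason
```

An "online eval" would apply the same kind of scoring to production traffic, recording the scores through an observability platform rather than asserting on them in a test run.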
I believe @dmontagu has a plan.