chore(llmobs): support running experiment evals #13994
base: main
Bootstrap import analysis
Comparison of import times between this PR and base.
Summary
The average import time from this PR is: 284 ± 4 ms.
The average import time from base is: 288 ± 4 ms.
The import time difference between this PR and base is: -3.3 ± 0.2 ms.
Import time breakdown
The following import paths have shrunk:

Benchmarks
Benchmark execution time: 2025-07-15 02:19:22
Comparing candidate commit 25cf67f in PR branch
Found 0 performance improvements and 0 performance regressions! Performance is the same for 548 metrics, 2 unstable metrics.
cbf3ce6 to 269dc00
MLOB-3269
Adds support for running experiment evaluator functions and for merging the task results and evaluator results into a single result object.
Writing result objects to Datadog will come in a future PR.
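For reviewers who want a mental model of the flow described above, here is a minimal sketch, not the code in this PR. It assumes an evaluator is a plain function taking `(input, output, expected_output)` and that results are keyed by the evaluator's function name; `run_evaluators`, `merge_results`, `exact_match`, `always_fails`, and the record layout are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
# Assumed evaluator signature: (input, output, expected_output) -> any value.
Evaluator = Callable[[Any, Any, Any], Any]


def run_evaluators(task_results: List[Record], evaluators: List[Evaluator]) -> List[Record]:
    """Run every evaluator on every task result, capturing errors per evaluator."""
    evaluations: List[Record] = []
    for idx, result in enumerate(task_results):
        per_record: Dict[str, Record] = {}
        for evaluator in evaluators:
            try:
                value = evaluator(result["input"], result["output"], result["expected_output"])
                per_record[evaluator.__name__] = {"value": value, "error": None}
            except Exception as exc:
                # One failing evaluator should not abort the whole run.
                per_record[evaluator.__name__] = {"value": None, "error": repr(exc)}
        evaluations.append({"idx": idx, "evaluations": per_record})
    return evaluations


def merge_results(task_results: List[Record], evaluations: List[Record]) -> List[Record]:
    """Combine each task result with its evaluations into one object per record."""
    return [
        {**task, "evaluations": ev["evaluations"]}
        for task, ev in zip(task_results, evaluations)
    ]


def exact_match(input_data: Any, output_data: Any, expected_output: Any) -> bool:
    """Hypothetical evaluator: True when the task output equals the expected output."""
    return output_data == expected_output


def always_fails(input_data: Any, output_data: Any, expected_output: Any) -> float:
    """Hypothetical evaluator that raises, to show error capture."""
    return 1 / 0


task_results = [
    {"input": "2+2", "output": "4", "expected_output": "4"},
    {"input": "3+3", "output": "7", "expected_output": "6"},
]
merged = merge_results(task_results, run_evaluators(task_results, [exact_match, always_fails]))
```

Each entry in `merged` carries the original task fields plus an `evaluations` mapping, so a later change that writes results out only has to serialize one object per record.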