An implementation of Anthropic's paper and essay "A statistical approach to model evaluations".
Updated Oct 6, 2025 - Python
Create your self-hosted, open-source Operator model.
A Multi-Agent System for Cross-Checking Phishing URLs.