Welcome to the simple LLM evaluation framework, simpleval for short.
simpleval is a Python package designed to make evaluating Large Language Models (LLMs) easier, using the "LLM as a Judge" technique.
It supports a variety of LLM providers, including OpenAI, Google (Gemini API, Vertex), AWS Bedrock, Anthropic, Azure, and more (via LiteLLM).
simpleval also includes several reports to help you analyze, compare, and summarize your evaluation results. See the available reports for more details.
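To illustrate the "LLM as a Judge" technique in general terms, here is a minimal sketch. It is not simpleval's API: `JUDGE_PROMPT`, `judge_answer`, and `fake_judge` are hypothetical names chosen for this example, and the judge is a stub that returns a fixed reply instead of calling a real model.

```python
import re
from typing import Callable

# Hypothetical grading prompt; real judge prompts are usually more detailed.
JUDGE_PROMPT = """You are grading an answer.
Question: {question}
Answer: {answer}
Reply with a single line in the form "SCORE: <integer from 1 to 5>"."""


def judge_answer(question: str, answer: str, call_judge: Callable[[str], str]) -> int:
    """Ask a judge model to grade an answer and parse the 1-5 score from its reply."""
    prompt = JUDGE_PROMPT.format(question=question, answer=answer)
    reply = call_judge(prompt)
    match = re.search(r"SCORE:\s*([1-5])\b", reply)
    if match is None:
        raise ValueError(f"Could not parse a score from judge reply: {reply!r}")
    return int(match.group(1))


def fake_judge(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; always returns a fixed grade."""
    return "SCORE: 4"


score = judge_answer("What is 2 + 2?", "4", fake_judge)
print(score)
```

In practice the `call_judge` argument would wrap a call to whichever LLM provider acts as the judge; passing it in as a plain callable keeps the scoring logic independent of any one provider.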
See the Quickstart Guide.
See the Project Documentation.
We appreciate your help in making this project better! ✨
If you would like to contribute to this project, please follow the guidelines outlined in the CONTRIBUTING.md file.
simpleval is released under the Apache License. See the LICENSE file for more details.
If you have any questions or suggestions, feel free to join our GitHub discussions forum 💬
If you want to report a bug or request a feature, please open an issue in the GitHub issues tracker.