🐢 Open-Source Evaluation & Testing library for LLM Agents
Evaluation and Tracking for LLM Experiments and AI Agents
Analytics, Versioning and ETL for multimodal data: video, audio, PDFs, images
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, and embedding use cases), perform root cause analysis on failure cases, and give insights on how to resolve them.
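As a rough illustration of what running preconfigured checks looks like, here is a minimal sketch based on UpTrain's documented EvalLLM quickstart; exact check names and constructor arguments may differ between versions, and the API key and sample data are placeholders.

```python
# Minimal sketch of running UpTrain's preconfigured checks
# (modeled on its documented quickstart; names may vary by version).
import json
from uptrain import EvalLLM, Evals

data = [{
    "question": "Which is the most popular global sport?",
    "context": "Football is played by over 250 million players in more than 200 countries.",
    "response": "Football is the most popular sport worldwide.",
}]

# Placeholder API key; UpTrain uses an LLM (here OpenAI) to grade the checks.
eval_llm = EvalLLM(openai_api_key="sk-...")

results = eval_llm.evaluate(
    data=data,
    checks=[Evals.CONTEXT_RELEVANCE, Evals.FACTUAL_ACCURACY, Evals.RESPONSE_RELEVANCE],
)
print(json.dumps(results, indent=2))
```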
Python SDK for running evaluations on LLM generated responses
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
llm-eval-simple is a simple LLM evaluation framework with intermediate actions and prompt pattern selection
First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and safeguards)
Develop reliable AI apps
🎯 A free LLM evaluation toolkit for assessing factual accuracy, context understanding, tone, and more, so you can gauge the quality of your LLM applications.
An open source library for asynchronous querying of LLM endpoints
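Since the library above is not named here, the snippet below is only a generic sketch of asynchronous endpoint querying with asyncio and httpx, not that project's API; the endpoint URL, payload shape, and key are hypothetical placeholders.

```python
# Generic illustration of fanning out concurrent LLM requests with asyncio + httpx.
# API_URL, API_KEY, and the payload format are placeholders, not a real service.
import asyncio
import httpx

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
API_KEY = "sk-..."  # placeholder key

async def query(client: httpx.AsyncClient, prompt: str) -> str:
    # Send one chat-style request and pull the assistant message out of the JSON reply.
    resp = await client.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": "example-model", "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

async def main(prompts: list[str]) -> list[str]:
    async with httpx.AsyncClient() as client:
        # Issue all requests concurrently and return responses in prompt order.
        return await asyncio.gather(*(query(client, p) for p in prompts))

if __name__ == "__main__":
    answers = asyncio.run(main(["What is LLM evaluation?", "Define guardrails."]))
    print(answers)
```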
LLM Security Platform.
Structured output benchmarks comparing DSPy and BAML with different LLMs
Realign is a testing and simulation framework for AI applications.
[ACL 2025] GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning
The prompt engineering, prompt management, and prompt evaluation tool for Python
Generative agents: computational software agents that simulate believable human behavior using OpenAI LLM models. Our main focus was developing the game "Werewolves of Miller's Hollow", aiming to replicate human-like behavior.