UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection.
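For context, one common black-box UQ signal behind hallucination detection is answer consistency: sample the same prompt several times and treat low agreement as a warning sign. The sketch below illustrates that general idea only; it is not UQLM's actual API, and the similarity measure and sample strings are made up.

```python
# Generic consistency-based UQ sketch (illustrative only, not UQLM's API).
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two answers."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def consistency_score(answers: list[str]) -> float:
    """Mean pairwise similarity across sampled answers (1.0 = full agreement)."""
    pairs = list(combinations(answers, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical samples from the same prompt at non-zero temperature.
samples = [
    "The Eiffel Tower was completed in 1889.",
    "It was completed in 1889 for the World's Fair.",
    "Construction finished in 1887.",  # disagreeing sample lowers the score
]
print(f"consistency: {consistency_score(samples):.2f}")  # low score -> possible hallucination
```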
Benchmark and evaluate generative research synthesis.
Code scanner to check for issues in prompts and LLM calls
Example projects integrated with the Future AGI tech stack for easy AI development.
A comprehensive AI evaluation framework with advanced techniques, including Temperature-Controlled Verdict Aggregation via the Generalized Power Mean. It supports multiple LLM providers and 15+ evaluation metrics for RAG systems and AI agents.
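The generalized power mean behind such aggregation is standard mathematics: M_p(x) = (1/n · Σ x_i^p)^(1/p), where the exponent p acts as the "temperature", sliding the aggregate from min-like (p → −∞) through the arithmetic mean (p = 1) toward max-like (p → +∞). The sketch below applies that formula to hypothetical judge scores; the names are illustrative, not this framework's API.

```python
# Generalized power mean over per-judge verdict scores (illustrative names).
def power_mean(scores: list[float], p: float) -> float:
    """M_p(x) = (mean(x_i ** p)) ** (1 / p); p == 0 is the geometric-mean limit."""
    if p == 0:
        prod = 1.0
        for s in scores:
            prod *= s
        return prod ** (1 / len(scores))
    return (sum(s ** p for s in scores) / len(scores)) ** (1 / p)

verdicts = [0.9, 0.8, 0.3]          # per-judge scores in [0, 1]
print(power_mean(verdicts, p=-5))   # ~0.37, dominated by the worst verdict
print(power_mean(verdicts, p=1))    # ~0.67, plain average
print(power_mean(verdicts, p=5))    # ~0.79, dominated by the best verdict
```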
Running UK AISI's Inspect in the Cloud
Cost-of-Pass: An Economic Framework for Evaluating Language Models
LLM-as-a-judge for Extractive QA datasets
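As a rough illustration of the LLM-as-a-judge pattern for extractive QA (not this project's actual prompts or API), a judge can be shown the passage, question, gold span, and predicted span and asked for a one-word verdict:

```python
# Generic LLM-as-a-judge grading sketch; the judge backend is injected as a
# callable, and `fake_judge` is a hypothetical stand-in for a real LLM call.
JUDGE_TEMPLATE = """You are grading an extractive QA system.
Passage: {passage}
Question: {question}
Gold answer span: {gold}
Predicted answer span: {pred}
Does the prediction convey the same answer as the gold span?
Reply with exactly one word: CORRECT or INCORRECT."""

def grade(passage, question, gold, pred, call_judge) -> bool:
    prompt = JUDGE_TEMPLATE.format(passage=passage, question=question, gold=gold, pred=pred)
    verdict = call_judge(prompt).strip().upper()
    return verdict.startswith("CORRECT")

# Dummy backend for illustration; swap in a real LLM call in practice.
fake_judge = lambda prompt: "CORRECT"
print(grade("Paris is the capital of France.", "What is the capital of France?",
            "Paris", "the city of Paris", fake_judge))
```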
🛡️ Safe AI agents through an action classifier.
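The action-classifier idea, sketched loosely below with made-up names and a rule-based stand-in for the real classifier, is to label each proposed agent action ALLOW or BLOCK before it is executed:

```python
# Rule-based placeholder for an action classifier guarding agent execution.
BLOCKED_PATTERNS = ("rm -rf", "DROP TABLE", "transfer_funds")

def classify_action(action: str) -> str:
    """Return 'BLOCK' if the action matches a known-dangerous pattern, else 'ALLOW'."""
    lowered = action.lower()
    if any(p.lower() in lowered for p in BLOCKED_PATTERNS):
        return "BLOCK"
    return "ALLOW"

def guarded_execute(action: str, execute) -> str:
    """Run the action only if the classifier allows it."""
    if classify_action(action) == "BLOCK":
        return f"blocked unsafe action: {action!r}"
    return execute(action)

print(guarded_execute("rm -rf /", execute=lambda a: "done"))            # blocked
print(guarded_execute("list files in ./docs", execute=lambda a: "done"))  # runs
```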
JudgeGPT: An empirical research platform for evaluating the authenticity of AI-generated news.
A Python library providing evaluation metrics to compare generated texts from LLMs, often against reference texts. Features streamlined workflows for model comparison and visualization.
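A representative reference-based metric that such libraries typically expose is token-level F1 between a generated text and a reference; the sketch below is a generic illustration, not this library's API:

```python
# Token-level F1 between a generated text and a reference (illustrative only).
from collections import Counter

def token_f1(generated: str, reference: str) -> float:
    gen, ref = generated.lower().split(), reference.lower().split()
    overlap = sum((Counter(gen) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(gen), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the cat sat on the mat", "a cat sat on a mat"))  # ~0.67
```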
A comprehensive AI model evaluation framework with advanced techniques, including Temperature-Controlled Verdict Aggregation via the Generalized Power Mean. It supports multiple LLM providers and 15+ evaluation metrics for RAG systems and AI agents.
CLI tool to evaluate LLM factuality on the MMLU benchmark.
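What such a tool ultimately computes is accuracy over multiple-choice items. The sketch below shows that computation with an inline item and a stubbed model call, both purely illustrative; a real CLI would load the MMLU dataset and call an actual model:

```python
# Accuracy over MMLU-style multiple-choice items (inline data, stubbed model).
ITEMS = [
    {"question": "What is the chemical symbol for gold?",
     "choices": {"A": "Ag", "B": "Au", "C": "Gd", "D": "Go"}, "answer": "B"},
]

def ask_model(question: str, choices: dict[str, str]) -> str:
    """Placeholder model that always answers 'B'; swap in a real LLM call."""
    return "B"

correct = sum(ask_model(it["question"], it["choices"]) == it["answer"] for it in ITEMS)
print(f"accuracy: {correct / len(ITEMS):.2%}")
```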
Python SDK
A modular system for automated, multi-metric AI prompt evaluation, featuring expert models, an orchestrator, and a modern web UI.
VerifyAI is a simple UI application to test GenAI outputs
Emergent Computational Epistemology: studying AI’s emergent behaviors as non-human epistemic systems.
An A2A version of Agent Action Guard: safe AI agents through an action classifier.
Pondera is a lightweight, YAML-first framework to evaluate AI models and agents with pluggable runners and an LLM-as-a-judge.
Assert-style validation library for AI outputs - ensure your LLMs behave exactly as expected.
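Assert-style validation generally means each check raises on failure so evaluation scripts fail fast; the helper names below are hypothetical, not this library's actual API:

```python
# Assert-style checks on LLM output (hypothetical helper names).
import json

def assert_json(output: str) -> dict:
    """Fail if the output is not valid JSON; return the parsed object."""
    try:
        return json.loads(output)
    except json.JSONDecodeError as exc:
        raise AssertionError(f"output is not valid JSON: {exc}") from exc

def assert_contains(output: str, needle: str) -> None:
    assert needle in output, f"expected {needle!r} in output"

reply = '{"sentiment": "positive", "confidence": 0.93}'
parsed = assert_json(reply)
assert_contains(reply, "sentiment")
assert 0.0 <= parsed["confidence"] <= 1.0, "confidence out of range"
```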