A comprehensive implementation of state-of-the-art methods for detecting hallucinations in Large Language Model outputs. This toolkit reproduces and extends detection methodologies from research and industry, providing multiple approaches to identify factual errors, contradictions, and unsupported claims in AI-generated content.
This toolkit provides robust detection methods that work with any LLM-generated content, regardless of the underlying system architecture.
Hallucination in LLMs refers to the generation of content that appears plausible but is factually incorrect, unsupported by the input context, or entirely fabricated. This toolkit addresses the critical need for reliable hallucination detection across diverse applications and use cases.
- Trust & Safety: Ensure AI systems provide reliable, grounded information
- Quality Assurance: Maintain high standards in AI-powered applications
- Risk Mitigation: Prevent propagation of misinformation
- User Experience: Build confidence in AI-generated content
- Research & Development: Enable systematic evaluation of model improvements
- Compliance: Meet regulatory requirements for AI transparency
| Method | Description | Use Case | Accuracy | Speed | Status |
|---|---|---|---|---|---|
| LLM Judge | Uses judge LLMs for sentence-level grounding evaluation | General purpose, content verification | High | Medium | Complete |
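The judge pattern above splits a response into sentences and asks a second LLM whether each one is supported by the context. A minimal sketch of that loop, assuming a hypothetical `call_judge` callable standing in for any chat-completion API (the prompt wording and the naive sentence splitter are illustrative, not the toolkit's actual implementation):

```python
from typing import Callable, Dict, List

# Illustrative judge prompt; the real toolkit's prompt may differ.
JUDGE_PROMPT = """You are a strict fact-checker.
Context:
{context}

Sentence:
{sentence}

Is the sentence fully supported by the context? Answer GROUNDED or HALLUCINATED."""


def split_sentences(text: str) -> List[str]:
    # Naive splitter on terminal punctuation; production code might use spaCy or nltk.
    normalized = text.replace("?", ".").replace("!", ".")
    return [s.strip() for s in normalized.split(".") if s.strip()]


def judge_response(context: str, response: str,
                   call_judge: Callable[[str], str]) -> List[Dict]:
    """Score each sentence of `response` against `context` with a judge LLM."""
    results = []
    for sentence in split_sentences(response):
        verdict = call_judge(JUDGE_PROMPT.format(context=context, sentence=sentence))
        results.append({
            "sentence": sentence,
            "grounded": verdict.strip().upper().startswith("GROUNDED"),
        })
    return results
```

Sentence-level scoring is what lets the method localize the hallucinated span instead of rejecting the whole response.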
Based on the AWS methodology and additional research:
| Method | Description | Source | Status | Expected Release |
|---|---|---|---|---|
| Embedding Similarity | Semantic similarity between context and response | AWS Blog | Planning | 2025 |
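The planned embedding-similarity method scores how semantically close a response is to its source context. A minimal sketch of the scoring step, assuming embeddings have already been produced by some encoder; the `0.7` threshold is an illustrative placeholder that would need tuning on labeled data:

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def flag_ungrounded(context_emb: Sequence[float],
                    response_emb: Sequence[float],
                    threshold: float = 0.7) -> Dict:
    """Flag a response whose embedding drifts far from the context embedding.

    `threshold` is illustrative only; in practice it is tuned on labeled data.
    """
    score = cosine_similarity(context_emb, response_emb)
    return {"similarity": score, "suspect": score < threshold}
```

Low similarity does not prove hallucination on its own, which is why this signal is typically combined with other detectors in an ensemble.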
Choose the detection method that best fits your use case:
```bash
cd llm_judge
pip install -r requirements.txt
python main.py --input your_data.jsonl --provider openai --model gpt-4o-mini
```

All methods in this toolkit use a standardized input format for consistency:
```json
{"id": "unique_id", "question": "original_question", "context": "source_context", "response": "llm_response"}
```

This enables easy comparison and ensemble approaches across different detection methods.
- Python 3.8+
- Virtual environment (recommended)
```bash
git clone https://github.com/thatechmaestro/hallucination-detector.git
cd hallucination-detector

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
```bash
# Install a specific method
cd llm_judge
pip install -r requirements.txt
```

Building Trust in AI through Rigorous Hallucination Detection