VerifAI: an initiative to build an open-source, easy-to-deploy generative question-answering engine that can reference and verify answers for correctness (using an a posteriori model).
This repo hosts the Python SDK and related examples for AIMon, a proprietary, state-of-the-art system for detecting LLM quality issues such as hallucinations. It can be used during offline evals, continuous monitoring, or inline detection. We offer various model quality metrics that are fast, reliable, and cost-effective.
[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.
PAS2: A Python-based hallucination detection system that evaluates AI response consistency through paraphrasing and semantic similarity analysis. Features include response evaluation, similarity metrics, visualization tools, and a web interface for interactive testing.
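A minimal sketch of the paraphrase-consistency idea behind this kind of detector (illustrative only, not the PAS2 API; the embedding model, the `ask_llm` helper, and the threshold are assumptions):

```python
# Sketch of paraphrase-consistency checking (illustrative, not the PAS2 API).
# Assumes `ask_llm(question) -> str` is provided by your own LLM client.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def consistency_score(question: str, paraphrases: list[str], ask_llm) -> float:
    """Query the LLM with the original question and its paraphrases,
    then return the mean pairwise cosine similarity of the answers."""
    answers = [ask_llm(q) for q in [question, *paraphrases]]
    embeddings = embedder.encode(answers, convert_to_tensor=True)
    sims = util.cos_sim(embeddings, embeddings)
    n = len(answers)
    # Average over distinct answer pairs (upper triangle of the similarity matrix).
    pair_sims = [sims[i][j].item() for i in range(n) for j in range(i + 1, n)]
    return sum(pair_sims) / len(pair_sims)

# Answers that drift apart under paraphrasing (low score) are flagged as
# potential hallucinations; the 0.8 cutoff below is an arbitrary assumption.
# is_suspect = consistency_score(question, paraphrases, ask_llm) < 0.8
```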
An up-to-date, curated list of state-of-the-art research, papers, and resources on hallucinations in large vision-language models.
Fact-checking with Iterative Retrieval and Verification
[ACL 2024] ANAH & [NeurIPS 2024] ANAH-v2
Binary hallucination detection classifier using logistic regression
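A minimal sketch of what such a classifier can look like, assuming hand-crafted features such as source similarity and token overlap (the feature set and data below are illustrative, not taken from the repo):

```python
# Sketch of a binary hallucination classifier using logistic regression
# (illustrative; features and data are assumptions, not the repo's code).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Each row is a feature vector for one (source, answer) pair, e.g.
# [embedding similarity to source, token-overlap ratio, answer length].
X = np.array([[0.92, 0.80, 45], [0.31, 0.10, 60], [0.88, 0.75, 30], [0.25, 0.05, 80]] * 25)
y = np.array([0, 1, 0, 1] * 25)  # 1 = hallucinated, 0 = grounded

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```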
Fully automated LLM evaluator
[ACL 2024] An Easy-to-use Hallucination Detection Framework for LLMs.
Hallucination in Chat-bots: Faithful Benchmark for Information-Seeking Dialogue
Chrome extension for the ATLAS project.
🔢 Hallucination detector for Large Language Models.
API for the ATLAS project.
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps"
Different approaches to evaluating RAG.
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without a custom rubric or reference answer, in absolute or relative mode, and much more. It also contains a list of available tools, methods, repos, and code for hallucination detection, LLM evaluation, grading, and more.
Detecting Hallucinations in Large Language Model Generations using Graph Structures
Competition: SemEval-2024 Task-6 - SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes