rag-evaluation

Here are 28 public repositories matching this topic...

Giskard-AI / giskard-oss

🐢 Open-Source Evaluation & Testing library for LLM Agents

ai-security mlops fairness-ai responsible-ai ml-validation red-team-tools trustworthy-ai ml-testing llm ai-red-team ai-testing llmops llm-security llm-eval llm-evaluation rag-evaluation agent-evaluation

Updated Nov 18, 2025
Python

Marker-Inc-Korea / AutoRAG

Star

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

python open-source qa benchmarking ops pipeline analysis optimization evaluation embeddings automl document-parser rag llm retrieval-augmented-generation llm-ops llm-evaluation rag-evaluation

Updated Nov 20, 2025
Python

Agenta-AI / agenta

Star

The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.

prompt-engineering prompt-management llm-tools llm-framework llm-playground llm-platform llm-evaluation rag-evaluation llm-monitoring llm-as-a-judge llm-observability llmops-platform

Updated Nov 23, 2025
Python

vectara / open-rag-eval

Star

RAG evaluation without the need for "golden answers"

metrics evaluation-metrics rag vectara retrieval-augmented-generation rag-evaluation

Updated Nov 23, 2025
Python

LLAMATOR-Core / llamator

Star

Framework for testing vulnerabilities of large language models (LLM).

Updated Sep 24, 2025
Python

mburaksayici / RAG-Boilerplate

Star

RAG boilerplate with semantic/propositional chunking, hybrid search (BM25 + dense), LLM reranking, query enhancement agents, CrewAI orchestration, Qdrant vector search, Redis/Mongo sessioning, Celery ingestion pipeline, Gradio UI, and an evaluation suite (Hit-Rate, MRR, hybrid configs).

ai-agents reranking rag vector-database hybrid-search qdrant llm retrieval-augmented-generation rag-evaluation semantic-chunking crewai rag-pipeline propositional-models query-enhancement

Updated Nov 18, 2025
Python

mts-ai / rurage

Star

information-retrieval question-answering rag llm-evaluation rag-evaluation

Updated Apr 14, 2025
Python

vero-labs-ai / vero-eval

Star

Open source framework for evaluating AI Agents

python testing evaluation datasets dataset-generation evaluation-metrics evaluation-framework testing-framework testing-library synthetic-dataset-generation user-persona evals llm-evaluation rag-evaluation llm-evaluation-framework langgraph rag-testing

Updated Nov 23, 2025
Python

oztrkoguz / RAG-Framework-Evaluation

Star

This project aims to compare different Retrieval-Augmented Generation (RAG) frameworks in terms of speed and performance.

swarms autogen rag langchain llamaindex rag-evaluation crewai langchain-rag autogen-rag crewai-rag llamaindex-rag swarms-rag

Updated Jul 28, 2024
Python

ioannis-papadimitriou / rag-playground

Star

A framework for systematic evaluation of retrieval strategies and prompt engineering in RAG systems, featuring an interactive chat interface for document analysis.

chatbot qa-generation llm-inference retrieval-augmented-generation rag-evaluation

Updated Dec 18, 2024
Python

rostyslavshovak / RAG-Retrieval-Augmented-Generation

Star

RAG Chatbot for Financial Analysis

open-source pdf rag gradio-interface langchain qdrant-vector-database retrieval-augmented-generation rag-evaluation

Updated Mar 9, 2025
Python

bluewave-labs / evalwise

Sponsor

Star

EvalWise is a developer-friendly platform for LLM evaluation and red teaming that helps test AI models for safety, compliance, and performance issues

rag llm prompt-engineering llmops prompt-testing evals llm-evaluation rag-evaluation llm-evaluation-toolkit

Updated Nov 20, 2025
Python

shaadclt / EvalRAG

Sponsor

Star

A comprehensive evaluation toolkit for assessing Retrieval-Augmented Generation (RAG) outputs using linguistic, semantic, and fairness metrics

rag rag-evaluation

Updated Apr 19, 2025
Python

fkapsahili / EntRAG

Star

EntRAG - Enterprise RAG Benchmark

benchmark evaluations retrieval evaluation dataset knowledge-graph rag llm generative-ai retrieval-augmented-generation llm-evaluation rag-evaluation

Updated Jun 10, 2025
Python

Responsible-AI-Labs / rail-score

Star

Python SDK

python machine-learning sdk artificial-intelligence compliance gdpr ai-safety ai-ethics content-moderation responsible-ai ai-evaluation llm rag-evaluation rail-score

Updated Nov 3, 2025
Python

AnasAber / MLflow_with_RAG

Star

Using MLflow to deploy your RAG pipeline, using LLamaIndex, Langchain and Ollama/HuggingfaceLLMs/Groq

deployment cicd evaluation-metrics rag mlops mlflow mlflow-tracking-server mlflow-tracking mlflow-projects mlflow-ui mlops-template mlops-project llamaindex rag-evaluation rag-pipeline llamaindex-rag mlflow-deployement

Updated Jan 20, 2025
Python

Kaos599 / BetterRAG

Star

BetterRAG: Powerful RAG evaluation toolkit for LLMs. Measure, analyze, and optimize how your AI processes text chunks with precision metrics. Perfect for RAG systems, document processing, and embedding quality assessment.

optimization evaluation embeddings evaluation-framework rag embeddings-extraction rag-evaluation rag-application rag-optimization chunking-optimization embeddings-optimization

Updated Mar 26, 2025
Python

sprakash21 / aws-genai-rageval-bot

Star

RAG Pipeline Evaluation and monitoring on AWS using RAGAS

monitoring rag-evaluation genai-chatbot ragas

Updated Oct 16, 2024
Python

vashishthdoshi / data-analytics-in-politics-learningportal

Star

Advanced Retrieval-Augmented Generation (RAG) system designed as an interactive learning portal for political analytics.

political political-science rag learning-portal rag-evaluation rag-pipeline

Updated Oct 13, 2025
Python

alinaleo27 / ai-rag-eval-qa

Star

AI RAG evaluation project using Ragas. Includes RAG metrics (precision, recall, faithfulness), retrieval diagnostics, and prompt testing examples for fintech/banking LLM systems. Designed as an AI QA Specialist portfolio project.

ai-qa prompt-testing llm-evaluation rag-evaluation ragas llm-testing

Updated Nov 17, 2025
Python

Improve this page

Add a description, image, and links to the rag-evaluation topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the rag-evaluation topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rag-evaluation

Here are 28 public repositories matching this topic...

Giskard-AI / giskard-oss

Marker-Inc-Korea / AutoRAG

Agenta-AI / agenta

vectara / open-rag-eval

LLAMATOR-Core / llamator

mburaksayici / RAG-Boilerplate

mts-ai / rurage

vero-labs-ai / vero-eval

oztrkoguz / RAG-Framework-Evaluation

ioannis-papadimitriou / rag-playground

rostyslavshovak / RAG-Retrieval-Augmented-Generation

bluewave-labs / evalwise

shaadclt / EvalRAG

fkapsahili / EntRAG

Responsible-AI-Labs / rail-score

AnasAber / MLflow_with_RAG

Kaos599 / BetterRAG

sprakash21 / aws-genai-rageval-bot

vashishthdoshi / data-analytics-in-politics-learningportal

alinaleo27 / ai-rag-eval-qa

Improve this page

Add this topic to your repo