Evaluation tools for Retrieval-augmented Generation (RAG) methods.
Rageval is a tool that helps you evaluate RAG systems. The evaluation pipeline consists of six sub-tasks: query rewriting, document ranking, information compression, evidence verification, answer generation, and result validation.
After obtaining the relevant document pieces, the generator is tasked with answering the question using the original user query and the retrieved contexts. We assess a generator module from two distinct perspectives.
(1) Answer Correctness. In this task, we compare the output answer with the ground-truth answer using the following metrics (a short illustrative sketch follows the metric lists below).
- answer claim recall ("answer_claim_recall")
- answer exact match ("answer_exact_match")
- context reject rate ("context_reject_rate")
(2) Answer Groundedness. In this task, we measure how well the answer is grounded in its cited contexts, using the following metrics.
- answer_citation_precision ("answer_citation_precision")
- answer_citation_recall ("answer_citation_recall")
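To make the Answer Correctness metrics above more concrete, here is a minimal, library-agnostic sketch of an exact-match style check in plain Python. The helper names (normalize, exact_match_recall) and the normalization rules are illustrative assumptions, not part of the Rageval API; the quick-start example below shows how metrics are actually invoked through the library.

# Illustrative sketch only; not part of the Rageval API.
import re
import string

def normalize(text: str) -> str:
    # Lowercase, strip punctuation and articles, and collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match_recall(answer: str, short_answers: list[str]) -> float:
    # Fraction of ground-truth short answers that appear verbatim in the generated answer.
    if not short_answers:
        return 0.0
    hits = sum(normalize(gt) in normalize(answer) for gt in short_answers)
    return hits / len(short_answers)

print(exact_match_recall("The Eiffel Tower is in Paris, France.", ["Paris"]))  # -> 1.0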
Installation
git clone https://github.com/gomate-community/rageval.git
cd rageval
python setup.py install
import rageval as rl

# Load the ALCE benchmark dataset
test_set = rl.datasets.load_data('ALCE', task='')

# Initialize the metric and attach an OpenAI model as the judge
metric = rl.metrics.ContextRecall()
model = rl.models.OpenAILLM()
metric.init_model(model)

# Score the whole test set
results = metric._score_batch(test_set)
Please make sure to read the Contributing Guide before creating a pull request.
This project is currently in a preliminary stage.