Rageval

Evaluation tools for Retrieval-augmented Generation (RAG) methods.

Rageval is a tool that helps you evaluate RAG systems. The evaluation consists of six sub-tasks: query rewriting, document ranking, information compression, evidence verification, answer generation, and result validation.

Definition of tasks and metrics

Generator

After obtaining relevant document pieces, the generator is tasked with answering the question using the original user query and the retrieved contexts. We assess a generator module from two distinct perspectives.

(1) Answer Correctness. In this task, we compare the output answer with the ground-truth answer using the following metrics (a minimal sketch of exact match follows the metric lists).

  • answer claim recall ("answer_claim_recall")
  • answer exact match ("answer_exact_match")
  • context reject rate ("context_reject_rate")

(2) Answer Groundedness. In this task, we check whether the answer is grounded in the retrieved contexts using the following metrics.

  • answer citation precision ("answer_citation_precision")
  • answer citation recall ("answer_citation_recall")
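As an illustration of what the correctness metrics measure, here is a minimal, library-free sketch of answer exact match. It is only an illustration of the idea, not Rageval's implementation; the normalization rules (lowercasing, stripping punctuation and articles) are assumptions borrowed from common QA evaluation practice.

import re
import string

def normalize(text: str) -> str:
    # Lowercase, drop punctuation and articles, and collapse whitespace (assumed normalization).
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def answer_exact_match(predictions, ground_truths):
    # Fraction of questions whose prediction matches any reference answer after normalization.
    hits = sum(
        any(normalize(pred) == normalize(gt) for gt in gts)
        for pred, gts in zip(predictions, ground_truths)
    )
    return hits / len(predictions)

# Example: 1 of 2 predictions matches a reference answer, so the score is 0.5.
print(answer_exact_match(["The Eiffel Tower", "1925"], [["Eiffel Tower"], ["1924"]]))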

Rewriter

xxx

Installation

git clone https://github.com/gomate-community/rageval.git
cd rageval
python setup.py install
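If the installation succeeds, importing the package from Python should work (the top-level module name rageval is taken from the Usage example below):

python -c "import rageval"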

Usage

import rageval as rl

# Load a benchmark dataset and choose a metric.
test_set = rl.datasets.load_data('ALCE', task='')
metric = rl.metrics.ContextRecall()

# Attach an LLM backend to the metric, then score the dataset.
model = rl.models.OpenAILLM()
metric.init_model(model)

results = metric._score_batch(test_set)

Contribution

Please make sure to read the Contributing Guide before creating a pull request.

About

This project is currently in its preliminary stage.