Commit

fix typos
faneshion committed Mar 5, 2024
1 parent bc80273 commit 3dd7292
Showing 1 changed file with 3 additions and 3 deletions.
README.md: 6 changes (3 additions & 3 deletions)
@@ -12,12 +12,12 @@ Rageval is a tool that helps you evaluate RAG system. The evaluation consists of

## Definition of tasks and metrics
### 1. [The generate task](./rageval/tasks/_generate.py)
-The generate task is to answer the question based on the contexts provided by retrieval modules in RAG. Typically, the context could be extracted/generated text snippets from the compressor, or relevant documents from the re-ranker. Here, we divide the metrics used in the generate task into two categories, namely *answer correctness* and *answer groundedness*.
+The generate task is to answer the question based on the contexts provided by retrieval modules in RAG. Typically, the context could be extracted/generated text snippets from the compressor, or relevant documents from the re-ranker. Here, we divide metrics used in the generate task into two categories, namely *answer correctness* and *answer groundedness*.

(1) **Answer Correctness**: this category of metrics is to evaluate the correctness by comparing the generated answer with the groundtruth answer. Here are some commonly used metrics:

-* [Answer NLI Correctness](./rageval/metrics/_answer_claim_recall.py): also know as *claim recall* in [the paper (Tianyu et al.)](https://arxiv.org/abs/2305.14627).
-* [Answer EM Correctness](./rageval/metrics/_answer_claim_recall.py): also know as *Exact Match* as used in [ASQA (Ivan Stelmakh et al.)](https://arxiv.org/abs/2204.06092).
+* [Answer NLI Correctness](./rageval/metrics/_answer_claim_recall.py): also known as *claim recall* in [the paper (Tianyu et al.)](https://arxiv.org/abs/2305.14627).
+* [Answer EM Correctness](./rageval/metrics/_answer_claim_recall.py): also known as *Exact Match* as used in the [ASQA paper (Ivan Stelmakh et al.)](https://arxiv.org/abs/2204.06092).

(2) **Answer Groundedness**: this category of metrics is to evaluate the groundedness (also known as factual consistency) by comparing the generated answer with the provided contexts. Here are some commonly used metrics:
* ~~answer_citation_precision ("answer_citation_precision")~~
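The *Exact Match* correctness referenced in the hunk above follows the ASQA convention of checking whether ground-truth short answers appear in the generated answer. A minimal sketch of that idea is shown below; the `em_correctness` helper and its normalization rules are illustrative assumptions for this sketch, not Rageval's actual API.

```python
# A hypothetical sketch of the Exact Match (EM) correctness idea from ASQA,
# NOT Rageval's actual API: score a generated answer by the fraction of
# ground-truth short answers that appear verbatim in it after light normalization.
def em_correctness(generated: str, short_answers: list[str]) -> float:
    def normalize(text: str) -> str:
        # Lowercase and collapse whitespace so trivial formatting differences do not matter.
        return " ".join(text.lower().split())

    gen = normalize(generated)
    hits = [normalize(ans) in gen for ans in short_answers]
    return sum(hits) / len(hits) if hits else 0.0


# Example: one of the two reference short answers is contained in the generated answer.
print(em_correctness("The capital of France is Paris.", ["Paris", "Paris, France"]))  # 0.5
```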
