docs/getstarted/evaluation.md (+6 -6)
@@ -28,14 +28,14 @@ While originally ragas was aimed at `ground_truth` free evaluations there is som
 ```

 Hence to work with ragas all you need are the following data
-- question: `list[str]` - These are the questions you RAG pipeline will be evaluated on.
-- answer: `list[str]` - The answer generated from the RAG pipeline and give to the user.
-- contexts: `list[list[str]]` - The contexts which where passed into the LLM to answer the question.
+- question: `list[str]` - These are the questions your RAG pipeline will be evaluated on.
+- answer: `list[str]` - The answer generated from the RAG pipeline and given to the user.
+- contexts: `list[list[str]]` - The contexts which were passed into the LLM to answer the question.
 - ground_truths: `list[list[str]]` - The ground truth answer to the questions. (only required if you are using context_recall)

 Ideally your list of questions should reflect the questions your users give, including those that you have been problematic in the past.

-Here we're using an example dataset from on of the baselines we created for the [Financial Opinion Mining and Question Answering (fiqa) Dataset](https://sites.google.com/view/fiqa/) we created.
+Here we're using an example dataset from on of the baselines we created for the [Financial Opinion Mining and Question Answering (fiqa) Dataset](https://sites.google.com/view/fiqa/) we created.


 ```{code-block} python
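To make the four-column format described in this hunk concrete, here is a minimal sketch of assembling such a dataset by hand with the Hugging Face `datasets` library. The question, answer, contexts, and ground-truth strings are invented placeholders rather than rows from the fiqa baseline; the column names simply mirror the list above.

```python
# Minimal sketch: an evaluation dataset with the four columns ragas expects.
# All strings below are placeholders, not rows from the fiqa baseline.
from datasets import Dataset

eval_rows = {
    "question": ["How do I open a brokerage account?"],
    "answer": [
        "You can open a brokerage account online by submitting an application "
        "and funding it with an initial deposit."
    ],
    "contexts": [[
        "Most brokers let you open an account online in a few minutes.",
        "Accounts must be funded before you can place trades.",
    ]],
    # Only needed if you plan to compute context_recall.
    "ground_truths": [[
        "A brokerage account can be opened online and must be funded before trading."
    ]],
}

eval_dataset = Dataset.from_dict(eval_rows)
print(eval_dataset)  # 1 row with the four expected columns
```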
@@ -54,7 +54,7 @@ See [prepare-data](/docs/concepts/prepare_data.md) to learn how to prepare your

 Ragas provides you with a few metrics to evaluate the different aspects of your RAG systems namely

-1. Retriever: offers `context_precision` and `context_recall` which give you the measure of the performance of your retrieval system.
+1. Retriever: offers `context_precision` and `context_recall` which give you the measure of the performance of your retrieval system.
 2. Generator (LLM): offers `faithfulness` which measures hallucinations and `answer_relevancy` which measures how to the point the answers are to the question.

 The harmonic mean of these 4 aspects gives you the **ragas score** which is a single measure of the performance of your QA system across all the important aspects.
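As a plain-arithmetic illustration of that harmonic mean, the sketch below combines four made-up metric scores using only the standard library; the numbers are not output from a real evaluation run.

```python
# Sketch of the arithmetic behind combining four metric scores into one value.
# The scores are invented for illustration only.
from statistics import harmonic_mean

scores = {
    "context_precision": 0.81,
    "context_recall": 0.89,
    "faithfulness": 0.77,
    "answer_relevancy": 0.93,
}

# A harmonic mean is pulled down sharply by its weakest component, so one
# poorly performing aspect hurts the combined score more than an average would.
combined = harmonic_mean(list(scores.values()))
print(f"combined score: {combined:.3f}")
```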
@@ -75,7 +75,7 @@ here you can see that we are using 4 metrics, but what do the represent?
 1. faithfulness - the factual consistency of the answer to the context base on the question.
 2. context_precision - a measure of how relevant the retrieved context is to the question. Conveys quality of the retrieval pipeline.
 3. answer_relevancy - a measure of how relevant the answer is to the question
-4. context_recall: measures the ability of the retriever to retrieve all the necessary information needed to answer the question.
+4. context_recall: measures the ability of the retriever to retrieve all the necessary information needed to answer the question.
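Putting the four metrics together, a scoring run over the fiqa baseline might look roughly like the sketch below. The `explodinggradients/fiqa` dataset identifier, its `ragas_eval` config, and the `evaluate`/metric imports reflect common ragas usage around the time of this doc and may differ in other releases, so treat this as an outline rather than the exact documented API; the LLM-backed metrics also expect an LLM API key (e.g. OpenAI) to be configured in the environment.

```python
# Rough outline of scoring the fiqa baseline with the four metrics above.
# Import paths and the evaluate() signature may vary between ragas versions.
from datasets import load_dataset

from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# Example dataset referenced in the guide (assumed identifier and config).
fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")

result = evaluate(
    fiqa_eval["baseline"],
    metrics=[context_precision, faithfulness, answer_relevancy, context_recall],
)
print(result)  # per-metric scores for the evaluated rows
```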