docs/concepts/components/eval_dataset.md (+1 −1)
@@ -20,7 +20,7 @@ An evaluation dataset consists of:
  - **Define Clear Objectives**: Identify the specific aspects of the AI application that you want to evaluate and the scenarios you want to test. Collect data samples that reflect these objectives.
- - **Collect Representative Data**: Ensure that the dataset covers a diverse range of scenarios, user inputs, and expected responses to provide a comprehensive evaluation of the AI application. This can be achieved by collecting data from various sources or [generating synthetic data]().
+ - **Collect Representative Data**: Ensure that the dataset covers a diverse range of scenarios, user inputs, and expected responses to provide a comprehensive evaluation of the AI application. This can be achieved by collecting data from various sources or [generating synthetic data](./../../howtos/customizations/index.md#testset-generation).
  - **Quality and Size**: Aim for a dataset that is large enough to provide meaningful insights but not so large that it becomes unwieldy. Ensure that the data is of high quality and accurately reflects the real-world scenarios you want to evaluate.
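For illustration, a minimal sketch of assembling such a dataset in `ragas` (assuming the top-level `EvaluationDataset` and `SingleTurnSample` classes; the sample content is invented):

```python
from ragas import EvaluationDataset, SingleTurnSample

# Each sample pairs a user input with the application's response and a
# reference answer used for grading.
samples = [
    SingleTurnSample(
        user_input="What did the company report as Q2 revenue growth?",
        response="Revenue grew 8% quarter over quarter.",
        reference="The report states an 8% QoQ revenue increase.",
    ),
]

dataset = EvaluationDataset(samples=samples)
```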
docs/concepts/index.md (+4 −3)
@@ -9,23 +9,24 @@
  Discover the various components used within Ragas.

- Components like [Prompt Object](components/index.md#prompt-object), [Evaluation Dataset](components/index.md#evaluation-dataset) and [more..](components/index.md)
+ Components like [Prompt Object](components/prompt.md), [Evaluation Dataset](components/eval_dataset.md) and [more..](components/index.md)

  Explore available metrics and understand how they work.

- Metrics for evaluating [RAG](metrics/index.md/#retrieval-augmented-generation), [Agentic workflows](metrics/index.md/#agents-or-tool-use-cases) and [more..](metrics/index.md/#list-of-available-metrics).
+ Metrics for evaluating [RAG](metrics/available_metrics/index.md#retrieval-augmented-generation), [Agentic workflows](metrics/available_metrics/index.md#agents-or-tool-use-cases) and [more..](metrics/available_metrics/index.md#list-of-available-metrics).

  - :material-database-plus:{ .lg .middle } [__Test Data Generation__](test_data_generation/index.md)

  ---

  Generate high-quality datasets for comprehensive testing.

- Algorithms for synthesizing data to test [RAG](test_data_generation/index.md#retrieval-augmented-generation), [Agentic workflows](test_data_generation/index.md#agents-or-tool-use-cases)
+ Algorithms for synthesizing data to test [RAG](test_data_generation/rag.md), [Agentic workflows](test_data_generation/agents.md)
docs/concepts/metrics/overview/index.md (+4 −4)
@@ -18,14 +18,14 @@ A metric is a quantitative measure used to evaluate the performance of a AI appl
  **LLM-based metrics**: These metrics use an LLM underneath to do the evaluation. One or more LLM calls may be performed to arrive at the score or result. These metrics can be somewhat non-deterministic, as the LLM might not always return the same result for the same input. On the other hand, these metrics have been shown to be more accurate and closer to human evaluation.

- All LLM-based metrics in ragas are inherited from the `MetricWithLLM` class. These metrics expect an [LLM]() object to be set before scoring.
+ All LLM-based metrics in ragas are inherited from the `MetricWithLLM` class. These metrics expect an LLM object to be set before scoring.

  ```python
  from ragas.metrics import FactualCorrectness
  scorer = FactualCorrectness(llm=evaluation_llm)
  ```

- Each LLM-based metric also has prompts associated with it, written using the [Prompt Object]().
+ Each LLM-based metric also has prompts associated with it, written using the [Prompt Object](./../../components/prompt.md).
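Something like the following can be used to inspect those prompts (a sketch, assuming the `get_prompts()` accessor that prompt-backed metrics expose; `evaluation_llm` is your configured LLM wrapper):

```python
from ragas.metrics import FactualCorrectness

scorer = FactualCorrectness(llm=evaluation_llm)

# Prompt-backed metrics keep their prompts in a name -> prompt mapping.
for name, prompt in scorer.get_prompts().items():
    print(name, type(prompt).__name__)
```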
  **Non-LLM-based metrics**: These metrics do not use an LLM underneath to do the evaluation. They are deterministic and can be used to evaluate the performance of the AI application without using an LLM. These metrics rely on traditional methods, such as string similarity, BLEU score, etc. Because of this, they are known to have a lower correlation with human evaluation.
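A minimal sketch of the non-LLM flavor (assuming the string-based `BleuScore` metric; the strings are invented):

```python
from ragas import SingleTurnSample
from ragas.metrics import BleuScore

sample = SingleTurnSample(
    response="The Q2 report shows 8% revenue growth.",
    reference="Revenue grew by 8% in the second quarter.",
)

# Purely string-based: no LLM is configured or called.
metric = BleuScore()
await metric.single_turn_ascore(sample)
```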
@@ -34,7 +34,7 @@ All LLM based metrics in ragas are inherited from `Metric` class.
  **Metrics can be broadly classified into two categories based on the type of data they evaluate**:

- **Single turn metrics**: These metrics evaluate the performance of the AI application based on a single turn of interaction between the user and the AI. All metrics in ragas that support single turn evaluation are inherited from the `SingleTurnMetric` class and scored using the `single_turn_ascore` method. They also expect a [Single Turn Sample]() object as input.
+ **Single turn metrics**: These metrics evaluate the performance of the AI application based on a single turn of interaction between the user and the AI. All metrics in ragas that support single turn evaluation are inherited from the [SingleTurnMetric][ragas.metrics.base.SingleTurnMetric] class and scored using the `single_turn_ascore` method. They also expect a [Single Turn Sample][ragas.dataset_schema.SingleTurnSample] object as input.
  ```python
  from ragas.metrics import FactualCorrectness
@@ -43,7 +43,7 @@ scorer = FactualCorrectness()
  await scorer.single_turn_ascore(sample)
  ```
- **Multi-turn metrics**: These metrics evaluate the performance of the AI application based on multiple turns of interaction between the user and the AI. All metrics in ragas that support multi turn evaluation are inherited from the `MultiTurnMetric` class and scored using the `multi_turn_ascore` method. They also expect a [Multi Turn Sample]() object as input.
+ **Multi-turn metrics**: These metrics evaluate the performance of the AI application based on multiple turns of interaction between the user and the AI. All metrics in ragas that support multi turn evaluation are inherited from the [MultiTurnMetric][ragas.metrics.base.MultiTurnMetric] class and scored using the `multi_turn_ascore` method. They also expect a [Multi Turn Sample][ragas.dataset_schema.MultiTurnSample] object as input.
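A sketch of the multi-turn flow (assuming the `ragas.messages` message types and `TopicAdherenceScore` as an illustrative multi-turn metric; `evaluation_llm` is your configured LLM wrapper and the conversation is invented):

```python
from ragas import MultiTurnSample
from ragas.messages import AIMessage, HumanMessage
from ragas.metrics import TopicAdherenceScore

sample = MultiTurnSample(
    user_input=[
        HumanMessage(content="How do I reset my password?"),
        AIMessage(content="Use the 'Forgot password' link on the login page."),
        HumanMessage(content="And if the reset email never arrives?"),
        AIMessage(content="Check your spam folder, then contact support."),
    ],
    reference_topics=["account access"],
)

scorer = TopicAdherenceScore(llm=evaluation_llm)
await scorer.multi_turn_ascore(sample)
```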
docs/concepts/test_data_generation/rag.md (+2 −1)
@@ -103,7 +103,7 @@ graph TD
  ### Extractors

- Different extractors are used to extract information from each node that can be used to establish the relationship between the nodes. For example, in the case of financial documents, the extractors that can be used are an entity extractor to extract entities like Company Name, a keyphrase extractor to extract important keyphrases present in each node, etc. You can write your own [custom extractors]() to extract the information that is relevant to your domain.
+ Different extractors are used to extract information from each node that can be used to establish the relationship between the nodes. For example, in the case of financial documents, the extractors that can be used are an entity extractor to extract entities like Company Name, a keyphrase extractor to extract important keyphrases present in each node, etc. You can write your own custom extractors to extract the information that is relevant to your domain.

  Extractors can be LLM-based, which are inherited from `LLMBasedExtractor`, or rule-based, which are inherited from `Extractor`.
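For illustration, a sketch of running built-in extractors directly (assuming the `NERExtractor` and `KeyphrasesExtractor` transforms, a configured `generator_llm`, and a knowledge-graph `node` built earlier):

```python
from ragas.testset.transforms.extractors import KeyphrasesExtractor, NERExtractor

# LLM-based extractors return a (property_name, value) pair that is
# attached to the node's properties.
keyphrase_extractor = KeyphrasesExtractor(llm=generator_llm)
ner_extractor = NERExtractor(llm=generator_llm)

prop_name, keyphrases = await keyphrase_extractor.extract(node)
```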
@@ -165,6 +165,7 @@ graph TD
  The extracted information is used to establish the relationship between the nodes. For example, in the case of financial documents, the relationship can be established between the nodes based on the entities present in the nodes.
  You can write your own [custom relationship builder]() to establish the relationship between the nodes based on the information that is relevant to your domain.
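For illustration, a sketch using a built-in similarity-based relationship builder (assuming `JaccardSimilarityBuilder`; the property name and threshold are illustrative):

```python
from ragas.testset.transforms.relationship_builders.traditional import (
    JaccardSimilarityBuilder,
)

# Link nodes whose extracted entity sets overlap beyond a threshold.
rel_builder = JaccardSimilarityBuilder(
    property_name="entities",       # written earlier by an entity extractor
    new_property_name="entity_jaccard_similarity",
    threshold=0.2,                  # illustrative cutoff
)
relationships = await rel_builder.transform(kg)  # kg: your KnowledgeGraph
```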
- For more information on how to use LlamaIndex, please refer to the [LlamaIndex Integration guide](../../howtos/integrations/_llamaindex/).
+ For more information on how to use LlamaIndex, please refer to the [LlamaIndex Integration guide](./../../howtos/integrations/_llamaindex.md).

  If you're still not able to use Ragas with your favorite LLM provider, please let us know by commenting on this [issue](https://github.com/explodinggradients/ragas/issues/1617) and we'll add support for it 🙂.
docs/getstarted/evals.md (+2 −2)
@@ -7,7 +7,7 @@ The purpose of this guide is to illustrate a simple workflow for testing and eva
  In this guide, you will evaluate a **text summarization pipeline**. The goal is to ensure that the output summary accurately captures all the key details specified in the text, such as growth figures, market insights, and other essential information.

- `ragas` offers a variety of methods for analyzing the performance of LLM applications, referred to as [metrics](../concepts/metrics/). Each metric requires a predefined set of data points, which it uses to calculate scores that indicate performance.
+ `ragas` offers a variety of methods for analyzing the performance of LLM applications, referred to as [metrics](../concepts/metrics/available_metrics/index.md). Each metric requires a predefined set of data points, which it uses to calculate scores that indicate performance.
  ### Evaluating using a Non-LLM Metric
@@ -203,7 +203,7 @@ To fix these results, ragas provides a way to align the metric with your prefere
  2. **Download**: Save the annotated data using the `Annotated JSON` button in [app.ragas.io](https://app.ragas.io/).
  3. **Train**: Use the annotated data to train your custom metric.

- To learn more about this, refer to the [train your own metric guide](../howtos/customizations/metrics/train_your_own_metric.md)
+ To learn more about this, refer to the [train your own metric guide](./../howtos/customizations/metrics/train_your_own_metric.md)
- We have successfully collected the evaluation data. Now, we can evaluate our RAG system on the collected dataset using a set of commonly used RAG evaluation metrics. You may choose any model as the [evaluator LLM](/docs/howtos/customizations/customize_models.md) for evaluation.
+ We have successfully collected the evaluation data. Now, we can evaluate our RAG system on the collected dataset using a set of commonly used RAG evaluation metrics. You may choose any model as the [evaluator LLM](./../howtos/customizations/customize_models.md) for evaluation.
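A sketch of that evaluation step (assuming the collected `eval_dataset`, a wrapped `evaluator_llm`, and two common RAG metrics):

```python
from ragas import evaluate
from ragas.metrics import FactualCorrectness, Faithfulness

result = evaluate(
    dataset=eval_dataset,                  # the EvaluationDataset collected above
    metrics=[Faithfulness(), FactualCorrectness()],
    llm=evaluator_llm,                     # evaluator LLM wrapper
)
print(result)
```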
docs/getstarted/rag_testset_generation.md (+13 −6)
@@ -31,7 +31,7 @@ docs = loader.load()
  ### Choose your LLM

- You may choose to use any [LLM of your choice](../howtos/customizations/customize_models.md)
+ You may choose to use any [LLM of your choice](./../howtos/customizations/customize_models.md)

  --8<--
  choose_generator_llm.md
  --8<--
@@ -55,9 +55,10 @@ Once you have generated a testset, you would want to view it and select the quer
  dataset.to_pandas()
  ```

+ Output

  

- You can also use other tools like [app.ragas.io](https://app.ragas.io/) or any other similar tools available to you in the [Integrations](../howtos/integrations/index.md) section.
+ You can also use other tools like [app.ragas.io](https://app.ragas.io/) or any other similar tools available to you in the [Integrations](./../howtos/integrations/index.md) section.

  In order to use the [app.ragas.io](https://app.ragas.io/) dashboard, you need an account on [app.ragas.io](https://app.ragas.io/). If you don't have one, you can sign up [here](https://app.ragas.io/login). You will also need a [Ragas APP token](https://app.ragas.io/settings/api-keys).
@@ -93,6 +94,7 @@ from ragas.testset.graph import KnowledgeGraph
  You can define your own parsers, or import them if they are already defined. If you would like to suggest a parser for an LLM provider or contribute your own, please check out this [issue](https://github.com/explodinggradients/ragas/issues/1151) 🙂.
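For context, a sketch of the knowledge-graph seeding this section builds on (assuming the `KnowledgeGraph`, `Node`, and `TestsetGenerator` APIs; `docs`, `generator_llm`, and `generator_embeddings` come from earlier steps):

```python
from ragas.testset import TestsetGenerator
from ragas.testset.graph import KnowledgeGraph, Node, NodeType

# Seed a knowledge graph with one node per loaded document.
kg = KnowledgeGraph()
for doc in docs:
    kg.nodes.append(
        Node(
            type=NodeType.DOCUMENT,
            properties={
                "page_content": doc.page_content,
                "document_metadata": doc.metadata,
            },
        )
    )

generator = TestsetGenerator(
    llm=generator_llm,
    embedding_model=generator_embeddings,
    knowledge_graph=kg,
)
dataset = generator.generate(testset_size=10)
```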