Commit 91393e6

docs: Fixed most of the broken links (#1830)
1 parent 6478a6e commit 91393e6

25 files changed: +663, -278 lines changed

docs/concepts/components/eval_dataset.md (+1, -1)

@@ -20,7 +20,7 @@ An evaluation dataset consists of:
 - **Define Clear Objectives**: Identify the specific aspects of the AI application that you want to evaluate and the scenarios you want to test. Collect data samples that reflect these objectives.

-- **Collect Representative Data**: Ensure that the dataset covers a diverse range of scenarios, user inputs, and expected responses to provide a comprehensive evaluation of the AI application. This can be achieved by collecting data from various sources or [generating synthetic data]().
+- **Collect Representative Data**: Ensure that the dataset covers a diverse range of scenarios, user inputs, and expected responses to provide a comprehensive evaluation of the AI application. This can be achieved by collecting data from various sources or [generating synthetic data](./../../howtos/customizations/index.md#testset-generation).

 - **Quality and Size**: Aim for a dataset that is large enough to provide meaningful insights but not so large that it becomes unwieldy. Ensure that the data is of high quality and accurately reflects the real-world scenarios you want to evaluate.
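As a small illustration of the guidance in this file: once representative samples are collected, they can be loaded with `EvaluationDataset.from_list`, which appears elsewhere in these docs. The field names below follow the single-turn sample schema and the values are made up:

```python
from ragas import EvaluationDataset

# Hypothetical samples; in practice these come from logs, domain experts,
# or synthetic test-set generation.
samples = [
    {
        "user_input": "What was the revenue growth in Q2?",
        "retrieved_contexts": ["Revenue grew 8% quarter over quarter in Q2."],
        "response": "Revenue grew by 8% in Q2.",
        "reference": "The company reported 8% revenue growth in Q2.",
    },
]

dataset = EvaluationDataset.from_list(samples)
print(dataset.to_pandas().shape)
```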

docs/concepts/index.md (+4, -3)

@@ -9,23 +9,24 @@
     Discover the various components used within Ragas.

-    Components like [Prompt Object](components/index.md#prompt-object), [Evaluation Dataset](components/index.md#evaluation-dataset) and [more..](components/index.md)
+    Components like [Prompt Object](components/prompt.md), [Evaluation Dataset](components/eval_dataset.md) and [more..](components/index.md)
+

 - :material-ruler-square:{ .lg .middle } [__Ragas Metrics__](metrics/index.md)

     ---

     Explore available metrics and understand how they work.

-    Metrics for evaluating [RAG](metrics/index.md/#retrieval-augmented-generation), [Agentic workflows](metrics/index.md/#agents-or-tool-use-cases) and [more..](metrics/index.md/#list-of-available-metrics).
+    Metrics for evaluating [RAG](metrics/available_metrics/index.md#retrieval-augmented-generation), [Agentic workflows](metrics/available_metrics/index.md#agents-or-tool-use-cases) and [more..](metrics/available_metrics/index.md#list-of-available-metrics).

 - :material-database-plus:{ .lg .middle } [__Test Data Generation__](test_data_generation/index.md)

     ---

     Generate high-quality datasets for comprehensive testing.

-    Algorithms for synthesizing data to test [RAG](test_data_generation/index.md#retrieval-augmented-generation), [Agentic workflows](test_data_generation/index.md#agents-or-tool-use-cases)
+    Algorithms for synthesizing data to test [RAG](test_data_generation/rag.md), [Agentic workflows](test_data_generation/agents.md)


 - :material-chart-box-outline:{ .lg .middle } [__Feedback Intelligence__](feedback/index.md)

docs/concepts/metrics/overview/index.md (+4, -4)

@@ -18,14 +18,14 @@ A metric is a quantitative measure used to evaluate the performance of a AI appl

     **LLM-based metrics**: These metrics use LLM underneath to do the evaluation. There might be one or more LLM calls that are performed to arrive at the score or result. These metrics can be somewhat non deterministic as the LLM might not always return the same result for the same input. On the other hand, these metrics has shown to be more accurate and closer to human evaluation.

-All LLM based metrics in ragas are inherited from `MetricWithLLM` class. These metrics expects a [LLM]() object to be set before scoring.
+All LLM based metrics in ragas are inherited from `MetricWithLLM` class. These metrics expects a LLM object to be set before scoring.

 ```python
 from ragas.metrics import FactualCorrectness
 scorer = FactualCorrectness(llm=evaluation_llm)
 ```

-Each LLM based metrics also will have prompts associated with it written using [Prompt Object]().
+Each LLM based metrics also will have prompts associated with it written using [Prompt Object](./../../components/prompt.md).

     **Non-LLM-based metrics**: These metrics do not use LLM underneath to do the evaluation. These metrics are deterministic and can be used to evaluate the performance of the AI application without using LLM. These metrics rely on traditional methods to evaluate the performance of the AI application, such as string similarity, BLEU score, etc. Due to the same, these metrics are known to have a lower correlation with human evaluation.

@@ -34,7 +34,7 @@ All LLM based metrics in ragas are inherited from `Metric` class.

 **Metrics can be broadly classified into two categories based on the type of data they evaluate**:

-    **Single turn metrics**: These metrics evaluate the performance of the AI application based on a single turn of interaction between the user and the AI. All metrics in ragas that supports single turn evaluation are inherited from `SingleTurnMetric` class and scored using `single_turn_ascore` method. It also expects a [Single Turn Sample]() object as input.
+    **Single turn metrics**: These metrics evaluate the performance of the AI application based on a single turn of interaction between the user and the AI. All metrics in ragas that supports single turn evaluation are inherited from [SingleTurnMetric][ragas.metrics.base.SingleTurnMetric] class and scored using `single_turn_ascore` method. It also expects a [Single Turn Sample][ragas.dataset_schema.SingleTurnSample] object as input.

 ```python
 from ragas.metrics import FactualCorrectness
@@ -43,7 +43,7 @@ scorer = FactualCorrectness()
 await scorer.single_turn_ascore(sample)
 ```

-    **Multi-turn metrics**: These metrics evaluate the performance of the AI application based on multiple turns of interaction between the user and the AI. All metrics in ragas that supports multi turn evaluation are inherited from `MultiTurnMetric` class and scored using `multi_turn_ascore` method. It also expects a [Multi Turn Sample]() object as input.
+    **Multi-turn metrics**: These metrics evaluate the performance of the AI application based on multiple turns of interaction between the user and the AI. All metrics in ragas that supports multi turn evaluation are inherited from [MultiTurnMetric][ragas.metrics.base.MultiTurnMetric] class and scored using `multi_turn_ascore` method. It also expects a [Multi Turn Sample][ragas.dataset_schema.MultiTurnSample] object as input.

 ```python
 from ragas.metrics import AgentGoalAccuracy
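For orientation, a minimal sketch of the single-turn flow this page describes: wiring an evaluation LLM into `FactualCorrectness` and scoring one `SingleTurnSample`. The wrapper and model choice are assumptions borrowed from other pages, not part of this commit:

```python
from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import FactualCorrectness

# Wrap any chat model so ragas can drive the LLM-based metric with it.
evaluation_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))

# One user/AI turn; FactualCorrectness compares the response against the reference.
sample = SingleTurnSample(
    response="Einstein was born in Germany in 1879.",
    reference="Albert Einstein was born in Ulm, Germany, on 14 March 1879.",
)

scorer = FactualCorrectness(llm=evaluation_llm)
score = await scorer.single_turn_ascore(sample)  # run inside an async context or notebook
print(score)
```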

docs/concepts/test_data_generation/rag.md (+2, -1)

@@ -103,7 +103,7 @@ graph TD
 ### Extractors

-Different extractors are used to extract information from each nodes that can be used to establish the relationship between the nodes. For example, in the case of financial documents, the extractor that can be used are entity extractor to extract the entities like Company Name, Keyphrase extractor to extract important key phrases present in each node, etc. You can write your own [custom extractors]() to extract the information that is relevant to your domain.
+Different extractors are used to extract information from each nodes that can be used to establish the relationship between the nodes. For example, in the case of financial documents, the extractor that can be used are entity extractor to extract the entities like Company Name, Keyphrase extractor to extract important key phrases present in each node, etc. You can write your own custom extractors to extract the information that is relevant to your domain.

 Extractors can be LLM based which are inherited from `LLMBasedExtractor` or rule based which are inherited from `Extractor`.
@@ -165,6 +165,7 @@ graph TD
 The extracted information is used to establish the relationship between the nodes. For example, in the case of financial documents, the relationship can be established between the nodes based on the entities present in the nodes.
 You can write your own [custom relationship builder]() to establish the relationship between the nodes based on the information that is relevant to your domain.
+# Link missing above

 #### Example

docs/extra/components/choose_evaluator_llm.md (+2, -2)

@@ -126,7 +126,7 @@
 evaluator_llm = LangchainLLMWrapper(your_llm_instance)
 ```

-For a more detailed guide, checkout [the guide on customizing models](../../howtos/customizations/customize_models/).
+For a more detailed guide, checkout [the guide on customizing models](../../howtos/customizations/customize_models.md).

 If you using LlamaIndex, you can use the `LlamaIndexLLMWrapper` to wrap your LLM so that it can be used with ragas.
@@ -135,6 +135,6 @@
 evaluator_llm = LlamaIndexLLMWrapper(your_llm_instance)
 ```

-For more information on how to use LlamaIndex, please refer to the [LlamaIndex Integration guide](../../howtos/integrations/_llamaindex/).
+For more information on how to use LlamaIndex, please refer to the [LlamaIndex Integration guide](./../../howtos/integrations/_llamaindex.md).

 If your still not able use Ragas with your favorite LLM provider, please let us know by by commenting on this [issue](https://github.com/explodinggradients/ragas/issues/1617) and we'll add support for it 🙂.
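For context, a concrete version of the two wrapper snippets touched in this file; the provider packages and model names are illustrative assumptions, not part of the commit:

```python
from langchain_openai import ChatOpenAI
from llama_index.llms.openai import OpenAI
from ragas.llms import LangchainLLMWrapper, LlamaIndexLLMWrapper

# Option 1: wrap a LangChain chat model as the evaluator LLM.
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))

# Option 2: wrap a LlamaIndex LLM instead.
evaluator_llm = LlamaIndexLLMWrapper(OpenAI(model="gpt-4o-mini"))
```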

docs/extra/components/choose_generator_llm.md (+2, -2)

@@ -125,7 +125,7 @@
 generator_llm = LangchainLLMWrapper(your_llm_instance)
 ```

-For a more detailed guide, checkout [the guide on customizing models](../../howtos/customizations/customize_models/).
+For a more detailed guide, checkout [the guide on customizing models](../../howtos/customizations/customize_models.md).

 If you using LlamaIndex, you can use the `LlamaIndexLLMWrapper` to wrap your LLM so that it can be used with ragas.
@@ -134,6 +134,6 @@
 generator_llm = LlamaIndexLLMWrapper(your_llm_instance)
 ```

-For more information on how to use LlamaIndex, please refer to the [LlamaIndex Integration guide](../../howtos/integrations/_llamaindex/).
+For more information on how to use LlamaIndex, please refer to the [LlamaIndex Integration guide](./../../howtos/integrations/_llamaindex.md).

 If your still not able use Ragas with your favorite LLM provider, please let us know by by commenting on this [issue](https://github.com/explodinggradients/ragas/issues/1617) and we'll add support for it 🙂.

docs/getstarted/evals.md (+2, -2)

@@ -7,7 +7,7 @@ The purpose of this guide is to illustrate a simple workflow for testing and eva

 In this guide, you will evaluate a **text summarization pipeline**. The goal is to ensure that the output summary accurately captures all the key details specified in the text, such as growth figures, market insights, and other essential information.

-`ragas` offers a variety of methods for analyzing the performance of LLM applications, referred to as [metrics](../concepts/metrics/). Each metric requires a predefined set of data points, which it uses to calculate scores that indicate performance.
+`ragas` offers a variety of methods for analyzing the performance of LLM applications, referred to as [metrics](../concepts/metrics/available_metrics/index.md). Each metric requires a predefined set of data points, which it uses to calculate scores that indicate performance.

 ### Evaluating using a Non-LLM Metric
@@ -203,7 +203,7 @@ To fix these results, ragas provides a way to align the metric with your prefere
 2. **Download**: Save the annotated data using the `Annotated JSON` button in [app.ragas.io](https://app.ragas.io/).
 3. **Train**: Use the annotated data to train your custom metric.

-To learn more about this, refer to how to [train your own metric guide](../howtos/customizations/metrics/train_your_own_metric.md)
+To learn more about this, refer to how to [train your own metric guide](./../howtos/customizations/metrics/train_your_own_metric.md)

 [Download sample annotated JSON](../_static/sample_annotated_summary.json)
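Since the first hunk above points at the "Evaluating using a Non-LLM Metric" section, here is a minimal sketch of that flow; the `BleuScore` metric and sample fields are assumptions drawn from the wider docs rather than from this diff:

```python
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import BleuScore

sample = SingleTurnSample(
    response="The company grew 8% in Q2, driven by APAC sales.",
    reference="The company reported 8% growth in Q2, led by APAC.",
)

metric = BleuScore()  # deterministic string-overlap metric, no LLM involved
print(metric.single_turn_score(sample))
```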

docs/getstarted/rag_eval.md (+1, -1)

@@ -157,7 +157,7 @@ evaluation_dataset = EvaluationDataset.from_list(dataset)

 ## Evaluate

-We have successfully collected the evaluation data. Now, we can evaluate our RAG system on the collected dataset using a set of commonly used RAG evaluation metrics. You may choose any model as [evaluator LLM](/docs/howtos/customizations/customize_models.md) for evaluation.
+We have successfully collected the evaluation data. Now, we can evaluate our RAG system on the collected dataset using a set of commonly used RAG evaluation metrics. You may choose any model as [evaluator LLM](./../howtos/customizations/customize_models.md) for evaluation.

 ```python
 from ragas import evaluate
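The hunk cuts off right where the evaluation call begins; a sketch of how that call typically continues is shown below. The metric selection and model choice are assumptions, not content of this commit:

```python
from langchain_openai import ChatOpenAI
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))

# evaluation_dataset is the EvaluationDataset built from the collected samples above.
result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness()],
    llm=evaluator_llm,
)
print(result)
```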

docs/getstarted/rag_testset_generation.md (+13, -6)

@@ -31,7 +31,7 @@ docs = loader.load()
 ### Choose your LLM

-You may choose to use any [LLM of your choice](../howtos/customizations/customize_models.md)
+You may choose to use any [LLM of your choice](./../howtos/customizations/customize_models.md)
 --8<--
 choose_generator_llm.md
 --8<--
@@ -55,9 +55,10 @@ Once you have generated a testset, you would want to view it and select the quer
 dataset.to_pandas()
 ```

+Output
 ![testset](./testset_output.png)

-You can also use other tools like [app.ragas.io](https://app.ragas.io/) or any other similar tools available for you in the [Integrations](../howtos/integrations/index.md) section.
+You can also use other tools like [app.ragas.io](https://app.ragas.io/) or any other similar tools available for you in the [Integrations](./../howtos/integrations/index.md) section.

 In order to use the [app.ragas.io](https://app.ragas.io/) dashboard, you need to have an account on [app.ragas.io](https://app.ragas.io/). If you don't have one, you can sign up for one [here](https://app.ragas.io/login). You will also need to have a [Ragas APP token](https://app.ragas.io/settings/api-keys).
@@ -93,6 +94,7 @@ from ragas.testset.graph import KnowledgeGraph

 kg = KnowledgeGraph()
 ```
+Output
 ```
 KnowledgeGraph(nodes: 0, relationships: 0)
 ```
@@ -110,6 +112,7 @@ for doc in docs:
         )
     )
 ```
+Output
 ```
 KnowledgeGraph(nodes: 10, relationships: 0)
 ```
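The stray closing parentheses in this hunk belong to the document-node loop from the guide; a reconstructed sketch of that loop is below. The field names are assumptions inferred from the `KnowledgeGraph(nodes: 10, ...)` output, not part of the diff:

```python
from ragas.testset.graph import KnowledgeGraph, Node, NodeType

kg = KnowledgeGraph()
for doc in docs:  # `docs` are the documents loaded at the top of the guide
    kg.nodes.append(
        Node(
            type=NodeType.DOCUMENT,
            properties={
                "page_content": doc.page_content,
                "document_metadata": doc.metadata,
            },
        )
    )
```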
@@ -137,6 +140,8 @@ kg.save("knowledge_graph.json")
 loaded_kg = KnowledgeGraph.load("knowledge_graph.json")
 loaded_kg
 ```
+
+Output
 ```
 KnowledgeGraph(nodes: 48, relationships: 605)
 ```
@@ -158,11 +163,13 @@ from ragas.testset.synthesizers import default_query_distribution

 query_distribution = default_query_distribution(generator_llm)
 ```
+
+Output
 ```
 [
-    (SingleHopSpecificQuerySynthesizer(llm=llm), 0.5),
-    (MultiHopAbstractQuerySynthesizer(llm=llm), 0.25),
-    (MultiHopSpecificQuerySynthesizer(llm=llm), 0.25),
+    (SingleHopSpecificQuerySynthesizer(llm=llm), 0.5),
+    (MultiHopAbstractQuerySynthesizer(llm=llm), 0.25),
+    (MultiHopSpecificQuerySynthesizer(llm=llm), 0.25),
 ]
 ```

@@ -172,5 +179,5 @@ Now we can generate the testset.
 testset = generator.generate(testset_size=10, query_distribution=query_distribution)
 testset.to_pandas()
 ```
-
+Output
 ![testset](./testset_output.png)
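For completeness, a sketch of how the `generator` used in the final hunk is typically constructed from the knowledge graph; the constructor arguments are assumptions based on the surrounding guide, not part of this commit:

```python
from ragas.testset import TestsetGenerator

# generator_llm is a wrapped LLM (see choose_generator_llm.md above) and
# loaded_kg is the KnowledgeGraph loaded from knowledge_graph.json earlier.
generator = TestsetGenerator(llm=generator_llm, knowledge_graph=loaded_kg)

testset = generator.generate(testset_size=10, query_distribution=query_distribution)
print(testset.to_pandas().head())
```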

docs/howtos/applications/_cost.md (+27, -33)

@@ -24,12 +24,10 @@ from ragas.cost import get_token_usage_for_openai

 get_token_usage_for_openai(llm_result)
 ```
-
-
-
-
-    TokenUsage(input_tokens=9, output_tokens=9, model='')
-
+Output
+```
+TokenUsage(input_tokens=9, output_tokens=9, model='')
+```


 You can define your own or import parsers if they are defined. If you would like to suggest parser for LLM providers or contribute your own ones please check out this [issue](https://github.com/explodinggradients/ragas/issues/1151) 🙂.
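Since this hunk mentions writing your own parser, a minimal sketch of what one could look like is below; the `token_usage` keys belong to a hypothetical provider and are not part of this commit:

```python
from langchain_core.outputs import LLMResult
from ragas.cost import TokenUsage

def my_token_usage_parser(llm_result: LLMResult) -> TokenUsage:
    """Hypothetical parser reading usage counts from the provider's llm_output dict."""
    usage = (llm_result.llm_output or {}).get("token_usage", {})
    return TokenUsage(
        input_tokens=usage.get("prompt_tokens", 0),
        output_tokens=usage.get("completion_tokens", 0),
    )

# Passed the same way as the built-in parser:
# evaluate(..., token_usage_parser=my_token_usage_parser)
```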
@@ -47,9 +45,10 @@ dataset = load_dataset("explodinggradients/amnesty_qa", "english_v3")

 eval_dataset = EvaluationDataset.from_hf_dataset(dataset["eval"])
 ```
-
-    Repo card metadata block was not found. Setting CardData to empty.
-
+Output
+```
+Repo card metadata block was not found. Setting CardData to empty.
+```

 You can pass in the parser to the `evaluate()` function and the cost will be calculated and returned in the `Result` object.

@@ -67,21 +66,19 @@ result = evaluate(
     token_usage_parser=get_token_usage_for_openai,
 )
 ```
-
-
-    Evaluating: 0%| | 0/20 [00:00<?, ?it/s]
-
+Output
+```
+Evaluating: 0%| | 0/20 [00:00<?, ?it/s]
+```


 ```python
 result.total_tokens()
 ```
-
-
-
-
-    TokenUsage(input_tokens=25097, output_tokens=3757, model='')
-
+Output
+```
+TokenUsage(input_tokens=25097, output_tokens=3757, model='')
+```


 You can compute the cost for each run by passing in the cost per token to `Result.total_cost()` function.
@@ -93,11 +90,10 @@ In this case GPT-4o costs $5 for 1M input tokens and $15 for 1M output tokens.
 result.total_cost(cost_per_input_token=5 / 1e6, cost_per_output_token=15 / 1e6)
 ```

-
-
-
-    1.1692900000000002
-
+Output
+```
+1.1692900000000002
+```


 ## Token Usage for Testset Generation
@@ -116,10 +112,9 @@ kg = KnowledgeGraph.load("../../../experiments/scratchpad_kg.json")
 kg
 ```

-
-
-
-    KnowledgeGraph(nodes: 47, relationships: 109)
+Output
+```
+KnowledgeGraph(nodes: 47, relationships: 109)
+```
@@ -145,9 +140,8 @@ testset = tg.generate(testset_size=10, token_usage_parser=get_token_usage_for_op
 testset.total_cost(cost_per_input_token=5 / 1e6, cost_per_output_token=15 / 1e6)
 ```

-
-
-
-    0.20967000000000002
-
+Output
+```
+0.20967000000000002
+```
