docs/concepts/components/eval_dataset.md (+1 −1)
@@ -20,7 +20,7 @@ An evaluation dataset consists of:
  - **Define Clear Objectives**: Identify the specific aspects of the AI application that you want to evaluate and the scenarios you want to test. Collect data samples that reflect these objectives.
- - **Collect Representative Data**: Ensure that the dataset covers a diverse range of scenarios, user inputs, and expected responses to provide a comprehensive evaluation of the AI application. This can be achieved by collecting data from various sources or [generating synthetic data]().
+ - **Collect Representative Data**: Ensure that the dataset covers a diverse range of scenarios, user inputs, and expected responses to provide a comprehensive evaluation of the AI application. This can be achieved by collecting data from various sources or [generating synthetic data](./../../howtos/customizations/index.md#testset-generation).
  - **Quality and Size**: Aim for a dataset that is large enough to provide meaningful insights but not so large that it becomes unwieldy. Ensure that the data is of high quality and accurately reflects the real-world scenarios you want to evaluate.
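For illustration, a minimal sketch of assembling such a dataset in `ragas` (assuming the top-level `EvaluationDataset` and `SingleTurnSample` classes; the sample content is invented):

```python
from ragas import EvaluationDataset, SingleTurnSample

# Each sample pairs a user input with the application's response and a
# reference answer used for grading.
samples = [
    SingleTurnSample(
        user_input="What did the company report as Q2 revenue growth?",
        response="Revenue grew 8% quarter over quarter.",
        reference="The report states an 8% QoQ revenue increase.",
    ),
]

dataset = EvaluationDataset(samples=samples)
```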
docs/concepts/index.md (+4 −3)
@@ -9,23 +9,24 @@
  Discover the various components used within Ragas.

- Components like [Prompt Object](components/index.md#prompt-object), [Evaluation Dataset](components/index.md#evaluation-dataset) and [more..](components/index.md)
+ Components like [Prompt Object](components/prompt.md), [Evaluation Dataset](components/eval_dataset.md) and [more..](components/index.md)

  Explore available metrics and understand how they work.

- Metrics for evaluating [RAG](metrics/index.md/#retrieval-augmented-generation), [Agentic workflows](metrics/index.md/#agents-or-tool-use-cases) and [more..](metrics/index.md/#list-of-available-metrics).
+ Metrics for evaluating [RAG](metrics/available_metrics/index.md#retrieval-augmented-generation), [Agentic workflows](metrics/available_metrics/index.md#agents-or-tool-use-cases) and [more..](metrics/available_metrics/index.md#list-of-available-metrics).

  - :material-database-plus:{ .lg .middle } [__Test Data Generation__](test_data_generation/index.md)

  ---

  Generate high-quality datasets for comprehensive testing.

- Algorithms for synthesizing data to test [RAG](test_data_generation/index.md#retrieval-augmented-generation), [Agentic workflows](test_data_generation/index.md#agents-or-tool-use-cases)
+ Algorithms for synthesizing data to test [RAG](test_data_generation/rag.md), [Agentic workflows](test_data_generation/agents.md)
docs/concepts/metrics/overview/index.md (+4 −4)
@@ -18,14 +18,14 @@ A metric is a quantitative measure used to evaluate the performance of a AI appl
  **LLM-based metrics**: These metrics use an LLM underneath to do the evaluation. One or more LLM calls may be performed to arrive at the score or result. These metrics can be somewhat non-deterministic, as the LLM might not always return the same result for the same input. On the other hand, these metrics have been shown to be more accurate and closer to human evaluation.

- All LLM-based metrics in ragas are inherited from the `MetricWithLLM` class. These metrics expect an [LLM]() object to be set before scoring.
+ All LLM-based metrics in ragas are inherited from the `MetricWithLLM` class. These metrics expect an LLM object to be set before scoring.

  ```python
  from ragas.metrics import FactualCorrectness
  scorer = FactualCorrectness(llm=evaluation_llm)
  ```

- Each LLM-based metric also has prompts associated with it, written using the [Prompt Object]().
+ Each LLM-based metric also has prompts associated with it, written using the [Prompt Object](./../../components/prompt.md).
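Something like the following can be used to inspect those prompts (a sketch, assuming the `get_prompts()` accessor that prompt-backed metrics expose; `evaluation_llm` is your configured LLM wrapper):

```python
from ragas.metrics import FactualCorrectness

scorer = FactualCorrectness(llm=evaluation_llm)

# Prompt-backed metrics keep their prompts in a name -> prompt mapping.
for name, prompt in scorer.get_prompts().items():
    print(name, type(prompt).__name__)
```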
  **Non-LLM-based metrics**: These metrics do not use an LLM underneath to do the evaluation. They are deterministic and can be used to evaluate the performance of the AI application without using an LLM. These metrics rely on traditional methods, such as string similarity, BLEU score, etc. Because of this, they are known to have a lower correlation with human evaluation.
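A minimal sketch of the non-LLM flavor (assuming the string-based `BleuScore` metric; the strings are invented):

```python
from ragas import SingleTurnSample
from ragas.metrics import BleuScore

sample = SingleTurnSample(
    response="The Q2 report shows 8% revenue growth.",
    reference="Revenue grew by 8% in the second quarter.",
)

# Purely string-based: no LLM is configured or called.
metric = BleuScore()
await metric.single_turn_ascore(sample)
```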
@@ -34,7 +34,7 @@ All LLM based metrics in ragas are inherited from `Metric` class.
  **Metrics can be broadly classified into two categories based on the type of data they evaluate**:

- **Single turn metrics**: These metrics evaluate the performance of the AI application based on a single turn of interaction between the user and the AI. All metrics in ragas that support single turn evaluation are inherited from the `SingleTurnMetric` class and scored using the `single_turn_ascore` method. They also expect a [Single Turn Sample]() object as input.
+ **Single turn metrics**: These metrics evaluate the performance of the AI application based on a single turn of interaction between the user and the AI. All metrics in ragas that support single turn evaluation are inherited from the [SingleTurnMetric][ragas.metrics.base.SingleTurnMetric] class and scored using the `single_turn_ascore` method. They also expect a [Single Turn Sample][ragas.dataset_schema.SingleTurnSample] object as input.
  ```python
  from ragas.metrics import FactualCorrectness
@@ -43,7 +43,7 @@ scorer = FactualCorrectness()
  await scorer.single_turn_ascore(sample)
  ```
- **Multi-turn metrics**: These metrics evaluate the performance of the AI application based on multiple turns of interaction between the user and the AI. All metrics in ragas that support multi turn evaluation are inherited from the `MultiTurnMetric` class and scored using the `multi_turn_ascore` method. They also expect a [Multi Turn Sample]() object as input.
+ **Multi-turn metrics**: These metrics evaluate the performance of the AI application based on multiple turns of interaction between the user and the AI. All metrics in ragas that support multi turn evaluation are inherited from the [MultiTurnMetric][ragas.metrics.base.MultiTurnMetric] class and scored using the `multi_turn_ascore` method. They also expect a [Multi Turn Sample][ragas.dataset_schema.MultiTurnSample] object as input.
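A sketch of the multi-turn flow (assuming the `ragas.messages` message types and `TopicAdherenceScore` as an illustrative multi-turn metric; `evaluation_llm` is your configured LLM wrapper and the conversation is invented):

```python
from ragas import MultiTurnSample
from ragas.messages import AIMessage, HumanMessage
from ragas.metrics import TopicAdherenceScore

sample = MultiTurnSample(
    user_input=[
        HumanMessage(content="How do I reset my password?"),
        AIMessage(content="Use the 'Forgot password' link on the login page."),
        HumanMessage(content="And if the reset email never arrives?"),
        AIMessage(content="Check your spam folder, then contact support."),
    ],
    reference_topics=["account access"],
)

scorer = TopicAdherenceScore(llm=evaluation_llm)
await scorer.multi_turn_ascore(sample)
```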
docs/concepts/test_data_generation/rag.md (+2 −1)
@@ -103,7 +103,7 @@ graph TD
  ### Extractors

- Different extractors are used to extract information from each node that can be used to establish the relationship between the nodes. For example, in the case of financial documents, the extractors that can be used are an entity extractor to extract entities like Company Name, a keyphrase extractor to extract important keyphrases present in each node, etc. You can write your own [custom extractors]() to extract the information that is relevant to your domain.
+ Different extractors are used to extract information from each node that can be used to establish the relationship between the nodes. For example, in the case of financial documents, the extractors that can be used are an entity extractor to extract entities like Company Name, a keyphrase extractor to extract important keyphrases present in each node, etc. You can write your own custom extractors to extract the information that is relevant to your domain.

  Extractors can be LLM-based, which are inherited from `LLMBasedExtractor`, or rule-based, which are inherited from `Extractor`.
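For illustration, a sketch of running built-in extractors directly (assuming the `NERExtractor` and `KeyphrasesExtractor` transforms, a configured `generator_llm`, and a knowledge-graph `node` built earlier):

```python
from ragas.testset.transforms.extractors import KeyphrasesExtractor, NERExtractor

# LLM-based extractors return a (property_name, value) pair that is
# attached to the node's properties.
keyphrase_extractor = KeyphrasesExtractor(llm=generator_llm)
ner_extractor = NERExtractor(llm=generator_llm)

prop_name, keyphrases = await keyphrase_extractor.extract(node)
```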
@@ -165,6 +165,7 @@ graph TD
  The extracted information is used to establish the relationship between the nodes. For example, in the case of financial documents, the relationship can be established between the nodes based on the entities present in the nodes.
  You can write your own [custom relationship builder]() to establish the relationship between the nodes based on the information that is relevant to your domain.
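For illustration, a sketch using a built-in similarity-based relationship builder (assuming `JaccardSimilarityBuilder`; the property name and threshold are illustrative):

```python
from ragas.testset.transforms.relationship_builders.traditional import (
    JaccardSimilarityBuilder,
)

# Link nodes whose extracted entity sets overlap beyond a threshold.
rel_builder = JaccardSimilarityBuilder(
    property_name="entities",       # written earlier by an entity extractor
    new_property_name="entity_jaccard_similarity",
    threshold=0.2,                  # illustrative cutoff
)
relationships = await rel_builder.transform(kg)  # kg: your KnowledgeGraph
```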
- For more information on how to use LlamaIndex, please refer to the [LlamaIndex Integration guide](../../howtos/integrations/_llamaindex/).
+ For more information on how to use LlamaIndex, please refer to the [LlamaIndex Integration guide](./../../howtos/integrations/_llamaindex.md).

  If you're still not able to use Ragas with your favorite LLM provider, please let us know by commenting on this [issue](https://github.com/explodinggradients/ragas/issues/1617) and we'll add support for it 🙂.
docs/getstarted/evals.md (+2 −2)
@@ -7,7 +7,7 @@ The purpose of this guide is to illustrate a simple workflow for testing and eva
  In this guide, you will evaluate a **text summarization pipeline**. The goal is to ensure that the output summary accurately captures all the key details specified in the text, such as growth figures, market insights, and other essential information.

- `ragas` offers a variety of methods for analyzing the performance of LLM applications, referred to as [metrics](../concepts/metrics/). Each metric requires a predefined set of data points, which it uses to calculate scores that indicate performance.
+ `ragas` offers a variety of methods for analyzing the performance of LLM applications, referred to as [metrics](../concepts/metrics/available_metrics/index.md). Each metric requires a predefined set of data points, which it uses to calculate scores that indicate performance.
  ### Evaluating using a Non-LLM Metric
@@ -203,7 +203,7 @@ To fix these results, ragas provides a way to align the metric with your prefere
  2. **Download**: Save the annotated data using the `Annotated JSON` button in [app.ragas.io](https://app.ragas.io/).
  3. **Train**: Use the annotated data to train your custom metric.

- To learn more about this, refer to the [train your own metric guide](../howtos/customizations/metrics/train_your_own_metric.md)
+ To learn more about this, refer to the [train your own metric guide](./../howtos/customizations/metrics/train_your_own_metric.md)
- We have successfully collected the evaluation data. Now, we can evaluate our RAG system on the collected dataset using a set of commonly used RAG evaluation metrics. You may choose any model as the [evaluator LLM](/docs/howtos/customizations/customize_models.md) for evaluation.
+ We have successfully collected the evaluation data. Now, we can evaluate our RAG system on the collected dataset using a set of commonly used RAG evaluation metrics. You may choose any model as the [evaluator LLM](./../howtos/customizations/customize_models.md) for evaluation.
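A sketch of that evaluation step (assuming the collected `eval_dataset`, a wrapped `evaluator_llm`, and two common RAG metrics):

```python
from ragas import evaluate
from ragas.metrics import FactualCorrectness, Faithfulness

result = evaluate(
    dataset=eval_dataset,                  # the EvaluationDataset collected above
    metrics=[Faithfulness(), FactualCorrectness()],
    llm=evaluator_llm,                     # evaluator LLM wrapper
)
print(result)
```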
docs/getstarted/rag_testset_generation.md (+13 −6)
@@ -31,7 +31,7 @@ docs = loader.load()
  ### Choose your LLM

- You may choose to use any [LLM of your choice](../howtos/customizations/customize_models.md)
+ You may choose to use any [LLM of your choice](./../howtos/customizations/customize_models.md)

  --8<--
  choose_generator_llm.md
  --8<--
@@ -55,9 +55,10 @@ Once you have generated a testset, you would want to view it and select the quer
  dataset.to_pandas()
  ```

+ Output

  

- You can also use other tools like [app.ragas.io](https://app.ragas.io/) or any other similar tools available to you in the [Integrations](../howtos/integrations/index.md) section.
+ You can also use other tools like [app.ragas.io](https://app.ragas.io/) or any other similar tools available to you in the [Integrations](./../howtos/integrations/index.md) section.

  In order to use the [app.ragas.io](https://app.ragas.io/) dashboard, you need an account on [app.ragas.io](https://app.ragas.io/). If you don't have one, you can sign up [here](https://app.ragas.io/login). You will also need a [Ragas APP token](https://app.ragas.io/settings/api-keys).
@@ -93,6 +94,7 @@ from ragas.testset.graph import KnowledgeGraph
  You can define your own parsers, or import them if they are already defined. If you would like to suggest a parser for an LLM provider or contribute your own, please check out this [issue](https://github.com/explodinggradients/ragas/issues/1151) 🙂.
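For context, a sketch of the knowledge-graph seeding this section builds on (assuming the `KnowledgeGraph`, `Node`, and `TestsetGenerator` APIs; `docs`, `generator_llm`, and `generator_embeddings` come from earlier steps):

```python
from ragas.testset import TestsetGenerator
from ragas.testset.graph import KnowledgeGraph, Node, NodeType

# Seed a knowledge graph with one node per loaded document.
kg = KnowledgeGraph()
for doc in docs:
    kg.nodes.append(
        Node(
            type=NodeType.DOCUMENT,
            properties={
                "page_content": doc.page_content,
                "document_metadata": doc.metadata,
            },
        )
    )

generator = TestsetGenerator(
    llm=generator_llm,
    embedding_model=generator_embeddings,
    knowledge_graph=kg,
)
dataset = generator.generate(testset_size=10)
```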