|
1 | 1 | {
|
2 | 2 | "cells": [
|
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "7c249b40", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# Using Azure OpenAI Endpoints\n" |
| 9 | + ] |
| 10 | + }, |
3 | 11 | {
|
4 | 12 | "cell_type": "markdown",
|
5 | 13 | "id": "2e63f667",
|
|
12 | 20 | " src=\"https://colab.research.google.com/assets/colab-badge.svg\">\n",
|
13 | 21 | " </a>\n",
|
14 | 22 | " <br>\n",
|
15 |
| - " <h1> Quickstart </h1>\n", |
16 | 23 | "</p>\n",
|
17 | 24 | "\n",
|
18 |
| - "welcome to the ragas quickstart. We're going to get you up and running with ragas as qickly as you can so that you can go back to improving your Retrieval Augmented Generation pipelines while this library makes sure your changes are improving your entire pipeline.\n", |
19 |
| - "\n", |
20 |
| - "to kick things of lets start with the data\n", |
21 | 25 | "\n",
|
22 |
| - "> **Note:** this guide is for folks who are using the Azure OpenAI endpoints. Check the [quickstart guide](../quickstart.ipynb) if your using OpenAI endpoints." |
23 |
| - ] |
24 |
| - }, |
25 |
| - { |
26 |
| - "cell_type": "code", |
27 |
| - "execution_count": 1, |
28 |
| - "id": "57585b55", |
29 |
| - "metadata": {}, |
30 |
| - "outputs": [], |
31 |
| - "source": [ |
32 |
| - "# if using colab uncomment this\n", |
33 |
| - "#!pip install ragas" |
| 26 | + "> **Note:** this guide is for folks who are using the Azure OpenAI endpoints. Check the [quickstart guide](../../getstarted/evaluation.md) if your using OpenAI endpoints." |
34 | 27 | ]
|
35 | 28 | },
|
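For readers following along, here is a minimal sketch of pointing the OpenAI client at an Azure endpoint, assuming the environment-variable convention of the openai Python SDK (v0.x era) that this notebook targets; the resource name, key, and API version below are placeholders, not values from this repository.

```python
import os

# Placeholder Azure OpenAI settings; substitute the values from your own
# Azure OpenAI resource and deployment before running the notebook.
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = "2023-05-15"
os.environ["OPENAI_API_BASE"] = "https://<your-resource-name>.openai.azure.com/"
os.environ["OPENAI_API_KEY"] = "<your-azure-openai-key>"
```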
36 | 29 | {
|
37 | 30 | "cell_type": "markdown",
|
38 |
| - "id": "06c9fc7d", |
| 31 | + "id": "e54b5e01", |
39 | 32 | "metadata": {},
|
40 | 33 | "source": [
|
41 |
| - "## The Data\n", |
42 |
| - "\n", |
43 |
| - "Ragas performs a `ground_truth` free evaluation of your RAG pipelines. This is because for most people building a gold labeled dataset which represents in the distribution they get in production is a very expensive process.\n", |
44 |
| - "\n", |
45 |
| - "**Note:** *While originially ragas was aimed at `ground_truth` free evalutions there is some aspects of the RAG pipeline that need `ground_truth` in order to measure. We're in the process of building a testset generation features that will make it easier. Checkout [issue#136](https://github.com/explodinggradients/ragas/issues/136) for more details.*\n", |
46 |
| - "\n", |
47 |
| - "Hence to work with ragas all you need are the following data\n", |
48 |
| - "- question: `list[str]` - These are the questions you RAG pipeline will be evaluated on. \n", |
49 |
| - "- answer: `list[str]` - The answer generated from the RAG pipeline and give to the user.\n", |
50 |
| - "- contexts: `list[list[str]]` - The contexts which where passed into the LLM to answer the question.\n", |
51 |
| - "- ground_truths: `list[list[str]]` - The ground truth answer to the questions. (only required if you are using context_recall)\n", |
52 |
| - "\n", |
53 |
| - "Ideally your list of questions should reflect the questions your users give, including those that you have been problamatic in the past.\n", |
54 |
| - "\n", |
55 |
| - "Here we're using an example dataset from on of the baselines we created for the [Financial Opinion Mining and Question Answering (fiqa) Dataset](https://sites.google.com/view/fiqa/) we created. If you want to want to know more about the baseline, feel free to check the `experiements/baseline` section" |
| 34 | + "### Load sample dataset" |
56 | 35 | ]
|
57 | 36 | },
|
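To make the four expected columns above concrete, here is a hedged illustration only (the single row below is invented, not taken from the fiqa data) of building the same structure as a `datasets.Dataset`:

```python
from datasets import Dataset

# Hypothetical single-row example of the expected columns; a real evaluation
# uses the questions, answers and contexts produced by your own pipeline.
eval_data = {
    "question": ["How do I open a brokerage account?"],
    "answer": ["Most brokers let you open an account online in a few minutes."],
    "contexts": [["Brokerage accounts can usually be opened online after identity verification."]],
    "ground_truths": [["You can open a brokerage account online after verifying your identity."]],
}
eval_dataset = Dataset.from_dict(eval_data)
eval_dataset
```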
58 | 37 | {
|
|
106 | 85 | "fiqa_eval"
|
107 | 86 | ]
|
108 | 87 | },
|
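For reference, the (elided) loading cell boils down to something like the sketch below; the dataset id and config name are assumed from the fiqa baseline published by the ragas project and may have changed since:

```python
from datasets import load_dataset

# Assumed Hugging Face dataset id/config for the fiqa baseline; swap in your
# own evaluation dataset if you are not reproducing the baseline.
fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
fiqa_eval
```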
109 |
| - { |
110 |
| - "cell_type": "markdown", |
111 |
| - "id": "84aa640f", |
112 |
| - "metadata": {}, |
113 |
| - "source": [ |
114 |
| - "## Metrics\n", |
115 |
| - "\n", |
116 |
| - "Ragas provides you with a few metrics to evaluate the different aspects of your RAG systems namely\n", |
117 |
| - "\n", |
118 |
| - "1. metrics to evaluate retrieval: offers `context_precision` and `context_recall` which give you the measure of the performance of your retrieval system. \n", |
119 |
| - "2. metrics to evaluate generation: offers `faithfulness` which measures hallucinations and `answer_relevancy` which measures how to-the-point the answers are to the question.\n", |
120 |
| - "\n", |
121 |
| - "The harmonic mean of these 4 aspects gives you the **ragas score** which is a single measure of the performance of your QA system across all the important aspects.\n", |
122 |
| - "\n", |
123 |
| - "\n", |
124 |
| - "\n", |
125 |
| - "Lets learn a bit more about the metrics available\n", |
126 |
| - "\n", |
127 |
| - "1. **Faithfulness**: measures the information consistency of the generated answer against the given context. If any claims are made in the answer that cannot be deduced from context is penalized. It is calculated from `answer` and `retrieved context`.\n", |
128 |
| - "\n", |
129 |
| - "2. **Context Precision**: measures how relevant retrieved contexts are to the question. Ideally, the context should only contain information necessary to answer the question. The presence of redundant information in the context is penalized. It is calculated from `question` and `retrieved context`.\n", |
130 |
| - "\n", |
131 |
| - "3. **Context Recall**: measures the recall of the retrieved context using annotated answer as ground truth. Annotated answer is taken as proxy for ground truth context. It is calculated from `ground truth` and `retrieved context`.\n", |
132 |
| - "\n", |
133 |
| - "4. **Answer Relevancy**: refers to the degree to which a response directly addresses and is appropriate for a given question or context. This does not take the factuality of the answer into consideration but rather penalizes the present of redundant information or incomplete answers given a question. It is calculated from `question` and `answer`.\n", |
134 |
| - "\n", |
135 |
| - "5. **Aspect Critiques**: Designed to judge the submission against defined aspects like harmlessness, correctness, etc. You can also define your own aspect and validate the submission against your desired aspect. The output of aspect critiques is always binary. It is calculated from `answer`.\n", |
136 |
| - "\n", |
137 |
| - "The final `ragas_score` is the harmonic mean of individual metric scores.\n", |
138 |
| - "\n", |
139 |
| - "\n", |
140 |
| - "> **Note:** by default these metrics are using OpenAI's API to compute the score. If you using this metric make sure you set the environment key `OPENAI_API_KEY` with your API key. You can also try other LLMs for evaluation, check the [llm guide](./guides/llms.ipynb) to learn more\n", |
141 |
| - "\n", |
142 |
| - "If you're interested in learning more about the metrics, feel free to check the [metrics docs](https://github.com/explodinggradients/ragas/blob/main/docs/metrics.md)" |
143 |
| - ] |
144 |
| - }, |
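To make the metric names above concrete, here is a sketch of importing them; the module paths are assumed from ragas releases contemporary with this notebook and may have moved in later versions:

```python
# Assumed import paths for the metrics described above (ragas versions of
# this notebook's era); aspect critiques such as harmfulness live in a
# separate critique module.
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)
from ragas.metrics.critique import harmfulness

metrics = [
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
    harmfulness,
]
```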
145 | 88 | {
|
146 | 89 | "cell_type": "markdown",
|
147 | 90 | "id": "c77789bb",
|
|
262 | 205 | "id": "8d6ecd5a",
|
263 | 206 | "metadata": {},
|
264 | 207 | "source": [
|
265 |
| - "## Evaluation\n", |
| 208 | + "### Evaluation\n", |
266 | 209 | "\n",
|
267 | 210 | "Running the evalutation is as simple as calling evaluate on the `Dataset` with the metrics of your choice."
|
268 | 211 | ]
|
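A minimal sketch of that call, assuming the `fiqa_eval` dataset and `metrics` list from the sketches above and a "baseline" split name taken from the fiqa evaluation data; when targeting Azure OpenAI, the metrics' LLM also needs to be pointed at your Azure deployment as described in the llm guide referenced earlier.

```python
from ragas import evaluate

# `fiqa_eval` and `metrics` come from the earlier sketches; the "baseline"
# split name is an assumption based on the fiqa evaluation dataset.
result = evaluate(
    fiqa_eval["baseline"],
    metrics=metrics,
)
result  # per-metric scores plus the aggregate ragas_score
```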
|