Commit 028d76d

docs: redo readme (#1480)
1 parent c06b131 commit 028d76d

File tree

2 files changed: +15 −42 lines

README.md

+14 −40

@@ -3,7 +3,7 @@
     src="./docs/_static/imgs/logo.png">
 </h1>
 <p align="center">
-  <i>Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines</i>
+  <i>Evaluation library for your LLM applications</i>
 </p>

 <p align="center">
@@ -16,33 +16,31 @@
   <a href="https://github.com/explodinggradients/ragas/blob/master/LICENSE">
     <img alt="License" src="https://img.shields.io/github/license/explodinggradients/ragas.svg?color=green">
   </a>
-  <a href="https://colab.research.google.com/github/explodinggradients/ragas/blob/main/docs/quickstart.ipynb">
-    <img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg">
+  <a href="https://pypi.org/project/ragas/">
+    <img alt="Open In Colab" src="https://img.shields.io/pypi/dm/ragas">
   </a>
   <a href="https://discord.gg/5djav8GGNZ">
     <img alt="discord-invite" src="https://dcbadge.vercel.app/api/server/5djav8GGNZ?style=flat">
   </a>
-  <a href="https://github.com/explodinggradients/ragas/">
-    <img alt="Downloads" src="https://badges.frapsoft.com/os/v1/open-source.svg?v=103">
-  </a>
 </p>

 <h4 align="center">
   <p>
     <a href="https://docs.ragas.io/">Documentation</a> |
-    <a href="#shield-installation">Installation</a> |
-    <a href="#fire-quickstart">Quickstart</a> |
-    <a href="#-community">Community</a> |
-    <a href="#-open-analytics">Open Analytics</a> |
-    <a href="https://huggingface.co/explodinggradients">Hugging Face</a>
+    <a href="#Quickstart">Quick start</a> |
+    <a href="https://dcbadge.vercel.app/api/server/5djav8GGNZ?style=flat">Join Discord</a> |
   <p>
 </h4>

-> 🚀 Dedicated solutions to evaluate, monitor and improve performance of LLM & RAG application in production including custom models for production quality monitoring.[Talk to founders](https://cal.com/shahul-ragas/30min)
+[Ragas](https://www.ragas.io/) supercharges your LLM application evaluations with tools to objectively measure performance, synthesize test case scenarios, and gain insights by leveraging production data.
+
+Evaluating and testing LLM applications is a challenging, time-consuming, and often boring process. Ragas aims provide a suite of tools that could supercharge your evaluation workflows and make it more efficient and fun using state-of-the-art research. We are also building an open ecosystem, that fosters sharing of ideas to make the evaluation process better and collaborates with other tools in the market to make it seamless you.

-Ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM’s context. There are existing tools and frameworks that help you build these pipelines but evaluating it and quantifying your pipeline performance can be hard. This is where Ragas (RAG Assessment) comes in.
+## Key Features

-Ragas provides you with the tools based on the latest research for evaluating LLM-generated text to give you insights about your RAG pipeline. Ragas can be integrated with your CI/CD to provide continuous checks to ensure performance.
+- **Metrics**: Different LLM based and non LLM based metrics to objectively evaluate your LLM applications such as RAG, Agents, etc.
+- **Test Data Generation**: Synthesize high-quality datasets covering wide variety of scenarios for comprehensive testing of your LLM applications.
+- **Integrations**: Seamless integration with all major LLM applications frameworks like langchain and observability tools.

 ## :shield: Installation

@@ -60,33 +58,9 @@ pip install git+https://github.com/explodinggradients/ragas

 ## :fire: Quickstart

-This is a small example program you can run to see ragas in action!
-
-```python
-
-from datasets import Dataset
-import os
-from ragas import evaluate
-from ragas.metrics import faithfulness, answer_correctness
-
-os.environ["OPENAI_API_KEY"] = "your-openai-key"
-
-data_samples = {
-    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
-    'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
-    'contexts' : [['The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,'],
-    ['The Green Bay Packers...Green Bay, Wisconsin.','The Packers compete...Football Conference']],
-    'ground_truth': ['The first superbowl was held on January 15, 1967', 'The New England Patriots have won the Super Bowl a record six times']
-}
-
-dataset = Dataset.from_dict(data_samples)
-
-score = evaluate(dataset,metrics=[faithfulness,answer_correctness])
-score.to_pandas()
-```
-
-Refer to our [documentation](https://docs.ragas.io/) to learn more.

+- [Run ragas metrics for evaluating RAG](https://docs.ragas.io/en/latest/getstarted/rag_evaluation/)
+- [Generate test data for evaluating RAG](https://docs.ragas.io/en/latest/getstarted/rag_testset_generation/)

 ## 🫂 Community

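The quickstart removed above still illustrates the `evaluate` API that the new getting-started links point to. Below is a minimal sketch assembled from those removed lines; it assumes `ragas` and `datasets` are installed, an OpenAI key is available for the default judge LLM, and the metric names shown in the removed snippet (`faithfulness`, `answer_correctness`) are still exposed in your installed version.

```python
# Minimal sketch based on the quickstart block removed in this commit; the
# imports and metric names come from that removed snippet and may differ in
# newer ragas releases.
import os

from datasets import Dataset

from ragas import evaluate
from ragas.metrics import answer_correctness, faithfulness

os.environ["OPENAI_API_KEY"] = "your-openai-key"  # default judge LLM is OpenAI-backed

data_samples = {
    "question": ["When was the first super bowl?", "Who won the most super bowls?"],
    "answer": [
        "The first superbowl was held on Jan 15, 1967",
        "The most super bowls have been won by The New England Patriots",
    ],
    "contexts": [
        [
            "The First AFL–NFL World Championship Game was an American football game "
            "played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,"
        ],
        ["The Green Bay Packers...Green Bay, Wisconsin.", "The Packers compete...Football Conference"],
    ],
    "ground_truth": [
        "The first superbowl was held on January 15, 1967",
        "The New England Patriots have won the Super Bowl a record six times",
    ],
}

dataset = Dataset.from_dict(data_samples)

# Each metric issues one or more judge-LLM calls per row; scores are per sample.
score = evaluate(dataset, metrics=[faithfulness, answer_correctness])
print(score.to_pandas())
```

`to_pandas()` yields one row per sample with a column per metric, which makes it easy to see which individual answers dragged an aggregate score down.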
docs/concepts/metrics/overview/index.md

+1 −2

@@ -10,11 +10,10 @@ A metric is a quantitative measure used to evaluate the performance of a AI appl
 ## Different types of metrics

 <figure markdown="span">
-  ![Metrics Mind map](../../../_static/imgs/metrics_mindmap.png){width="600"}
+  ![Component-wise Evaluation](../../../_static/imgs/metrics_mindmap.png){width="600"}
   <figcaption>Metrics Mind map</figcaption>
 </figure>

-
 **Metrics can be classified into two categories based on the mechanism used underneath the hood**:

 &nbsp;&nbsp;&nbsp;&nbsp; **LLM-based metrics**: These metrics use LLM underneath to do the evaluation. There might be one or more LLM calls that are performed to arrive at the score or result. These metrics can be somewhat non deterministic as the LLM might not always return the same result for the same input. On the other hand, these metrics has shown to be more accurate and closer to human evaluation.
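A rough, self-contained illustration of the non-determinism described in that paragraph, again assuming the `evaluate`/`faithfulness` API shown in the README diff above, an OpenAI key for the judge LLM, and that the score column in the result DataFrame carries the metric name: two identical runs can produce slightly different scores.

```python
# Illustration only: LLM-based metrics such as faithfulness call a judge LLM,
# so repeated runs on identical inputs may not score identically.
import os

from datasets import Dataset

from ragas import evaluate
from ragas.metrics import faithfulness

os.environ["OPENAI_API_KEY"] = "your-openai-key"

# One toy sample is enough to see the effect.
dataset = Dataset.from_dict({
    "question": ["When was the first super bowl?"],
    "answer": ["The first superbowl was held on Jan 15, 1967"],
    "contexts": [["The First AFL–NFL World Championship Game was played on January 15, 1967."]],
})

# Run the same LLM-based metric twice on the same data.
run_a = evaluate(dataset, metrics=[faithfulness]).to_pandas()
run_b = evaluate(dataset, metrics=[faithfulness]).to_pandas()

# The two scores are usually close but not guaranteed to be equal.
print(run_a["faithfulness"].iloc[0], run_b["faithfulness"].iloc[0])
```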

0 commit comments
