
doc: init docs.ragas.io #170


Merged · 12 commits · Oct 9, 2023
23 changes: 23 additions & 0 deletions .readthedocs.yml
@@ -0,0 +1,23 @@
version: 2

# Set the OS, Python version and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.11"
    # You can also specify other tool versions:
    # nodejs: "20"
    # rust: "1.70"
    # golang: "1.20"

# Build documentation in the "docs/" directory with Sphinx
sphinx:
  configuration: ./docs/conf.py
  # You can configure Sphinx to use a different builder, for instance use the dirhtml builder for simpler URLs
  # builder: "dirhtml"
  # Fail on all warnings to avoid broken references
  # fail_on_warning: true

python:
  install:
    - requirements: ./requirements/docs.txt
5 changes: 5 additions & 0 deletions Makefile
@@ -34,3 +34,8 @@ test: ## Run tests
test-e2e: ## Run end2end tests
	echo "running end2end tests..."
	@pytest tests/e2e -s

# Docs
doc-site: ## Build and serve documentation
	@sphinx-build -nW --keep-going -j 4 -b html $(GIT_ROOT)/docs/ $(GIT_ROOT)/docs/_build/html
	@python -m http.server --directory $(GIT_ROOT)/docs/_build/html
4 changes: 2 additions & 2 deletions README.md
@@ -41,9 +41,9 @@

> 🚀 Dedicated solutions and support to improve the reliability of RAG systems in production including custom models for production quality monitoring. Contact founders to learn more. [Talk to founders](https://calendly.com/shahules/30min)

ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM’s context. There are existing tools and frameworks that help you build these pipelines but evaluating it and quantifying your pipeline performance can be hard. This is where ragas (RAG Assessment) comes in.
Ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM’s context. There are existing tools and frameworks that help you build these pipelines but evaluating it and quantifying your pipeline performance can be hard. This is where Ragas (RAG Assessment) comes in.

ragas provides you with the tools based on the latest research for evaluating LLM-generated text to give you insights about your RAG pipeline. ragas can be integrated with your CI/CD to provide continuous checks to ensure performance.
Ragas provides you with the tools based on the latest research for evaluating LLM-generated text to give you insights about your RAG pipeline. Ragas can be integrated with your CI/CD to provide continuous checks to ensure performance.

## :shield: Installation

20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
25 changes: 25 additions & 0 deletions docs/_static/css/ragas.css
@@ -0,0 +1,25 @@
/* Make pandas tables look correct in dark-mode */
div.cell_output table {
  color: var(--color-content-foreground);
}

div.cell_output table {
  margin: auto;
}

div.cell_output tbody tr:nth-child(odd):not(:hover) {
  background: var(--color-table-header-background);
}

div.cell_output thead {
  border-bottom-color: var(--color-code-foreground);
}

div.cell_input {
  display: none;
}

.dark {
  background: var(--color-content-background);
  color: var(--color-content-foreground);
}
Binary file added docs/_static/favicon.ico
File renamed without changes
File renamed without changes
File renamed without changes
Binary file added docs/_static/imgs/ragas-logo.png
6 changes: 3 additions & 3 deletions docs/Metrics.ipynb → docs/concepts/Metrics.ipynb
@@ -41,9 +41,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "ragas",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "ragas"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -55,7 +55,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
"version": "3.10.12"
}
},
"nbformat": 4,
5 changes: 5 additions & 0 deletions docs/concepts/index.md
@@ -0,0 +1,5 @@
# Core Concepts

```{toctree}
metrics.md
```
File renamed without changes.
45 changes: 45 additions & 0 deletions docs/conf.py
@@ -0,0 +1,45 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = "ragas"
copyright = "2023, ExplodingGradients"
author = "ExplodingGradients"
release = "0.0.16"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
    "myst_parser",
    "sphinx_design",
    # "myst_parser",
    "sphinxawesome_theme.highlighting",
    # "sphinxawesome_theme.docsearch",
]
source_suffix = [".rst", ".md"]

templates_path = ["_templates"]
exclude_patterns = []


# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_title = "Ragas"
html_theme = "sphinxawesome_theme"
html_static_path = ["_static"]
html_css_files = ["css/ragas.css"]
html_favicon = "./_static/favicon.ico"

html_theme_options = {
    "logo_light": "./_static/imgs/ragas-logo.png",
    "logo_dark": "./_static/imgs/ragas-logo.png",
}

# -- Myst NB Config -------------------------------------------------
nb_execution_mode = "auto"
118 changes: 118 additions & 0 deletions docs/getstarted/evaluation.md
@@ -0,0 +1,118 @@
---
file_format: mystnb
kernelspec:
  name: python3
execution:
  timeout: 300
---
# Evaluation

Welcome to the Ragas quickstart. We're going to get you up and running with Ragas as quickly as we can, so that you can go back to improving your Retrieval Augmented Generation pipelines while this library makes sure your changes are improving your entire pipeline.

To kick things off, let's start with the data.

```{note}
Are you using Azure OpenAI endpoints? Then check out [this quickstart guide](./guides/quickstart-azure-openai.ipynb).
```

First, install Ragas:
```bash
pip install ragas
```

Ragas also uses OpenAI for running some metrics, so make sure you have your OpenAI key ready and available in your environment:
```python
import os

os.environ["OPENAI_API_KEY"] = "your-openai-key"
```
## The Data

Ragas performs a `ground_truth`-free evaluation of your RAG pipelines. This is because, for most people, building a gold-labeled dataset that represents the distribution they see in production is a very expensive process.

```{note}
While Ragas was originally aimed at `ground_truth`-free evaluations, there are some aspects of the RAG pipeline that need `ground_truth` in order to be measured. We're in the process of building testset generation features that will make this easier. Check out [issue #136](https://github.com/explodinggradients/ragas/issues/136) for more details.
```

Hence, to work with Ragas, all you need is the following data (a minimal example follows this list):
- question: `list[str]` - These are the questions your RAG pipeline will be evaluated on.
- answer: `list[str]` - The answer generated from the RAG pipeline and given to the user.
- contexts: `list[list[str]]` - The contexts which were passed into the LLM to answer the question.
- ground_truths: `list[list[str]]` - The ground truth answers to the questions. (Only required if you are using context_recall.)
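
To make that format concrete, here is a minimal hand-rolled dataset; the row values are hypothetical, made up purely for illustration:

```python
from datasets import Dataset

# Hypothetical example row -- replace with outputs from your own pipeline.
data = {
    "question": ["When was the first Super Bowl held?"],
    "answer": ["The first Super Bowl was held on January 15, 1967."],
    "contexts": [
        ["The First AFL-NFL World Championship Game was played on January 15, 1967."]
    ],
    "ground_truths": [["The first Super Bowl was held on January 15, 1967."]],
}
eval_dataset = Dataset.from_dict(data)
```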

Ideally, your list of questions should reflect the questions your users ask, including those that have been problematic in the past.

Here we're using an example dataset from one of the baselines we created for the [Financial Opinion Mining and Question Answering (fiqa) Dataset](https://sites.google.com/view/fiqa/). If you want to know more about the baseline, feel free to check the `experiments/baseline` section.

```{code-cell} python
# data
from datasets import load_dataset

fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
fiqa_eval
```

## Metrics

Ragas provides you with a few metrics to evaluate the different aspects of your RAG systems, namely:

1. Metrics to evaluate retrieval: `context_precision` and `context_recall`, which measure the performance of your retrieval system.
2. Metrics to evaluate generation: `faithfulness`, which measures hallucinations, and `answer_relevancy`, which measures how to-the-point the answers are to the question.

The harmonic mean of these four aspects gives you the **ragas score**, a single measure of the performance of your QA system across all the important aspects (a worked example follows).
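
As a quick illustration of how a harmonic mean combines the four aspect scores, here is a tiny worked example (the numbers are made up; the library computes this for you):

```python
# Hypothetical aspect scores: context_precision, context_recall,
# faithfulness, answer_relevancy.
scores = [0.9, 0.8, 0.95, 0.7]

# Harmonic mean: n divided by the sum of reciprocals. It is dominated by
# the weakest score, so one bad aspect drags the overall number down.
ragas_score = len(scores) / sum(1 / s for s in scores)
print(round(ragas_score, 3))  # 0.826
```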

Now let's import these metrics and understand more about what they denote.

```{code-cell}
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)
from ragas.metrics.critique import harmfulness
```
Here you can see the metrics we are using, but what do they represent?

1. context_precision - a measure of how relevant the retrieved context is to the question. Conveys the quality of the retrieval pipeline.
2. answer_relevancy - a measure of how relevant the answer is to the question.
3. faithfulness - the factual consistency of the answer against the context, based on the question.
4. context_recall - measures the ability of the retriever to retrieve all the necessary information needed to answer the question.
5. harmfulness (AspectCritique) - in general, `AspectCritique` is a metric that can be used to quantify various aspects of the answer. Aspects like harmfulness, maliciousness, coherence, correctness, and conciseness are available by default, but you can easily define your own; a sketch follows this list. Check the [docs](./metrics.md) for more info.
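
As a sketch of what defining your own aspect could look like, assuming `AspectCritique` is importable from `ragas.metrics.critique` and accepts a name plus a natural-language definition (check the metrics docs to confirm the exact signature):

```python
from ragas.metrics.critique import AspectCritique

# Hypothetical custom aspect; the definition is the yes/no question the
# critique LLM is asked about each answer.
conciseness = AspectCritique(
    name="conciseness",
    definition="Is the answer short, direct, and free of filler?",
)
```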

```{note}
By default these metrics use OpenAI's API to compute the score. If you're using these metrics, make sure you set the environment variable `OPENAI_API_KEY` with your API key. You can also try other LLMs for evaluation; check the [llm guide](./guides/llms.ipynb) to learn more.
```

If you're interested in learning more, feel free to check the [docs](https://github.com/explodinggradients/ragas/blob/main/docs/metrics.md).

## Evaluation

Running the evaluation is as simple as calling `evaluate` on the `Dataset` with the metrics of your choice.

```{code-cell}
from ragas import evaluate

result = evaluate(
    fiqa_eval["baseline"].select(range(1)),
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
        harmfulness,
    ],
)

result
```
And there you have it: all the scores you need. `ragas_score` gives you a single metric that you can use, while the other ones measure different parts of your pipeline.

Now, if you want to dig into the results and find examples where your pipeline performed poorly or really well, you can easily convert the result into a pandas DataFrame and use your standard analytics tools too!

```{code-cell}
df = result.to_pandas()
df.head()
```
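
For instance, a quick way to surface the weakest examples, assuming the DataFrame has one column per metric as in the cell above:

```python
# Sort by faithfulness so likely hallucinations come first
# (column names are assumed to match the metric names).
worst = df.sort_values("faithfulness").head(5)
worst[["question", "answer", "faithfulness"]]
```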
And that's it!

If you have any suggestions/feedback/things you're not happy about, please do share them in the [issue section](https://github.com/explodinggradients/ragas/issues). We love hearing from you 😁
7 changes: 7 additions & 0 deletions docs/getstarted/index.md
@@ -0,0 +1,7 @@
# Get Started

```{toctree}
:maxdepth: 1
install.md
evaluation.md
```
20 changes: 20 additions & 0 deletions docs/getstarted/install.md
@@ -0,0 +1,20 @@
# Install

You can install Ragas with:
```bash
pip install ragas
```

If you want to install the latest version (from the main branch):
```bash
pip install git+https://github.com/explodinggradients/ragas.git
```

If you are looking to contribute and make changes to the code, make sure you
clone the repo and install it as [editable
install](https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs).
```bash
git clone https://github.com/explodinggradients/ragas.git
cd ragas
pip install -e .
```
File renamed without changes.
File renamed without changes.
45 changes: 0 additions & 45 deletions docs/guides/data_prep.py

This file was deleted.

File renamed without changes.
1 change: 1 addition & 0 deletions docs/howtos/index.md
@@ -0,0 +1 @@
# How-to Guides
13 changes: 13 additions & 0 deletions docs/index.md
@@ -0,0 +1,13 @@
# Welcome

Ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM’s context. There are existing tools and frameworks that help you build these pipelines but evaluating it and quantifying your pipeline performance can be hard. This is where Ragas (RAG Assessment) comes in.

Ragas provides you with the tools based on the latest research for evaluating LLM-generated text to give you insights about your RAG pipeline. Ragas can be integrated with your CI/CD to provide continuous checks to ensure performance.

```{toctree}
:hidden:
getstarted/index.md
concepts/index.md
howtos/index.md
references/index.md
```