
Commit 7a31b55

docs: add new rag eval tutorial (#1815)
1 parent e91a672 commit 7a31b55

File tree

5 files changed: +208 -7 lines changed

docs/getstarted/evals.md (+3 -4)

```diff
@@ -1,6 +1,6 @@
-# Evaluating your first AI app
+# Evaluate a simple LLM application
 
-The purpose of this guide is to illustrate a simple workflow for testing and evaluating an LLM application with `ragas`. It assumed minimum knowledge in AI application building and evaluation. Please refer to our [installation instruction](./install.md) for installing `ragas`
+The purpose of this guide is to illustrate a simple workflow for testing and evaluating an LLM application with `ragas`. It assumes minimal knowledge of AI application building and evaluation. Please refer to our [installation instructions](./install.md) for installing `ragas`.
 
 ## Evaluation
@@ -220,5 +220,4 @@ Once trained, you can re-evaluate the same or different test datasets. You shoul
 ## Up Next
 
-- [Run ragas metrics for evaluating RAG](rag_evaluation.md)
-- [Generate test data for evaluating RAG](rag_testset_generation.md)
+- [Evaluate a simple RAG application](rag_eval.md)
```

docs/getstarted/rag_eval.gif (new file, 13.8 MB)

docs/getstarted/rag_eval.md (new file, +202 lines)

# Evaluate a simple RAG system

The purpose of this guide is to illustrate a simple workflow for testing and evaluating a RAG system with `ragas`. It assumes minimal knowledge of building RAG systems and of evaluation. Please refer to our [installation instructions](./install.md) for installing `ragas`.

## Basic Setup

We will use `langchain_openai` to set the LLM and embedding model for building our simple RAG. You may choose any other LLM and embedding model of your choice; to do so, refer to [customizing models in langchain](https://python.langchain.com/docs/integrations/chat/).

```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = ChatOpenAI(model="gpt-4o")
embeddings = OpenAIEmbeddings()
```
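
For illustration only, here is one way the same setup might look with a locally hosted model instead. This is a sketch, not part of the tutorial's requirements: it assumes the `langchain-ollama` package is installed and an Ollama server is running, and the model names are merely examples.

```python
# Alternative setup (illustrative): a local model served by Ollama.
# Assumes `pip install langchain-ollama` and a running Ollama instance;
# the model names below are examples, not recommendations.
from langchain_ollama import ChatOllama, OllamaEmbeddings

llm = ChatOllama(model="llama3.1")
embeddings = OllamaEmbeddings(model="nomic-embed-text")
```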

### Build a Simple RAG System

To build a simple RAG system, we need the following components:

- A method to vectorize and store our documents
- A method to retrieve the documents most relevant to a query
- A method to generate a response grounded in the retrieved documents

??? note "Click to View the Code"

    ```python
    import numpy as np
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings

    class RAG:
        def __init__(self, model="gpt-4o"):
            self.llm = ChatOpenAI(model=model)
            self.embeddings = OpenAIEmbeddings()
            self.doc_embeddings = None
            self.docs = None

        def load_documents(self, documents):
            """Load documents and compute their embeddings."""
            self.docs = documents
            self.doc_embeddings = self.embeddings.embed_documents(documents)

        def get_most_relevant_docs(self, query):
            """Find the most relevant document for a given query."""
            if not self.docs or not self.doc_embeddings:
                raise ValueError("Documents and their embeddings are not loaded.")

            query_embedding = self.embeddings.embed_query(query)
            # Cosine similarity between the query and each document embedding
            similarities = [
                np.dot(query_embedding, doc_emb)
                / (np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb))
                for doc_emb in self.doc_embeddings
            ]
            most_relevant_doc_index = np.argmax(similarities)
            return [self.docs[most_relevant_doc_index]]

        def generate_answer(self, query, relevant_doc):
            """Generate an answer for a given query based on the most relevant document."""
            prompt = f"question: {query}\n\nDocuments: {relevant_doc}"
            messages = [
                ("system", "You are a helpful assistant that answers questions based on given documents only."),
                ("human", prompt),
            ]
            ai_msg = self.llm.invoke(messages)
            return ai_msg.content
    ```
67+
68+
### Load Documents
69+
Now, let's load some documents and test our RAG system.
70+
71+
```python
72+
sample_docs = [
73+
"Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.",
74+
"Marie Curie was a physicist and chemist who conducted pioneering research on radioactivity and won two Nobel Prizes.",
75+
"Isaac Newton formulated the laws of motion and universal gravitation, laying the foundation for classical mechanics.",
76+
"Charles Darwin introduced the theory of evolution by natural selection in his book 'On the Origin of Species'.",
77+
"Ada Lovelace is regarded as the first computer programmer for her work on Charles Babbage's early mechanical computer, the Analytical Engine."
78+
]
79+
```
80+
81+
```python
82+
# Initialize RAG instance
83+
rag = RAG()
84+
85+
# Load documents
86+
rag.load_documents(sample_docs)
87+
88+
# Query and retrieve the most relevant document
89+
query = "Who introduced the theory of relativity?"
90+
relevant_doc = rag.get_most_relevant_docs(query)
91+
92+
# Generate an answer
93+
answer = rag.generate_answer(query, relevant_doc)
94+
95+
print(f"Query: {query}")
96+
print(f"Relevant Document: {relevant_doc}")
97+
print(f"Answer: {answer}")
98+
```
99+
100+
101+
Output:
102+
```
103+
Query: Who introduced the theory of relativity?
104+
Relevant Document: ['Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.']
105+
Answer: Albert Einstein introduced the theory of relativity.
106+
```

## Collect Evaluation Data

To collect evaluation data, we first need a set of queries to run against our RAG. We can run the queries through the RAG system and collect the `response` and `retrieved_contexts` for each query. You may also prepare a set of golden answers (used as `reference`) for each query to evaluate the system's performance.

```python
sample_queries = [
    "Who introduced the theory of relativity?",
    "Who was the first computer programmer?",
    "What did Isaac Newton contribute to science?",
    "Who won two Nobel Prizes for research on radioactivity?",
    "What is the theory of evolution by natural selection?"
]

expected_responses = [
    "Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.",
    "Ada Lovelace is regarded as the first computer programmer for her work on Charles Babbage's early mechanical computer, the Analytical Engine.",
    "Isaac Newton formulated the laws of motion and universal gravitation, laying the foundation for classical mechanics.",
    "Marie Curie was a physicist and chemist who conducted pioneering research on radioactivity and won two Nobel Prizes.",
    "Charles Darwin introduced the theory of evolution by natural selection in his book 'On the Origin of Species'."
]
```

```python
dataset = []

for query, reference in zip(sample_queries, expected_responses):
    relevant_docs = rag.get_most_relevant_docs(query)
    response = rag.generate_answer(query, relevant_docs)
    dataset.append(
        {
            "user_input": query,
            "retrieved_contexts": relevant_docs,
            "response": response,
            "reference": reference,
        }
    )
```
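
Before evaluating, a quick sanity check on the collected rows can save a confusing debugging session later. This snippet is optional and purely illustrative:

```python
# Optional sanity check (illustrative): every query should have produced
# a non-empty response and at least one retrieved context.
assert len(dataset) == len(sample_queries)
for row in dataset:
    assert row["response"], f"Empty response for: {row['user_input']}"
    assert row["retrieved_contexts"], f"No contexts for: {row['user_input']}"
```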

Now, load the dataset into an `EvaluationDataset` object.

```python
from ragas import EvaluationDataset

evaluation_dataset = EvaluationDataset.from_list(dataset)
```

## Evaluate

We have successfully collected the evaluation data. Now, we can evaluate our RAG system on the collected dataset using a set of commonly used RAG evaluation metrics. You may choose any model as the [evaluator LLM](/docs/howtos/customizations/customize_models.md) for evaluation.

```python
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness

evaluator_llm = LangchainLLMWrapper(llm)

result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness()],
    llm=evaluator_llm,
)
result
```

Output:
```
{'context_recall': 1.0000, 'faithfulness': 0.8571, 'factual_correctness': 0.7280}
```

## Analyze Results

Once you have evaluated, you may want to view, analyze, and share the results. This is important for interpreting them and understanding the performance of your RAG system. For this you can sign up for and set up [app.ragas.io](https://app.ragas.io/), or use any alternative tool available to you.

To use the [app.ragas.io](https://app.ragas.io/) dashboard, you need an account; if you don't have one, you can sign up [here](https://app.ragas.io/login). You will also need to generate a [Ragas API key](https://app.ragas.io/dashboard/settings/app-tokens).

Once you have the API key, you can use the `upload()` method to export the results to the dashboard.

```python
import os

os.environ["RAGAS_API_KEY"] = "your_api_key"
```

Now you can view the results in the dashboard by following the link in the output of the `upload()` method.

```python
result.upload()
```

![](rag_eval.gif)

## Up Next

- [Generate test data for evaluating RAG](rag_testset_generation.md)

docs/index.md (+1 -1)

```diff
@@ -9,7 +9,7 @@ Ragas is a library that provides tools to supercharge the evaluation of Large La
 Install with `pip` and get started with Ragas with these tutorials.
 
-[:octicons-arrow-right-24: Get Started](getstarted/index.md)
+[:octicons-arrow-right-24: Get Started](getstarted/evals.md)
 
 - 📚 **Core Concepts**
```

mkdocs.yml (+2 -2)

```diff
@@ -10,8 +10,8 @@ nav:
   - 🚀 Get Started:
     - getstarted/index.md
     - Installation: getstarted/install.md
-    - Evaluate your first AI app: getstarted/evals.md
-    - Evaluate Using Metrics: getstarted/rag_evaluation.md
+    - Evaluate your first LLM App: getstarted/evals.md
+    - Evaluate a simple RAG: getstarted/rag_eval.md
     - Generate Synthetic Testset for RAG: getstarted/rag_testset_generation.md
   - 📚 Core Concepts:
     - concepts/index.md
```
