Description
[ ] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
I am getting the following error when manually creating SingleTurnSamples from my dataset:
`ValueError("All samples must be of the same type")`
How can I find the data frame row that produces the mismatched sample record?
Ragas version: 0.2.12
Python version: 3.9
Code to Reproduce
```python
import pprint
import re
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen

import pandas as pd
from bs4 import BeautifulSoup
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from ragas import EvaluationDataset, SingleTurnSample, evaluate
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import (FactualCorrectness, Faithfulness,
                           LLMContextRecall, SemanticSimilarity)


def empty_nan_value(cell_value):
    return '' if pd.isna(cell_value) else cell_value


def create_turn_sample(row):
    url = re.split(r'[,\n ]+', empty_nan_value(row['reference']))[0]
    page = ''
    try:
        page = urlopen(url)
    except ValueError:
        return
    soup = BeautifulSoup(page, features='lxml')
    return SingleTurnSample(
        user_input=row['user_input'],
        retrieved_contexts=[empty_nan_value(row['context1']),
                            empty_nan_value(row['context2']),
                            empty_nan_value(row['context3']),
                            empty_nan_value(row['context4'])],
        response=empty_nan_value(row['response']),
        reference=soup.get_text())


df = pd.read_excel("Test Automation Result.xlsx")

with ThreadPoolExecutor(max_workers=50) as executor:
    future_to_row = {
        executor.submit(create_turn_sample, row): index
        for index, (idx, row) in enumerate(df.iterrows(), start=0)
    }
    samples = []
    for future in as_completed(future_to_row):
        status = future_to_row[future]
        samples.append(future.result())

pprint.pprint(samples)
eval_dataset = EvaluationDataset(samples)

# other configuration
azure_config = {
    "base_url": <BASE_URL>,
    "model_deployment": <DEPLOYMENT_NAME>,
    "model_name": "gpt-4o"  # your model name
}

evaluator_llm = LangchainLLMWrapper(AzureChatOpenAI(
    openai_api_version="2024-08-01-preview",
    azure_endpoint=azure_config["base_url"],
    azure_deployment=azure_config["model_deployment"],
    model=azure_config["model_name"],
    validate_base_url=False,
))

metrics = [
    LLMContextRecall(llm=evaluator_llm),
    FactualCorrectness(llm=evaluator_llm),
    Faithfulness(llm=evaluator_llm)
]

results = evaluate(dataset=eval_dataset, metrics=metrics)
pprint.pprint(results)
df = results.to_pandas()
df.head()
```
I can't share the Excel sheet itself due to privacy reasons.
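One thing I noticed while re-reading the repro (my own guess, not confirmed): `create_turn_sample` returns `None` when `urlopen` raises `ValueError`, so the `samples` list can end up mixing `None` with `SingleTurnSample` objects, which would trip the same-type validation. A minimal sketch of a workaround, filtering failed results before building the dataset (`drop_failed_samples` is my own helper name, not a Ragas API):

```python
def drop_failed_samples(samples):
    """Return only non-None samples, plus the indices of dropped entries."""
    dropped = [i for i, s in enumerate(samples) if s is None]
    kept = [s for s in samples if s is not None]
    return kept, dropped

# Toy demonstration with placeholder strings standing in for SingleTurnSample:
kept, dropped = drop_failed_samples(["a", None, "b"])
print(kept, dropped)
```

The dropped indices map back to `future_to_row`, so they could also be used to report which spreadsheet rows failed to fetch.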
Error trace
```
Traceback (most recent call last):
  File "<HOME_DIR>/Desktop/rag_eval.py", line 53, in <module>
    eval_dataset = EvaluationDataset(samples)
  File "<string>", line 4, in __init__
  File "<PROJECT_DIR>/.venv/lib/python3.9/site-packages/ragas/dataset_schema.py", line 173, in __post_init__
    self.samples = self.validate_samples(self.samples)
  File "<PROJECT_DIR>/.venv/lib/python3.9/site-packages/ragas/dataset_schema.py", line 193, in validate_samples
    raise ValueError("All samples must be of the same type")
ValueError: All samples must be of the same type
```
Expected behavior
I expected the samples to be created properly and the evaluation to start.
Additional context
Please help me locate the troublesome sample record and pinpoint where the problem is. At the moment, this error message by itself is not very helpful for spotting the offending record among the many samples created.
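As a stopgap for locating the offending record, a small diagnostic sketch (`find_odd_samples` is a hypothetical helper of mine, not part of Ragas) that reports the indices whose type differs from the majority sample type:

```python
from collections import Counter

def find_odd_samples(samples):
    """Return (index, type name) for entries whose type differs
    from the most common type in the list."""
    counts = Counter(type(s) for s in samples)
    majority_type = counts.most_common(1)[0][0]
    return [(i, type(s).__name__)
            for i, s in enumerate(samples)
            if type(s) is not majority_type]

# Toy demonstration with plain objects standing in for samples;
# in the repro script, index i maps back to row i of the data frame.
odd = find_odd_samples(["q1", "q2", None, "q3"])
print(odd)
```

Running this on `samples` just before constructing `EvaluationDataset` should print which entries would fail the same-type check, which is the kind of detail I'd like the error message itself to include.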