Open
Labels
bug (Something isn't working), module-testsetgen (Module testset generation), question (Further information is requested)
Description
[ ] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
ragas version: 0.28.0
I used the script from the documentation to generate a non-English testset, but the output is empty.
Code Examples
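from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator
from ragas.testset.persona import Persona
from ragas.testset.synthesizers.single_hop.specific import SingleHopSpecificQuerySynthesizer

# api_version is defined elsewhere (not shown in the original report).
chat_model = AzureChatOpenAI(
    api_version=api_version,
    model="gpt-4o",
    azure_deployment="gpt-4o",
)
embedding_model = AzureOpenAIEmbeddings(
    model="text-embedding-3-large-turing",
    api_version=api_version,
    azure_deployment="text-embedding-3-large-turing",
)

# Wrap the LangChain models so ragas can use them.
generator_llm = LangchainLLMWrapper(chat_model)
generator_embeddings = LangchainEmbeddingsWrapper(embedding_model)

personas = [
    Persona(
        name="curious student",
        role_description="A student who is curious about the world and wants to learn more about different cultures and languages",
    ),
]

generator = TestsetGenerator(
    llm=generator_llm, embedding_model=generator_embeddings, persona_list=personas
)

# Use a single synthesizer for all generated queries.
distribution = [
    (SingleHopSpecificQuerySynthesizer(llm=generator_llm), 1.0),
]

# Load every text file in the corpus directory.
path = "/data/eco_rag/testdata"
loader = DirectoryLoader(path, loader_cls=TextLoader, show_progress=True)
docs = loader.load()

# Adapt each synthesizer's prompts to Spanish.
# Top-level await: this assumes a notebook or another async context.
for query, _ in distribution:
    prompts = await query.adapt_prompts("spanish", llm=generator_llm)
    query.set_prompts(**prompts)

dataset = generator.generate_with_langchain_docs(
    docs,
    testset_size=3,
    query_distribution=distribution,
)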
Additional context
The output log is:
100%|██████████| 1/1 [00:00<00:00, 730.59it/s]
Applying HeadlinesExtractor: 0%| | 0/1 [00:00<?, ?it/s]
Property 'summary' already exists in node 'e425ec'. Skipping!
Property 'summary_embedding' already exists in node 'e425ec'. Skipping!
Generating Scenarios: 0%| | 0/1 [00:00<?, ?it/s]
Generating Samples: 0it [00:00, ?it/s]
The corpus contains only one document, downloaded from https://huggingface.co/datasets/explodinggradients/Sample_non_english_corpus
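Because the corpus is a single file, one quick sanity check before generation is to confirm that the loader returned non-empty text. The following is only a minimal sketch, assuming the loaded objects expose a `page_content` string and a `metadata` dict as LangChain documents do. `FakeDoc`, `summarize_docs`, and `empty_docs` are hypothetical names used for illustration; they are not part of ragas or LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_docs(docs):
    """Return (source, stripped character count) pairs for each document."""
    summary = []
    for i, doc in enumerate(docs):
        # Fall back to a positional label when no "source" metadata exists.
        source = getattr(doc, "metadata", {}).get("source", f"doc[{i}]")
        text = getattr(doc, "page_content", "") or ""
        summary.append((source, len(text.strip())))
    return summary


def empty_docs(docs):
    """List the sources whose text is empty after stripping whitespace."""
    return [source for source, n_chars in summarize_docs(docs) if n_chars == 0]
```

In the script above, this could be called as `empty_docs(docs)` right after `loader.load()`. An empty list would suggest that the input text itself is not the reason no samples were generated.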