the generated testset is empty #1769

@Kevinddddddd

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question
ragas version: 0.28.0
I used the script from the documentation to generate a non-English testset, but the output is empty.

Code Examples

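from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator
from ragas.testset.persona import Persona
from ragas.testset.synthesizers.single_hop.specific import SingleHopSpecificQuerySynthesizer

# api_version is assumed to be defined earlier (not shown in the issue)
chat_model = AzureChatOpenAI(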
    api_version=api_version,
    model="gpt-4o",
    azure_deployment="gpt-4o"
)
embedding_model = AzureOpenAIEmbeddings(
    model="text-embedding-3-large-turing",
    api_version=api_version,
    azure_deployment="text-embedding-3-large-turing"
)
generator_llm = LangchainLLMWrapper(chat_model)
generator_embeddings = LangchainEmbeddingsWrapper(embedding_model)

personas = [
    Persona(
        name="curious student",
        role_description="A student who is curious about the world and wants to learn more about different cultures and languages",
    ),
]

generator = TestsetGenerator(
    llm=generator_llm, embedding_model=generator_embeddings, persona_list=personas
)

distribution = [
    (SingleHopSpecificQuerySynthesizer(llm=generator_llm), 1.0),
]

path = "/data/eco_rag/testdata"
loader = DirectoryLoader(path, loader_cls=TextLoader, show_progress=True)
docs = loader.load()

for query, _ in distribution:
    prompts = await query.adapt_prompts("spanish", llm=generator_llm)
    query.set_prompts(**prompts)

dataset = generator.generate_with_langchain_docs(
    docs,
    testset_size=3,
    query_distribution=distribution
)

Additional context
The output log is:
100%|██████████| 1/1 [00:00<00:00, 730.59it/s]
Applying HeadlinesExtractor: 0%| | 0/1 [00:00<?, ?it/s]
Property 'summary' already exists in node 'e425ec'. Skipping!
Property 'summary_embedding' already exists in node 'e425ec'. Skipping!
Generating Scenarios: 0%| | 0/1 [00:00<?, ?it/s]
Generating Samples: 0it [00:00, ?it/s]

The corpus contains only one document, downloaded from https://huggingface.co/datasets/explodinggradients/Sample_non_english_corpus
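Since the log shows a single document being processed and zero samples generated, one quick sanity check (not a diagnosis of the cause) is to confirm how much text was actually loaded. The sketch below assumes the loaded objects expose `page_content` and `metadata`, as LangChain documents do; `summarize_docs` is a hypothetical helper, and the demo document is made up so the block runs on its own.

```python
from types import SimpleNamespace


def summarize_docs(docs):
    """Return (source, character count, word count) for each loaded document."""
    rows = []
    for doc in docs:
        text = doc.page_content
        source = doc.metadata.get("source", "<unknown>")
        rows.append((source, len(text), len(text.split())))
    return rows


# Stand-in for `docs = loader.load()`; the file name and text are made up.
docs = [
    SimpleNamespace(
        page_content="Hola mundo. Esto es una prueba.",
        metadata={"source": "example.txt"},
    )
]
summary = summarize_docs(docs)
for source, n_chars, n_words in summary:
    print(f"{source}: {n_chars} chars, {n_words} words")
```

Running `summarize_docs(docs)` on the real loader output before generation would show whether the single file was read in full or came back empty or truncated.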
Screenshot 2024-12-18 16 40 59

Labels: bug, module-testsetgen, question