
How can I reliably track which persona was used for each generated question in a testset? #2066

Open

Description

@dev-jonathan

- [x] I checked the documentation and related resources and couldn't find an answer to my question.


Question:

Hello guys,

I'm using the `TestsetGenerator` with a list of personas to generate evaluation samples for my dataset. However, after generation there is no reliable way to know which persona produced each question in the resulting testset. This information is essential for analysis, since I need to correlate each generated question and reference with the specific persona that guided its creation.


I have tried the following approaches:

1. Adding a custom field (e.g. `persona_name`) to the `SingleTurnSample` via a custom synthesizer, but this field is not preserved in the final testset or in the output of `.to_pandas()`/`.to_list()`:

```python
from typing import Optional

from pydantic import Field

from ragas.dataset_schema import SingleTurnSample
from ragas.testset.graph import Node
from ragas.testset.persona import Persona
from ragas.testset.synthesizers.single_hop.specific import (
    SingleHopSpecificQuerySynthesizer,
)


class PersonaSingleTurnSample(SingleTurnSample):
    # Optional so the field validates even when no persona is attached
    persona_name: Optional[str] = Field(default=None)


class PersonaAwareSingleHopSpecificQuerySynthesizer(SingleHopSpecificQuerySynthesizer):
    def _generate_question(
        self, node: Node, persona: Persona
    ) -> PersonaSingleTurnSample:
        original_sample = super()._generate_question(node, persona)
        # Copy every field of the original sample and tack on the persona name
        return PersonaSingleTurnSample(
            user_input=original_sample.user_input,
            retrieved_contexts=original_sample.retrieved_contexts,
            reference_contexts=original_sample.reference_contexts,
            response=original_sample.response,
            multi_responses=original_sample.multi_responses,
            reference=original_sample.reference,
            rubrics=original_sample.rubrics,
            persona_name=persona.name,
        )
```
2. Prefixing the persona name in the `user_input` string and then extracting it via regex, but this feels like a workaround rather than a robust solution and degrades the quality of the prompt:

```python
def _generate_question(self, node: Node, persona: Persona) -> SingleTurnSample:
    original_sample = super()._generate_question(node, persona)
    return SingleTurnSample(
        user_input=f"[{persona.name}]: {original_sample.user_input}",
        ...
```
3. Attempting to associate personas by order or count, but the generator does not guarantee any alignment between the persona list and the generated samples.
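In case it helps to illustrate what I'm after: a minimal sketch of the side-channel workaround I'm currently considering, where the persona name is recorded during generation and joined back onto the exported DataFrame afterwards. Everything here (`record_persona`, `attach_personas`, the column names) is hypothetical scaffolding, not a Ragas API, and keying on `user_input` breaks down if two personas ever produce the same question text.

```python
import pandas as pd

# Hypothetical side-channel mapping: question text -> persona name.
persona_log: dict[str, str] = {}


def record_persona(user_input: str, persona_name: str) -> None:
    """Remember which persona produced a given question (call during generation)."""
    persona_log[user_input] = persona_name


def attach_personas(df: pd.DataFrame) -> pd.DataFrame:
    """Add a persona_name column by looking each question up in the log."""
    out = df.copy()
    out["persona_name"] = out["user_input"].map(persona_log)
    return out


# Simulated export (stand-in for testset.to_pandas()).
record_persona("How do I reset my password?", "New User")
record_persona("What are the API rate limits?", "Power User")

df = pd.DataFrame(
    {
        "user_input": [
            "How do I reset my password?",
            "What are the API rate limits?",
        ]
    }
)
df = attach_personas(df)
print(df["persona_name"].tolist())  # ['New User', 'Power User']
```

This works, but it is exactly the kind of fragile bookkeeping I was hoping the library could handle natively, which is why I'm asking for an official mechanism.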

  • Is there an official or recommended way to reliably track and retrieve the persona used for each generated question/sample in the testset?

Thank you for your help and for this great library!

Labels: module-testsetgen (Module testset generation), question (Further information is requested)