Testset Generation: bringing continual learning to RAG pipelines #136

@jjmachan

We started Ragas with ground-truth-free evaluations so that you didn't have to put significant upfront effort into building an ideal test set before running evaluations. Creating a good test set requires substantial investment in time, money, human hours, and expertise, and it is a continuous process as your product and ML models evolve to cater to diverse use cases. We are exploring synthetic test set generation because:

  1. As RAG users mature and move into production, a solid test set and evaluation strategy become critical to giving their users a seamless experience, which means investing more time in building both.
  2. Ground-truth-free evaluation has its limitations. It is very effective at quantifying aspects like faithfulness, but it cannot ensure aspects like answer correctness, which is just as important. Here a synthetic test set with ground truth is of high utility (see the sketch after this list).
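
To make the distinction concrete, here is a minimal sketch of the two kinds of metrics side by side. It assumes the `evaluate()` API and the `faithfulness` / `answer_correctness` metric names as they later stabilized in the library; relative to this issue those names are forward-looking assumptions, and column names may differ across versions.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_correctness

# A tiny evaluation set. The `ground_truth` column is only needed by
# reference-based metrics such as answer_correctness; faithfulness is
# ground-truth-free and only checks the answer against the contexts.
eval_data = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["Paris is the capital and largest city of France."]],
    "ground_truth": ["Paris is the capital of France."],
})

result = evaluate(eval_data, metrics=[faithfulness, answer_correctness])
print(result)  # per-metric scores, e.g. faithfulness and answer_correctness
```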

The whole focus of the Ragas library is to help you build more reliable RAG applications, which is why the next leg of Ragas will focus much more on test set generation and continual learning for RAG pipelines. The goal is to leverage custom LLMs and data-centric AI techniques to:

  1. Build more robust paradigms for test set generation.
    1. Many libraries already have some form of test set generation, but with a few shortcomings. Ideally, the test set should have a good distribution of easy -> hard questions across the different tasks/situations seen in production (see the sketch after this list).
  2. Tools to scale up and reduce the cost of test set generation.
    1. Works like Self-Instruct and Evol-Instruct have shown that LLMs can generate human-quality synthetic data. We are working on paradigms to generate high-quality synthetic data specific to RAG. Ref [1] [2]
  3. Methodologies to continuously add to and improve the test set as your RAG pipeline evolves, using other data points like production logs and user feedback.
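
To sketch where this is headed, here is what generating a synthetic test set with a controlled difficulty distribution could look like. Since this issue predates the release, the class and method names below (`TestsetGenerator`, `generate_with_langchain_docs`, and the `simple`/`reasoning`/`multi_context` question types) match what eventually shipped in Ragas v0.1 and should be read as a forward-looking sketch, not the API at the time of writing.

```python
from langchain_community.document_loaders import DirectoryLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

# Load the same documents your RAG pipeline indexes.
documents = DirectoryLoader("docs/", glob="**/*.md").load()

# Generator backed by OpenAI models; other LLMs can be plugged in.
generator = TestsetGenerator.with_openai()

# Ask for a question mix spanning easy -> hard: simple lookups,
# reasoning questions, and multi-context questions that need
# information drawn from several chunks.
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
print(testset.to_pandas())
```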

There is a lot of work to be done, but with the v0.1 release of Ragas we'll be shipping features in this direction. In the meantime, we would love to hear your opinions, expectations, suggestions, and ideas :)

Team Ragas
