Description
We started Ragas with ground-truth-free evaluations so that you didn't have to put significant upfront effort into building an ideal test set before running evaluations. Creating a good test set requires a substantial upfront investment of time, money, human hours, and expertise. It is also a continuous process, because your product and ML model evolve to serve diverse use cases. We are exploring synthetic test set generation because:
- As RAG users mature and move into production, a solid test set and evaluation strategy become critical to delivering a seamless user experience. This means teams have to invest more time in building solid test sets and evaluation methodologies.
- Ground-truth-free evaluation has its limitations. It is very effective at quantifying aspects like faithfulness, but it cannot be used to verify aspects like answer correctness, which matter just as much. Here, a synthetic test set with ground truth can be highly useful.
The whole focus of the Ragas library is to help you build more reliable RAG applications, which is why in the next phase of Ragas we'll be focusing much more on test set generation and continual learning of RAG pipelines. The goal is to leverage custom LLMs and Data-Centric AI techniques to:
- Build more robust paradigms for test set generation.
  - Many libraries already offer some form of test set generation, but they have a few shortcomings. Ideally, a test set should cover a good distribution of questions, from easy to hard, across the different tasks and situations seen in production.
- Build tools to scale up test set generation and reduce its cost.
- Develop methodologies to continuously extend and improve the test set as your RAG pipelines evolve, using other data points such as logs and feedback.
There is a lot of work to be done, but starting with the v0.1 release of Ragas, we'll be releasing features in this direction. In the meantime, we would love to hear your opinions, expectations, suggestions, and ideas about this too :)
Team Ragas