
Commit ab4f1b5

Merge pull request #10 from pfilipovich/section_4_validation_schema
Section 4 validation schema
2 parents fd83a25 + 8a89cd9 commit ab4f1b5

1 file changed: +69 -1 lines changed
Design_Doc_Examples/RAG_Q&A_for collaborative_work_platform.md

Lines changed: 69 additions & 1 deletion
@@ -95,7 +95,75 @@ For some Documents we do not know the diff. Only know how Document looked like a

### **IV. Validation Schema**

-No ideas

For validation purposes, we will use a dataset generated from the original documents using RAGAS's test-set generation functionality. This approach allows us to create a comprehensive validation set that closely mirrors the real-world usage of our system.

#### i. Question Selection and Dataset Creation

RAGAS takes the original documents and their associated metadata and generates a structured dataset with the following components:

* Question: a simulated user query
* Context: the relevant parts of the document(s)
* Answer: the expected answer

This structure allows us to evaluate both the retrieval and generation aspects of our RAG system.
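
To make this concrete, the snippet below sketches how such a question/context/answer set could be generated. It follows the ragas 0.1.x test-set generation API together with a LangChain document loader; the class paths, argument names, and the `docs/` directory are placeholders that will differ in other ragas releases and with our own loaders.

```python
# Minimal sketch of synthetic test-set generation with RAGAS (ragas 0.1.x API;
# later releases rename some of these classes and arguments).
from langchain_community.document_loaders import DirectoryLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

# Load the source documents the RAG system will serve (path is a placeholder).
documents = DirectoryLoader("docs/", glob="**/*.md").load()

# with_openai() wires up OpenAI models for generation, critique and embeddings.
generator = TestsetGenerator.with_openai()

# Produce question / context / ground-truth triples with a mix of query types;
# the distribution maps roughly onto the question-diversity requirements below.
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=50,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)

validation_df = testset.to_pandas()  # columns include question, contexts, ground_truth
```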
To create a comprehensive and representative validation dataset, we'll employ a multi-faceted approach to question selection:

1. Automated Question Generation
* Use natural language processing (NLP) techniques to automatically generate questions from the documents.
* Apply techniques such as named entity recognition, key phrase extraction, and syntactic parsing to identify potential question targets.
* Use question generation models (e.g. T5 or BART fine-tuned for question generation) to create different types of questions; see the sketch after this list.

2. Human-in-the-Loop Curation
* Engage subject matter experts to review and refine auto-generated questions.
* Have experts create additional questions, especially for complex scenarios or edge cases that automated systems might miss.
* Ensure questions cover various difficulty levels and reasoning types.

3. Real User Query Mining
* Analyse logs of actual user queries (if available) to identify common question patterns and topics.
* Include anonymised versions of real user questions in the dataset to ensure relevance to actual use cases.

4. Question Diversity: ensure a balanced distribution of question types:
* Factual questions (e.g. "Who is the author of this document?")
* Inferential questions (e.g. "What are the implications of the findings in section 3?")
* Comparative questions (e.g. "How does the methodology in version 2 differ from that in version 1?")
* Multi-document questions (e.g. "Summarise the common themes across these three related documents.")
* Version-specific questions (e.g. "What changes have been made to the conclusion between versions 3 and 4?")

5. Context Selection
* For each question, select a relevant context from the document(s).
* Include both perfectly matching contexts and partially relevant contexts to test the system's ability to handle nuanced scenarios.

6. Answer Generation
* Generate a gold-standard answer for each question-context pair.
* Use a combination of automated methods and human expert review to ensure answer quality.

7. Metadata Inclusion
* Include relevant metadata for each question-context-answer triplet, such as document version, page numbers or section headings (see the record sketch below).

8. Edge Case Scenarios
* Deliberately include edge cases, such as questions about rare document types or extremely long documents.
* Create questions that require an understanding of document structure, such as tables of contents or footnotes.

9. Negative Examples
* Include some questions that cannot be answered from the given context, to test the system's ability to recognise when it doesn't have sufficient information.
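
Item 1 can be prototyped with an off-the-shelf question-generation checkpoint. The sketch below is illustrative only: the model name and its `answer: … context: …` prompt format are assumptions tied to that particular checkpoint, and in practice the answer spans would come from the NER/key-phrase step rather than being hard-coded.

```python
# Sketch of automated question generation with a T5 checkpoint fine-tuned for QG.
# The checkpoint name and prompt format below are assumptions, not requirements.
from transformers import pipeline

qg = pipeline(
    "text2text-generation",
    model="mrm8488/t5-base-finetuned-question-generation-ap",
)

def generate_questions(context: str, answer_spans: list[str]) -> list[str]:
    """Generate one candidate question per answer span found in the context."""
    questions = []
    for answer in answer_spans:
        prompt = f"answer: {answer}  context: {context}"
        out = qg(prompt, max_new_tokens=64)[0]["generated_text"]
        questions.append(out)
    return questions

# Example with a hypothetical document snippet; the span would normally be
# produced by named entity recognition or key-phrase extraction.
context = "Version 2 of the design replaces pure dense retrieval with a hybrid BM25 + dense retriever."
print(generate_questions(context, ["hybrid BM25 + dense retriever"]))
```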
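
Items 5-9 largely determine what a single validation record must carry. One possible record layout is sketched below; the field names are illustrative rather than a fixed schema, and negative examples are simply records whose context does not contain the answer.

```python
# Sketch of one validation record covering context selection, gold answers,
# metadata and negative examples. Field names are illustrative only.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ValidationRecord:
    question: str                      # simulated or curated user query
    contexts: list[str]                # selected passages, possibly only partially relevant
    ground_truth: Optional[str]        # gold-standard answer; None for unanswerable questions
    answerable: bool = True            # False marks a negative example (item 9)
    metadata: dict = field(default_factory=dict)  # e.g. document version, page, section heading

# A negative example: the selected context does not contain the requested information.
negative_sample = ValidationRecord(
    question="What budget was approved for the 2025 roadmap?",
    contexts=["Section 3 describes the retrieval architecture and its evaluation plan."],
    ground_truth=None,
    answerable=False,
    metadata={"doc_version": "v2", "section": "3"},
)
```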
#### ii. Periodic Updates

The validation dataset will be updated periodically to maintain its relevance and comprehensiveness. This includes:

* Adding newly uploaded documents
* Including new versions of existing documents
* Updating the question set to reflect evolving user needs

We recommend updating the validation set monthly, or whenever there's a significant influx of new documents or versions.
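
A lightweight way to implement this cadence is a scheduled (e.g. monthly cron) job that regenerates samples only for documents uploaded or re-versioned since the last run and merges them into the stored set. The sketch below assumes an `uploaded_at` ISO-8601 timestamp (with timezone) in the document metadata and a JSON file as storage; both are placeholders for whatever the platform actually provides.

```python
# Sketch of the periodic refresh. "uploaded_at" and the JSON storage format
# are assumptions about our document metadata and dataset store.
import json
from datetime import datetime, timedelta, timezone

def select_documents_for_refresh(documents: list[dict], days: int = 30) -> list[dict]:
    """Pick documents uploaded or re-versioned within the refresh window."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    return [d for d in documents if datetime.fromisoformat(d["uploaded_at"]) >= cutoff]

def merge_new_samples(dataset_path: str, new_samples: list[dict]) -> None:
    """Append newly generated samples, deduplicating on the question text."""
    with open(dataset_path) as f:
        dataset = json.load(f)
    seen = {sample["question"] for sample in dataset}
    dataset.extend(s for s in new_samples if s["question"] not in seen)
    with open(dataset_path, "w") as f:
        json.dump(dataset, f, indent=2)
```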
#### iii. Stratified Sampling

To ensure balanced representation, we'll use stratified sampling when creating the validation set (see the sketch after this list). Strata may include:

* Document length (short, medium, long)
* Document type (text, scanned image)
* Topic areas
* Query complexity (simple factual, multi-step reasoning, version comparison)
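
A straightforward way to realise this is grouped sampling over the generated pool, drawing the same fraction of samples from every stratum. The column names below (`doc_length`, `doc_type`, `topic`, `query_complexity`) are assumptions about how each sample would be labelled.

```python
# Sketch of stratified sampling over the pool of generated samples using pandas.
# The strata columns are assumed labels attached during dataset creation.
import pandas as pd

def stratified_validation_split(pool: pd.DataFrame, frac: float = 0.2, seed: int = 42) -> pd.DataFrame:
    """Draw the same fraction of samples from every stratum so that document
    lengths, types, topics and query complexities are all represented."""
    strata = ["doc_length", "doc_type", "topic", "query_complexity"]
    return (
        pool.groupby(strata, group_keys=False)
            .apply(lambda g: g.sample(frac=frac, random_state=seed))
            .reset_index(drop=True)
    )
```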

- **Key Takeaways:**
1. The selection of a validation schema is crucial for accurately measuring a model's performance on unseen data, requiring careful consideration of the specific characteristics of the dataset and the problem at hand.
