3. Updating validation splits in response to new data or changing distributions is essential for maintaining the relevance and accuracy of performance estimates.
4. Detailed planning and documentation of the chosen validation schemas within the design document are vital for ensuring the evaluation process is aligned with the project's goals and constraints.
### V. Baseline Solution
#### Document Extraction Process

Considering the minimal variability among the documents and the sufficient coverage of most cases within the current dataset, we recommend implementing an in-house document extraction pipeline (a minimal dispatch sketch follows this list). The pipeline should consist of:

1. File Type Handler: Differentiate between file types and handle each accordingly, as PDFs and images may require additional processing steps.
2. Text Extraction: Deploy a customized OCR solution designed to handle non-text elements.
3. Text Preprocessing: Remove unwanted characters, excess whitespace, and other artifacts.
4. Markdown Formatting: Ensure that the extracted content is formatted correctly according to markdown standards.
5. Error Management & Spell Checking: Integrate an error handler and a spell checker to maintain data quality.
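
As a rough illustration of the first three steps, the dispatch logic could look like the sketch below. It assumes the `pypdf` and `pytesseract` packages; the file types, helpers, and cleanup rules are illustrative, and markdown formatting, error handling, and spell checking are omitted.

```python
from pathlib import Path
import re

from pypdf import PdfReader   # PDF text extraction
from PIL import Image
import pytesseract            # OCR for image-based documents


def extract_text(path: str) -> str:
    """Step 1 + 2: dispatch by file type and extract raw text."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        reader = PdfReader(path)
        raw = "\n".join(page.extract_text() or "" for page in reader.pages)
    elif suffix in {".png", ".jpg", ".jpeg", ".tiff"}:
        raw = pytesseract.image_to_string(Image.open(path))
    else:  # assume plain text / markdown
        raw = Path(path).read_text(encoding="utf-8", errors="ignore")
    return preprocess(raw)


def preprocess(raw: str) -> str:
    """Step 3: strip artifacts and normalize whitespace."""
    text = raw.replace("\x0c", "\n")        # form feeds left by OCR
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse blank-line runs
    return text.strip()
```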
#### Retrieval-Augmented Generation Framework

The Retrieval-Augmented Generation (RAG) framework can be broken down into two main components:

- Retrieval
- Augmented Generation

Augmented Generation is a recent advancement, while document retrieval has been with us since the emergence of web search. While there is little sense in building the generation component with anything other than LLMs, it may make sense to implement a simple baseline for retrieval.
#### Retrieval: Sparse Encoded Retrieval Baseline

Objectives:
- Create a robust baseline with minimal effort.
- Validate the hypothesis that an enhanced search capability is beneficial.
- Gather a retrieval dataset, incorporating both implicit and explicit feedback, for future refinement.

Applicability:
This baseline covers use case `1a`; it is not applicable to use cases `1na` and `2na`, which are therefore left out of scope.

The system enables content search within documents using the BM25 algorithm (see the sketch after the component list below).

Components:
1. Preprocessing Layer
   - Tokenizes input data
   - Filters out irrelevant content
   - Applies stemming / lemmatization
2. Indexing Layer
   - Maintains a DB-represented corpus
   - Creates indexes for Term Frequency (TF) and Inverse Document Frequency (IDF)
3. Inference Layer
   - Passes the query through the preprocessing layer and executes parallelized scoring computations
   - Manages ranking and retrieval of results
4. Representation Layer
   - Highlights the top-k results for the user
   - Handles an explicit user feedback dialogue ("Have you found what you were looking for?")
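
To make the layers above concrete, here is a minimal sketch using the `rank_bm25` package on a toy corpus. The tokenizer, corpus, and query are illustrative stand-ins for the preprocessing and indexing layers, not the production implementation.

```python
from rank_bm25 import BM25Okapi

# Toy corpus standing in for extracted document sentences.
corpus = [
    "The contractor shall deliver the report by the end of Q3.",
    "Payment terms are net 30 days from the invoice date.",
    "Either party may terminate the agreement with 60 days notice.",
]

def tokenize(text: str) -> list[str]:
    # Preprocessing-layer stand-in: lowercase + whitespace split
    # (stop-word filtering and stemming/lemmatization would go here).
    return text.lower().split()

# Indexing layer: BM25 builds the TF/IDF statistics over the corpus.
bm25 = BM25Okapi([tokenize(doc) for doc in corpus])

# Inference layer: score the query against every document and rank.
query = tokenize("termination notice period")
scores = bm25.get_scores(query)
top_k = bm25.get_top_n(query, corpus, n=2)
print(scores, top_k)
```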
##### Pros & Cons

Pros:
+ Simple to implement, debug, and analyze
+ Fast retrieval due to lightweight computation
+ Scalable, as computation jobs can process document segments independently
+ Popular, with many optimized implementations available
+ Low maintenance costs, suitable for junior engineers

Cons:
- No semantic understanding: synonyms are not supported by default
- Bag-of-words approach: word order is not considered
- Requires updates to accommodate new vocabulary
#### RAG: Baseline Implementation

A basic RAG system consists of the following components (a minimal data-flow sketch follows the list):

1. Ingestion Layer:
   - Embedder
   - DB-indexing
2. Retrieval Layer:
   - Embedder
   - DB similarity search
3. Chat Service:
   - Manages chat context
   - Prompt template constructor: supports dialogues for clarification
   - Stores chat history
4. Synthesis Component:
   - Utilizes an LLM for response generation
5. Representation Layer:
   - Provides a dialogue mode for user interaction.
   - User Feedback: Collects user input to continuously refine the system.
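
To make the data flow between these layers concrete, the sketch below assumes a `sentence-transformers` model as the in-house embedder and an in-memory list in place of the vector database; the model name, chunks, and prompt template are illustrative only.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# In-house embedder stand-in; the model name is an illustrative choice.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Ingestion layer: embed document chunks into an in-memory "index".
chunks = [
    "Either party may terminate the agreement with 60 days notice.",
    "Payment terms are net 30 days from the invoice date.",
]
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

# Retrieval layer: embed the query and run a similarity search.
query = "How can the contract be terminated?"
query_vector = embedder.encode([query], normalize_embeddings=True)[0]
scores = chunk_vectors @ query_vector  # cosine similarity (vectors are normalized)
best_chunk = chunks[int(np.argmax(scores))]

# Chat service: build the prompt that the synthesis component sends to the LLM.
prompt = (
    "Answer the question using only the context below.\n"
    f"Context: {best_chunk}\n"
    f"Question: {query}"
)
print(prompt)
```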
We have opted to develop an in-house embedder while utilizing API calls to vendor-based LLMs.

In-house embedder:
- Provides potential for improving this critical component without vendor lock-in
- Offers deterministic behavior
- Does not require paying per-token vendor fees
- Could potentially benefit from interaction data enhancements

Drawbacks:
- Development and maintenance costs.
- Per-token inference costs may not be as optimized as those of larger companies.

API-based LLMs:

- LLMs are continually improving, particularly in few-shot learning capabilities, and we do not want to invest in LLM training.
- Competitive market dynamics are driving down the cost of API calls over time.
- Switching vendors involves minimal effort, since it only requires switching APIs; this also allows multiple vendors to be used.

Drawbacks:
- Less control over the responses
- Data privacy (though not a significant concern)

We have also selected LlamaIndex, an open-source RAG framework that supports the design choices above and offers many capabilities out of the box, including (a minimal usage sketch follows the list):

1. Document storage
2. Index storage
3. Chat service
4. Modular design for document extraction that supports custom modules
5. Built-in logging and monitoring capabilities
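
For orientation, a minimal end-to-end sketch using the current `llama_index.core` API might look like the following. The directory path and question are placeholders, and the default embedder and LLM settings would be replaced with the in-house embedder and the chosen vendor API in practice.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Ingestion: load extracted documents from a local folder (path is illustrative).
documents = SimpleDirectoryReader("./extracted_docs").load_data()

# Indexing: build a vector index (embedder and LLM come from LlamaIndex settings;
# in this design they would be the in-house embedder and a vendor LLM).
index = VectorStoreIndex.from_documents(documents)

# Chat service + synthesis: retrieve context and query the LLM in a dialogue loop.
chat_engine = index.as_chat_engine()
response = chat_engine.chat("What are the termination terms in the supplier contract?")
print(response)
```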
### **VI. Error analysis**
3. The release cycle of ML systems presents unique challenges, necessitating a balance between agility and stability. Techniques like blue-green and canary deployments can facilitate safer updates and minimize disruptions.
4. Operational robustness is achieved not only through technical means such as CI, logging, and monitoring but also by addressing non-technical aspects like compliance and user data management. Overrides and fallbacks are critical for maintaining service continuity and adapting to changes or failures in real-time.
### XI. Monitoring
#### Logging

1. **Ingestion Layer**: Every step of the ETL pipeline for document extraction must be fully logged to ensure the process is reproducible and to help resolve issues (see the sketch after this list).

2. **Retrieval**: Logging should capture the details of each query, including the tokenizer used, the document context found within a particular document version, and any other relevant metadata that could aid future analyses.

3. **Chat History**: Storing all chat history is crucial for thorough analysis and debugging, providing valuable insights into user interactions and system performance.
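
A minimal sketch of the intended step-level logging, using the standard `logging` module; the step names, fields, and example values are illustrative.

```python
import logging

logger = logging.getLogger("ingestion")
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

def ingest_document(doc_id: str, path: str) -> None:
    """Log every ETL step with enough metadata to reproduce the run."""
    logger.info("extraction started doc_id=%s path=%s", doc_id, path)
    text = "...extracted text..."  # placeholder for the real extraction step
    logger.info("extraction finished doc_id=%s chars=%d", doc_id, len(text))
    logger.info("preprocessing finished doc_id=%s tokenizer=%s", doc_id, "default")

ingest_document("doc-001", "/data/contracts/supplier.pdf")
```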
#### Monitoring

1. **Ingestion Layer**: Monitor document statistics during ingestion, including word count, character distribution, document length, paragraph length, detected languages, and the percentage of tables or images.

2. **Retrieval**:
   - **Embedder**: Monitor preprocessing time, embedding model time, and utilization of the embedding model instances.
   - **Database (DB)**: Keep track of the indices found, similarity scores, and the time taken for each retrieval operation.

3. **Augmented Generation**: Track the quality of generated content through user feedback, along with cost and latency. Furthermore, monitor the volume of generated content to predict scaling needs.

4. **System Health Metrics**: Implement continuous monitoring of system health metrics such as CPU usage, memory usage, disk I/O, network I/O, error rates, and uptime to ensure the system is functioning optimally.

5. **Alerting Mechanisms**: Build alerting mechanisms for any anomalies or exceeded thresholds based on the metrics being monitored (a metrics-exposure sketch follows this list).
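
As a sketch of how these service-level metrics could be exposed for Prometheus to scrape, using the `prometheus_client` package; the metric names and port are assumptions, and alert rules and routing live on the Prometheus/Alertmanager side.

```python
import time
from prometheus_client import Counter, Histogram, start_http_server

# Retrieval-layer metrics (names are illustrative).
RETRIEVAL_LATENCY = Histogram("retrieval_latency_seconds", "Time per retrieval operation")
RETRIEVAL_ERRORS = Counter("retrieval_errors_total", "Failed retrieval operations")

def retrieve(query: str) -> list[str]:
    start = time.perf_counter()
    try:
        results = ["..."]  # placeholder for the actual similarity search
        return results
    except Exception:
        RETRIEVAL_ERRORS.inc()
        raise
    finally:
        RETRIEVAL_LATENCY.observe(time.perf_counter() - start)

# Expose metrics on :8000/metrics for Prometheus to scrape.
start_http_server(8000)
```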
#### Tooling

1. **For RAG operations - Langfuse callback**.

2. **For System Health Metrics, Ingestion Layer - Prometheus & Grafana**: Prometheus is an open-source system monitoring and alerting toolkit. Grafana is used to visualize the data collected by Prometheus.

3. **Code error reports - Sentry.io**: Sentry is a widely-used error tracking tool that helps developers monitor, fix, and optimize application performance.
4. **For alerting mechanisms - Prometheus Alertmanager**: Alertmanager handles alerts sent by Prometheus servers and takes care of deduplicating, grouping, and routing them to the correct receiver.