3. **SecretVault:** SecretVault stores the blinded chunks and embeddings
provided by data owners. When a client submits a query, SecretVault computes
the differences between the query's embeddings and each stored embedding in a
-privacy-preserving manner.
+privacy-preserving manner. If clustering is enabled, SecretVault also stores the
+cluster centroids in a separate schema. In the original schema, the blinded chunks
+and embeddings are stored along with the corresponding centroid.
4. **SecretLLM:** SecretLLM connects to SecretVault to fetch the blinded
differences between the query and the stored embeddings and then compute the
-closest matches. Finally, it uses the top k matches for inference.
+closest matches. If clustering is enabled, SecretLLM starts by retrieving the
+centroid points. Finally, it uses the top k matches for inference.
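
As a rough illustration of the clustered retrieval flow described in steps 3 and 4, the sketch below works on plaintext vectors in pure Python. It is a simplification under stated assumptions: the real system operates on blinded data, and the function name `top_k_with_clusters`, the row layout, and the use of Euclidean distance are hypothetical choices made for this example, not nilRAG's API.

```python
import math

def top_k_with_clusters(query, rows, centroids, k=2):
    """Return indices of the k rows closest to `query`.

    rows      -- list of (embedding, centroid_index) pairs (hypothetical layout)
    centroids -- list of centroid vectors
    Only rows assigned to the centroid nearest to the query are compared.
    """
    # Step 1: find the centroid closest to the query.
    nearest = min(range(len(centroids)),
                  key=lambda i: math.dist(query, centroids[i]))
    # Step 2: compute distances only for rows stored with that centroid.
    candidates = [(math.dist(query, emb), idx)
                  for idx, (emb, cluster) in enumerate(rows)
                  if cluster == nearest]
    # Step 3: keep the k closest matches.
    candidates.sort()
    return [idx for _, idx in candidates[:k]]

# Toy data: two well-separated clusters of 2-D "embeddings".
rows = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([10.0, 10.0], 1), ([10.0, 11.0], 1)]
centroids = [[0.0, 0.5], [10.0, 10.5]]
print(top_k_with_clusters([10.0, 10.2], rows, centroids, k=1))  # prints [2]
```

Restricting the comparison to one cluster is what reduces the number of difference computations per query; the trade-off is that a match sitting in a neighbouring cluster can be missed.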
Lastly, the client can query SecretLLM asking about Danielle:
:::note Employees Example
@@ -117,17 +134,76 @@ enhance the inference with context that has been uploaded to [SecretVault](https
### Performance Expectations
-We have performed a series of benchmarks to evaluate the performance of nilRAG.
-Currently, nilRAG scales linearly to the number of rows stored in nilDB.
-The following table shows latency to upload to nilDB multiple paragraphs of a few sentences long, as well as the runtime for AI inference using SecretLLM with nilRAG.
-
-| Number of Paragraphs Stored in nilDB | Upload Time to nilDB (sec.) | Query Time (Inference + RAG) (sec.) |
+We have performed a series of benchmarks to evaluate the performance of nilRAG with and without clustering.
+Currently, nilRAG scales linearly to the number of rows stored in SecretVault.
+The following table shows latency to upload to SecretVault multiple paragraphs of a few sentences long, as well as the runtime for AI inference using SecretLLM with nilRAG.
+
+<table>
+  <thead>
+    <tr>
+      <th rowspan="2">Number of Paragraphs Stored <br> in SecretVault</th>
+      <th colspan="2">RAG Time (sec.)</th>
+      <th colspan="2">Query Time (Inference + RAG, sec.)</th>
+    </tr>
+    <tr>
+      <th>No Clusters</th>
+      <th>5 Clusters</th>
+      <th>No <br> Clusters</th>
+      <th>5 <br> Clusters</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>1</td>
+      <td>0.2</td>
+      <td> - </td>
+      <td>2.4</td>
+      <td> - </td>
+    </tr>
+    <tr>
+      <td>10</td>
+      <td>0.4</td>
+      <td> - </td>
+      <td>3.1</td>
+      <td> - </td>
+    </tr>
+    <tr>
+      <td>100</td>
+      <td>2.3 </td>
+      <td> 1.7 </td>
+      <td>2.9</td>
+      <td> 2.1 </td>
+    </tr>
+    <tr>
+      <td>1 000</td>
+      <td>5.8</td>
+      <td>2.5</td>
+      <td>7.0</td>
+      <td>3.2</td>
+    </tr>
+    <tr>
+      <td>5 000</td>
+      <td>20.0</td>
+      <td>5.7</td>
+      <td>25.1</td>
+      <td>5.9</td>
+    </tr>
+    <tr>
+      <td>10 000</td>
+      <td>39.2</td>
+      <td>10.0</td>
+      <td>47.5</td>
+      <td>8.9</td>
+    </tr>
+    <tr>
+      <td>20 000</td>
+      <td>74.7</td>
+      <td>11.3</td>
+      <td>92.5</td>
+      <td>19.8</td>
+    </tr>
+  </tbody>
+</table>
Additionally, using multiple concurrent users, the query time for inference with nilRAG increases.
Performing inference with nilRAG with a content of 100 paragraphs takes approximately 5 seconds for a single user, while with ten concurrent users the inference time for the same content goes up to almost 9 seconds.
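
To make the effect of clustering in the table above concrete, the snippet below computes the speedup in end-to-end query time for the rows where both columns are reported. The numbers are copied from the table; the dictionary layout and the helper name `speedup` are illustrative only.

```python
# End-to-end query time (inference + RAG, seconds), copied from the table:
# paragraphs -> (no clusters, 5 clusters)
QUERY_TIME = {
    100: (2.9, 2.1),
    1_000: (7.0, 3.2),
    5_000: (25.1, 5.9),
    10_000: (47.5, 8.9),
    20_000: (92.5, 19.8),
}

def speedup(no_clusters, with_clusters):
    """Ratio of unclustered to clustered query time (hypothetical helper)."""
    return no_clusters / with_clusters

for n, (plain, clustered) in QUERY_TIME.items():
    print(f"{n:>6} paragraphs: {speedup(plain, clustered):.1f}x faster with 5 clusters")

# Growth in unclustered query time when the data doubles from 10 000 to 20 000 rows.
growth = QUERY_TIME[20_000][0] / QUERY_TIME[10_000][0]
print(f"10k -> 20k growth without clustering: {growth:.2f}x")
```

With these figures the ratio at 20 000 paragraphs is about 4.7, and doubling from 10 000 to 20 000 paragraphs roughly doubles the unclustered query time, which is consistent with the linear scaling stated above.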