Commit 44d8100

manel1874 authored and jfdreis committed
docs: include clustering into nilRAG docs and update performance table
1 parent 5d296c8 commit 44d8100

2 files changed: +99 -22 lines changed

docs/build/nilRAG.md

Lines changed: 98 additions & 22 deletions
@@ -2,7 +2,7 @@
 
 Retrieval Augmented Generation (RAG) is a technique that grants large language
 models (LLMs) information retrieval capabilities and context that they might be
-missing. Nillion's RAG (nilRAG) uses [SecretLLM)](/build/secretLLM/overview), [SecretVault](/build/secret-vault), and the
+missing. Nillion's RAG (nilRAG) uses [SecretLLM](/build/secretLLM/overview), [SecretVault](/build/secret-vault), and the
 [nilQL](/build/nilQL) encryption library.
 
 :::info
@@ -23,6 +23,13 @@ their information to SecretVault, while SecretLLM processes client queries and
 retrieves the most relevant results (top-k) without revealing sensitive
 information from either party.
 
+nilRAG supports optional clustering to accelerate query retrieval. Data owners
+locally partition their dataset into clusters, then upload the clusters to
+SecretVault. At query time, SecretLLM first identifies the most relevant cluster
+for the incoming query embedding and then executes RAG within that subset.
+By minimizing the search space, this approach reduces comparison overhead and
+significantly speeds up inference.
+
 
 Let us deep dive into the entities and their roles in the system.
 
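The two-stage lookup described in this hunk (pick the most relevant cluster, then run top-k inside it) can be pictured with a small plaintext sketch. It is only an illustration: the real flow runs over secret-shared data in SecretVault, and the function names and the record layout below are hypothetical, not part of the nilRAG API.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(query, centroids):
    """Return the index of the centroid closest to the query embedding."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(query, centroids[i]))


def clustered_top_k(query, records, centroids, k):
    """Stage 1: choose a cluster. Stage 2: rank only that cluster's records."""
    cluster = nearest_centroid(query, centroids)
    candidates = [r for r in records if r["cluster"] == cluster]
    candidates.sort(key=lambda r: squared_distance(query, r["embedding"]))
    return cluster, candidates[:k]


# Hypothetical toy data: two clusters, two records each.
centroids = [[0.0, 0.0], [10.0, 10.0]]
records = [
    {"chunk": "a", "embedding": [0.1, 0.2], "cluster": 0},
    {"chunk": "b", "embedding": [0.9, 1.1], "cluster": 0},
    {"chunk": "c", "embedding": [9.8, 10.1], "cluster": 1},
    {"chunk": "d", "embedding": [11.0, 9.0], "cluster": 1},
]

cluster, top = clustered_top_k([9.5, 9.5], records, centroids, k=1)
print(cluster, [r["chunk"] for r in top])
```

Only the records in the selected cluster are compared against the query, which is where the reduction in comparison overhead comes from.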
@@ -32,7 +39,9 @@ Let us deep dive into the entities and their roles in the system.
 search, while the chunks are used to retrieve the actual uploaded files. Once
 the files are encoded into chunks and embeddings, they are blinded before
 being uploaded to SecretVault, where each chunk and embedding is
-secret-shared.
+secret-shared. Optionally, data owners can locally partition their data
+into clusters and upload the chunks and embeddings along with the
+corresponding cluster information to SecretVault.
 
 For instance, a data owner, wishes to upload the following file to SecretVault and later use it to provide context to SecretLLM:
 :::note Employees Example
@@ -46,20 +55,24 @@ Let us deep dive into the entities and their roles in the system.
 ```
 :::
 
-Let's dive a bit more into the example of employees records. First, Data
-Owners need to create a schema and a query in SecretVault:
+Let's dive a bit more into the example of employees records. First, data
+owners need to create a schema and a query in SecretVault. If clustering is enabled,
+data owners also create a clusters' schema to store the centroids of
+the clusters.
 <details>
 <summary>Full 1.init_schema_query.py</summary>
 ```py reference showGithubLink
-https://github.com/NillionNetwork/nilrag/blob/main/examples/1.init_schema_query.py
+https://github.com/NillionNetwork/nilrag/blob/main/examples/init/bootstrap.py
 ```
 </details>
 
-Now that the schema and the query are ready, Data Owners can upload their data:
+Now that the schemas and the query are ready, data owners can upload their data. If clustering is enabled,
+data owners start by locally computing the clusters centroids using
+[scikit-learn KMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html) method.
 <details>
 <summary>Full 2.data_owner_upload.py</summary>
 ```py reference showGithubLink
-https://github.com/NillionNetwork/nilrag/blob/main/examples/2.data_owner_upload.py
+https://github.com/NillionNetwork/nilrag/blob/main/examples/data_owner/write.py
 ```
 </details>
 
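To make the local centroid computation concrete, here is a dependency-free stand-in for the k-means step. The docs use scikit-learn's `KMeans`, whose fitted `cluster_centers_` and `labels_` attributes play the role of the `centroids` and `labels` returned below. This sketch is a plain Lloyd's iteration with a deterministic initialisation chosen purely for illustration; it is not the code in `write.py`.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points seed the centroids (illustrative)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its closest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, labels


# Hypothetical 2-D embeddings forming two obvious groups.
embeddings = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
centroids, labels = kmeans(embeddings, k=2)
print(centroids, labels)
```

The centroids would go into the clusters' schema, and each label would travel with its chunk and embedding as the cluster information.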
@@ -68,7 +81,8 @@ Let us deep dive into the entities and their roles in the system.
 uploaded files in SecretVault, retrieve the most relevant data, and use the
 top-k results for privacy-preserving inference in SecretLLM. Similar to the
 encoding by data owners, the query is processed into its corresponding
-embeddings.
+embeddings. If clustering is enabled, the most relevant cluster is first
+identified and RAG is executed over this cluster.
 
 Going back to our example, the client can query SecretLLM asking about Danielle:
 :::note Employees Example
@@ -81,20 +95,23 @@ Let us deep dive into the entities and their roles in the system.
 <details>
 <summary>Full 3.client_query.py</summary>
 ```py reference showGithubLink
-https://github.com/NillionNetwork/nilrag/blob/main/examples/3.client_query.py
+https://github.com/NillionNetwork/nilrag/blob/main/examples/client/query.py
 ```
 </details>
 
 
 3. **SecretVault:** SecretVault stores the blinded chunks and embeddings
 provided by data owners. When a client submits a query, SecretVault computes
 the differences between the query's embeddings and each stored embedding in a
-privacy-preserving manner.
+privacy-preserving manner. If clustering is enabled, SecretVault also stores the
+cluster centroids in a separate schema. In the original schema, the blinded chunks
+and embeddings are stored along with the corresponding centroid.
 
 
 4. **SecretLLM:** SecretLLM connects to SecretVault to fetch the blinded
 differences between the query and the stored embeddings and then compute the
-closest matches. If clustering is enabled, SecretLLM starts by retrieving the
+closest matches. Finally, it uses the top k matches for inference.
+closest matches. If clustering is enabled, SecretLLM starts by retrieving the
+centroid points. Finally, it uses the top k matches for inference.
 
 Lastly, the client can query SecretLLM asking about Danielle:
 :::note Employees Example
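One way to picture how differences can be computed without any single node seeing an embedding is additive secret sharing. The sketch below assumes additive shares over a prime field with a fixed-point encoding. The modulus, scale, node count, and every function name are assumptions made for illustration and are not taken from nilQL or nilRAG.

```python
import secrets

PRIME = 2**61 - 1   # assumed field modulus
SCALE = 10**4       # assumed fixed-point precision
NODES = 3           # assumed number of storage nodes


def encode(x):
    """Map a float to a field element using fixed-point encoding."""
    return round(x * SCALE) % PRIME


def decode(value):
    """Map a field element back to a signed float."""
    if value > PRIME // 2:
        value -= PRIME
    return value / SCALE


def share(value, n=NODES):
    """Split a field element into n additive shares that sum to it mod PRIME."""
    parts = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % PRIME)
    return parts


def reconstruct(parts):
    """Recombine additive shares."""
    return sum(parts) % PRIME


def share_vector(vector):
    """Return one share vector per node for the given embedding."""
    per_dimension = [share(encode(x)) for x in vector]
    return [list(node) for node in zip(*per_dimension)]


def node_difference(query_share, stored_share):
    """Each node subtracts its own shares locally; no plaintext is revealed."""
    return [(q - s) % PRIME for q, s in zip(query_share, stored_share)]


stored = [0.25, -1.5, 3.0]
query = [1.0, 0.5, 2.0]

stored_shares = share_vector(stored)
query_shares = share_vector(query)
diff_shares = [node_difference(q, s) for q, s in zip(query_shares, stored_shares)]

# The party that collects all difference shares can rebuild the difference
# vector and rank stored embeddings by squared distance.
difference = [decode(reconstruct(column)) for column in zip(*diff_shares)]
distance = sum(d * d for d in difference)
print(difference, distance)
```

Because subtraction is linear, each node works only on its own shares, and only the recombined differences are used to rank matches.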
@@ -117,17 +134,76 @@ enhance the inference with context that has been uploaded to [SecretVault](https
 
 ### Performance Expectations
 
-We have performed a series of benchmarks to evaluate the performance of nilRAG.
-Currently, nilRAG scales linearly to the number of rows stored in nilDB.
-The following table shows latency to upload to nilDB multiple paragraphs of a few sentences long, as well as the runtime for AI inference using SecretLLM with nilRAG.
-
-| Number of Paragraphs Stored in nilDB | Upload Time to nilDB (sec.) | Query Time (Inference + RAG) (sec.) |
-| -------------- | ------------------ | ----------------- |
-| 1 | 0.2 | 2.4 |
-| 10 | 0.4 | 3.1 |
-| 100 | 1.0 | 5.8 |
-| 1000 | 10.5 | 13.2 |
-| 10000 | 51.3 | 21.9 |
+We have performed a series of benchmarks to evaluate the performance of nilRAG with and without clustering.
+Currently, nilRAG scales linearly to the number of rows stored in SecretVault.
+The following table shows latency to upload to SecretVault multiple paragraphs of a few sentences long, as well as the runtime for AI inference using SecretLLM with nilRAG.
+
+<table>
+<thead>
+<tr>
+<th rowspan="2">Number of Paragraphs Stored <br> in SecretVault</th>
+<th colspan="2">RAG Time (sec.)</th>
+<th colspan="2">Query Time (Inference + RAG, sec.)</th>
+</tr>
+<tr>
+<th>No Clusters</th>
+<th>5 Clusters</th>
+<th>No <br> Clusters</th>
+<th>5 <br> Clusters</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>1</td>
+<td>0.2</td>
+<td> - </td>
+<td>2.4</td>
+<td> - </td>
+</tr>
+<tr>
+<td>10</td>
+<td>0.4</td>
+<td> - </td>
+<td>3.1</td>
+<td> - </td>
+</tr>
+<tr>
+<td>100</td>
+<td>2.3 </td>
+<td> 1.7 </td>
+<td>2.9</td>
+<td> 2.1 </td>
+</tr>
+<tr>
+<td>1 000</td>
+<td>5.8</td>
+<td>2.5</td>
+<td>7.0</td>
+<td>3.2</td>
+</tr>
+<tr>
+<td>5 000</td>
+<td>20.0</td>
+<td>5.7</td>
+<td>25.1</td>
+<td>5.9</td>
+</tr>
+<tr>
+<td>10 000</td>
+<td>39.2</td>
+<td>10.0</td>
+<td>47.5</td>
+<td>8.9</td>
+</tr>
+<tr>
+<td>20 000</td>
+<td>74.7</td>
+<td>11.3</td>
+<td>92.5</td>
+<td>19.8</td>
+</tr>
+</tbody>
+</table>
 
 Additionally, using multiple concurrent users, the query time for inference with nilRAG increases.
 Performing inference with nilRAG with a content of 100 paragraphs takes approximately 5 seconds for a single user, while with ten concurrent users the inference time for the same content goes up to almost 9 seconds.
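To read the new table as speedups rather than raw timings, the short script below divides each "No Clusters" figure by its "5 Clusters" counterpart. The numbers are copied from the added table; rows without a clustered measurement (1 and 10 paragraphs) are left out, and the variable and function names are only illustrative.

```python
# (paragraphs, RAG no clusters, RAG 5 clusters, query no clusters, query 5 clusters)
# All timings are in seconds, taken from the table in this hunk.
BENCHMARKS = [
    (100, 2.3, 1.7, 2.9, 2.1),
    (1_000, 5.8, 2.5, 7.0, 3.2),
    (5_000, 20.0, 5.7, 25.1, 5.9),
    (10_000, 39.2, 10.0, 47.5, 8.9),
    (20_000, 74.7, 11.3, 92.5, 19.8),
]


def speedups(rows):
    """Map paragraph count to (RAG speedup, end-to-end query speedup)."""
    return {
        n: (rag_plain / rag_clustered, query_plain / query_clustered)
        for n, rag_plain, rag_clustered, query_plain, query_clustered in rows
    }


result = speedups(BENCHMARKS)
for n, (rag, query) in result.items():
    print(f"{n:>6} paragraphs: RAG {rag:.1f}x, query {query:.1f}x")
```

On these figures, the benefit of clustering grows with the amount of stored data: it is modest at 100 paragraphs and several-fold at 10 000 paragraphs and above.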

docs/build/secretVault-secretDataAnalytics/create-schema.md

Lines changed: 1 addition & 0 deletions
@@ -156,6 +156,7 @@ You can create schema collections using the SecretVault Tools UI or programatica
 ```tsx reference showGithubLink
 https://github.com/NillionNetwork/blind-module-examples/blob/main/nildb/secretvault_nextjs_nilql/app/api/create-schema/route.ts#L41-L106
 ```
+
 </TabItem>
 <TabItem value="wrapper" label="JavaScript (with wrapper)">
 