This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Commit 536d664

Update the specification of soft-deleted documents (#206)
* Grammarly'd soft-deletion page
* Soft-deleted: update the functional specification
1 parent 1b0fcc6 commit 536d664

1 file changed: +15 -22 lines changed

text/0136-documents-soft-deletion.md

Lines changed: 15 additions & 22 deletions
@@ -2,41 +2,34 @@
 
 ## 1. Summary
 
-This specification describes the internals of the documents soft-deletion algorithm.
+This specification describes the internals of the document soft-deletion algorithm.
 
 ## 2. Motivation
 
 Deleting documents is extremely slow and can happen when;
-- A user delete a single document.
-- A user delete a batch of documents.
-- A user update one or multiple documents (i.e., the primary key is the same, but the document's content is not the same).
+- A user deletes a single document.
+- A user deletes a batch of documents.
+- A user updates one or multiple documents (i.e., the primary key is the same, but the document's content is not the same).
 
-The purpose of the documents soft-deletion feature is to make the deletion of documents almost instantaneous by **not** deleting the document when asked.
+The purpose of the document soft-deletion feature is to make the deletion of documents almost instantaneous by **not** deleting the document when asked.
 
 ## 3. Functional Specification
 
-Instead of deleting the documents, Meilisearch mark them internally as deleted and then exclude them from all the other algorithms of the engine.
-That's fast but takes space; thus, at some point, we need to _really_ delete the soft deleted documents.
+Instead of deleting the documents, Meilisearch marks them internally as deleted and then excludes them from all the other algorithms of the engine.
+That's fast but takes up space; thus, at some point, we need to _really_ delete the soft-deleted documents.
 
 This can happen for two reasons;
-- When 90% of the total available space is used.
-- When 10% of the total space is dedicated to the soft deleted documents.
+1. when there are more soft-deleted documents than regular documents in the database, or
+2. when the soft-deleted documents occupy more disk space than a fixed threshold.
 
-The idea is good, but there are two technical issues;
-
-1. We don't know the size a document really occupies.
-This means we don't know the size used by the soft deleted documents.
-That can be imprecise in the case of a really heterogeneous dataset with large and small documents.
-2. We don't know the total available space. The only information available to meilisearch is the `max-index-size` which is by default at 100GB, but meilisearch could be deployed on a smaller disk.
-
-The second point could be a real issue for the case of someone who has very few documents but update them frequently on a small disk without updating the `max-index-size` parameter.
-The soft-deleted documents would grow until they use 10GB of disk even though the user only has like 100MB of documents.
+Reason (2) presents the drawback that we don't know the precise disk space taken by a document, for technical reasons. Since the information we have is the total size taken by all documents (soft-deleted or not) and the number of documents, we approximate the size of a document to the average size of a document.
+This means that if a few outliers are updated/deleted, they can take up much more disk space than the fixed threshold.
 
 ## 4. Future Possibilities
 
 - Work again on the way to get the size of the disk the `data.ms` is currently running on. This would improve the analytics as well.
-- Provide a cli parameter to select how much space can be used to store the soft deleted documents.
+- Provide a CLI parameter to select how much space can be used to store the soft deleted documents.
 - It could be expressed as a real size or in terms of percentage.
-- Provide a route to delete the soft deleted documents.
-- It could be useful if a user **know** he will have a lot of updates during the day but nothing around midnight, for example.
-- It would allow a user to clear the soft deleted when meilisearch is not under pressure to ensure all your updates stay fast during the day.
+- Provide a route to delete the soft-deleted documents.
+- It could be useful if a user **knows** they will have a lot of updates during the day but nothing around midnight, for example.
+- It would allow a user to clear the soft-deleted when Meilisearch is not under pressure to ensure all your updates stay fast during the day.
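
The two purge conditions in the updated section 3, and the average-size approximation behind the second one, can be sketched as follows. This is only an illustration of the rule as the specification words it, not Meilisearch's actual implementation: `IndexStats`, `estimated_soft_deleted_size`, `should_purge`, and the `threshold_bytes` parameter are hypothetical names, and the specification does not state the value of the fixed threshold.

```python
from dataclasses import dataclass


@dataclass
class IndexStats:
    """Hypothetical snapshot of the figures the specification says are known."""
    total_size_bytes: int        # disk space taken by all documents, soft-deleted or not
    live_documents: int          # regular (not soft-deleted) documents
    soft_deleted_documents: int  # documents marked as deleted but still on disk


def estimated_soft_deleted_size(stats: IndexStats) -> float:
    """Approximate the space held by soft-deleted documents.

    The real size of each document is unknown, so every document is
    assumed to have the average size (total size / number of documents).
    """
    total_documents = stats.live_documents + stats.soft_deleted_documents
    if total_documents == 0:
        return 0.0
    average_size = stats.total_size_bytes / total_documents
    return average_size * stats.soft_deleted_documents


def should_purge(stats: IndexStats, threshold_bytes: int) -> bool:
    """Return True when either reason from section 3 applies.

    `threshold_bytes` stands in for the spec's "fixed threshold";
    its value here is an assumption supplied by the caller.
    """
    # Reason 1: more soft-deleted documents than regular documents.
    more_deleted_than_live = stats.soft_deleted_documents > stats.live_documents
    # Reason 2: estimated soft-deleted size exceeds the fixed threshold.
    over_threshold = estimated_soft_deleted_size(stats) > threshold_bytes
    return more_deleted_than_live or over_threshold


# Outlier case: 1,000 live documents of 1 KB each and 10 soft-deleted
# documents of 100 MB each. The soft-deleted documents really occupy 1 GB.
outliers = IndexStats(
    total_size_bytes=1_000 * 1_000 + 10 * 100_000_000,
    live_documents=1_000,
    soft_deleted_documents=10,
)
```

With these made-up numbers, `estimated_soft_deleted_size(outliers)` is roughly 9.9 MB, so `should_purge(outliers, threshold_bytes=100_000_000)` returns `False` even though the soft-deleted documents actually hold 1 GB. That is the drawback the specification describes for heterogeneous datasets.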
