This repository was archived by the owner on Apr 4, 2023. It is now read-only.

Commit 758b4ac

bors[bot] and loiclec authored
Merge #776
776: Reduce incremental indexing time of `words_prefix_position_docids` DB r=curquiza a=loiclec

Fixes partially #605

The `words_prefix_position_docids` database can easily contain millions of entries, so iterating over it can be very expensive. Yet we do so needlessly for every document addition task. This can cause indexing performance issues when:

- a user sends many `documentAdditionOrUpdate` tasks that cannot all be batched together (for example, if they are interspersed with `documentDeletion` tasks)
- the documents contain long, diverse text fields, which increases the number of entries in `words_prefix_position_docids`
- the index has accumulated many soft-deleted documents, which further increases the size of `words_prefix_position_docids`
- the machine running Meilisearch does not have great IO performance (e.g. a slow SSD, or IO that is quota-limited by the cloud provider)

Note, before approving the PR: the only changed file should be `milli/src/update/words_prefix_position_docids.rs`.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2 parents a4e8158 + a2690ea commit 758b4ac

1 file changed: +11 -7 lines changed

milli/src/update/words_prefix_position_docids.rs

Lines changed: 11 additions & 7 deletions
@@ -140,16 +140,20 @@ impl<'t, 'u, 'i> WordPrefixPositionDocids<'t, 'u, 'i> {
 
         // We remove all the entries that are no more required in this word prefix position
         // docids database.
-        let mut iter =
-            self.index.word_prefix_position_docids.iter_mut(self.wtxn)?.lazily_decode_data();
-        while let Some(((prefix, _), _)) = iter.next().transpose()? {
-            if del_prefix_fst_words.contains(prefix.as_bytes()) {
-                unsafe { iter.del_current()? };
+        // We also avoid iterating over the whole `word_prefix_position_docids` database if we know in
+        // advance that the `if del_prefix_fst_words.contains(prefix.as_bytes()) {` condition below
+        // will always be false (i.e. if `del_prefix_fst_words` is empty).
+        if !del_prefix_fst_words.is_empty() {
+            let mut iter =
+                self.index.word_prefix_position_docids.iter_mut(self.wtxn)?.lazily_decode_data();
+            while let Some(((prefix, _), _)) = iter.next().transpose()? {
+                if del_prefix_fst_words.contains(prefix.as_bytes()) {
+                    unsafe { iter.del_current()? };
+                }
             }
+            drop(iter);
         }
 
-        drop(iter);
-
         // We finally write all the word prefix position docids into the LMDB database.
         sorter_into_lmdb_database(
             self.wtxn,
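
The idea behind this change can be illustrated outside of milli with a minimal, self-contained sketch. It is not the project's code: the `BTreeMap` stands in for the LMDB-backed `word_prefix_position_docids` database, the `BTreeSet` stands in for `del_prefix_fst_words`, and the names `PrefixPositionDocids` and `remove_deleted_prefixes` are hypothetical. The function returns the number of entries it visited, so the effect of the emptiness guard is easy to observe.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory stand-in for the `word_prefix_position_docids` database:
/// keys are (prefix, position) pairs, values are document ids.
type PrefixPositionDocids = BTreeMap<(String, u32), Vec<u32>>;

/// Removes every entry whose prefix appears in `deleted_prefixes`.
/// Returns the number of entries visited, which makes the scan cost observable.
fn remove_deleted_prefixes(
    db: &mut PrefixPositionDocids,
    deleted_prefixes: &BTreeSet<String>,
) -> usize {
    // Guard: with nothing to delete, the membership test below can never
    // succeed, so the full scan is skipped entirely.
    if deleted_prefixes.is_empty() {
        return 0;
    }

    let mut visited = 0;
    // Full scan: keep only the entries whose prefix is not marked as deleted.
    db.retain(|(prefix, _position), _docids| {
        visited += 1;
        !deleted_prefixes.contains(prefix)
    });
    visited
}

fn main() {
    let mut db = PrefixPositionDocids::new();
    db.insert(("he".to_string(), 0), vec![1, 2]);
    db.insert(("he".to_string(), 1), vec![3]);
    db.insert(("wo".to_string(), 0), vec![1]);

    // Empty deletion set: no entry is visited and the map is untouched.
    let nothing_deleted: BTreeSet<String> = BTreeSet::new();
    let visited = remove_deleted_prefixes(&mut db, &nothing_deleted);
    println!("empty set: visited {}, {} entries left", visited, db.len());

    // Non-empty deletion set: every entry is visited once.
    let mut deleted = BTreeSet::new();
    deleted.insert("he".to_string());
    let visited = remove_deleted_prefixes(&mut db, &deleted);
    println!("non-empty set: visited {}, {} entries left", visited, db.len());
}
```

In this sketch the saving is trivial, but the same guard matters when the map is an on-disk database with millions of entries and the deletion set is empty for most document addition tasks.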
