This repository was archived by the owner on Apr 4, 2023. It is now read-only.
Improve speed of incremental indexing #605
Closed
Labels
- indexing: Related to the documents/settings indexing algorithms.
- performance: Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption
Description
Currently, a few bottlenecks prevent milli from quickly adding a document to an existing index. The goal of this issue is to track these bottlenecks.
- Facets indexing is not incremental at all, as explained in Optimise facets indexing #590 .
- `word_prefix_docids` and `word_prefix_position_docids` iterate over their whole database to check for elements that need to be deleted, even in the case where we know in advance that nothing has to be deleted.
- The FST, in general, is not a data structure that can be easily updated. Therefore, `WordsPrefixesFst::execute` as well as the handler for `TypedChunk::WordDocIds` are not incremental either.
- The RTree used for geosearch can be updated incrementally, but it does need to be deserialised in its entirety first. It is hard to see how else it could work, but I'll mention it here anyway.
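To illustrate the second bullet, here is a minimal sketch of skipping the deletion scan when it is known in advance that nothing has to be deleted. It is not milli's code: `PrefixDocids`, `update_prefix_docids` and the in-memory `BTreeMap` are hypothetical stand-ins for the real prefix databases, chosen only to make the idea concrete.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a prefix database: prefix -> document ids.
type PrefixDocids = BTreeMap<String, BTreeSet<u32>>;

/// Applies one update and returns how many existing entries were
/// scanned while looking for deletions.
fn update_prefix_docids(
    db: &mut PrefixDocids,
    new_entries: &[(&str, u32)],
    deleted_docids: &BTreeSet<u32>,
) -> usize {
    let mut scanned = 0;

    // Walk the whole database only when there is something to delete.
    if !deleted_docids.is_empty() {
        for docids in db.values_mut() {
            scanned += 1;
            docids.retain(|id| !deleted_docids.contains(id));
        }
        // Drop prefixes that no longer match any document.
        db.retain(|_, docids| !docids.is_empty());
    }

    // Additions touch only the prefixes they concern.
    for (prefix, docid) in new_entries {
        db.entry(prefix.to_string()).or_default().insert(*docid);
    }

    scanned
}
```

With an empty `deleted_docids`, the function never iterates over existing entries, so the cost of a pure addition depends on the number of new entries rather than on the size of the database.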
As an example, on the geo_point dataset, adding a single document takes 3.07s.
- `Facets::execute` takes about 1.55s (after Optimise facets indexing #590 is implemented, much much longer otherwise)
- `WordPrefixPositionDocids::execute()` and `WordPrefixDocids::execute()` together take 0.2s
- `WordsPrefixesFst::execute()` takes 0.4s
- Handling `TypedChunk::WordDocids` takes 0.73s
- Handling `TypedChunk::GeoPoints` takes 0.17s
All together, these non-incremental parts take about 99.3% of the incremental indexing time.
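The percentage can be checked from the timings listed above. The snippet below is only a worked version of that arithmetic; the constant and function names are made up for illustration, and the numbers are copied from this issue.

```rust
/// Timings in seconds for the non-incremental steps, as reported above.
const NON_INCREMENTAL_STEPS: [(&str, f64); 5] = [
    ("Facets::execute", 1.55),
    ("WordPrefixPositionDocids + WordPrefixDocids", 0.2),
    ("WordsPrefixesFst::execute", 0.4),
    ("TypedChunk::WordDocids", 0.73),
    ("TypedChunk::GeoPoints", 0.17),
];

/// Total time to add a single document on the geo_point dataset.
const TOTAL_SECONDS: f64 = 3.07;

/// Sum of the non-incremental steps, in seconds.
fn non_incremental_seconds() -> f64 {
    NON_INCREMENTAL_STEPS.iter().map(|(_, secs)| secs).sum()
}

/// Fraction of the total indexing time spent in those steps.
fn non_incremental_share() -> f64 {
    non_incremental_seconds() / TOTAL_SECONDS
}
```

The steps add up to 3.05s out of 3.07s, which is where the figure of about 99.3% comes from.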