Improve speed of incremental indexing

Currently, a few bottlenecks prevent milli from quickly adding a document to an existing index. The goal of this issue is to track these bottlenecks.

- [ ] Facet indexing is not incremental at all, as explained in https://github.com/meilisearch/milli/pull/590.
- [ ] `word_prefix_docids` and `word_prefix_position_docids` iterate over their whole database to check for elements that need to be deleted, even in the case where we know in advance that nothing has to be deleted.
- [ ] An FST is, in general, not a data structure that can easily be updated in place: adding words means rebuilding it. Therefore, neither `WordsPrefixesFst::execute` nor the handler for `TypedChunk::WordDocids` is incremental.
- [ ] The RTree used for geosearch can be updated incrementally, but it must first be deserialised in its entirety. It is hard to see how else it could work, but I'll mention it here anyway.
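The second point can be illustrated with a minimal sketch. It assumes an in-memory `BTreeMap` standing in for the LMDB database, and `PrefixDocids`, `apply_batch` and `prefixes_to_delete` are hypothetical names, not milli's API. The idea is to guard the full-database deletion pass so that it only runs when the batch actually removes prefixes, leaving pure additions with a cost proportional to the batch size:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory stand-in for a `word_prefix_docids`-like database:
/// prefix -> set of document ids. These are not milli's actual types.
type PrefixDocids = BTreeMap<String, BTreeSet<u32>>;

/// Applies one indexing batch (hypothetical helper). Returns how many
/// existing entries the deletion pass visited, so the cost is observable.
fn apply_batch(
    db: &mut PrefixDocids,
    new_entries: &PrefixDocids,
    prefixes_to_delete: &BTreeSet<String>,
) -> usize {
    let mut visited = 0;

    // Deletion pass: only walk the whole database when the batch actually
    // removed some prefixes. Skipping it is the fast path for additions.
    if !prefixes_to_delete.is_empty() {
        db.retain(|prefix, _| {
            visited += 1;
            !prefixes_to_delete.contains(prefix)
        });
    }

    // Merge pass: cost grows with the size of the batch, not the database.
    for (prefix, docids) in new_entries {
        db.entry(prefix.clone())
            .or_default()
            .extend(docids.iter().copied());
    }

    visited
}
```

With an empty `prefixes_to_delete`, `apply_batch` never touches entries unrelated to the new documents. When deletions are known, a further step would be to remove the listed keys directly instead of scanning, which this sketch does not attempt.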

As an example, on the `geo_point` dataset, adding a single document takes 3.07s.
- `Facets::execute` takes about 1.55s (**after** https://github.com/meilisearch/milli/pull/590 is implemented, and much longer otherwise)
- `WordPrefixPositionDocids::execute()` and `WordPrefixDocids::execute()` together take 0.2s
- `WordsPrefixesFst::execute()` takes 0.4s
- Handling `TypedChunk::WordDocids` takes 0.73s
- Handling `TypedChunk::GeoPoints` takes 0.17s

Altogether, these non-incremental parts take about 3.05s, i.e. 99.3% of the 3.07s incremental indexing time.
