This repository was archived by the owner on Apr 4, 2023. It is now read-only.

Improve speed of incremental indexing #605

@loiclec

Currently, a few bottlenecks prevent milli from quickly adding a document to an existing index. The goal of this issue is to track these bottlenecks.

  • Facets indexing is not incremental at all, as explained in Optimise facets indexing #590.
  • word_prefix_docids and word_prefix_position_docids iterate over their whole database to check for elements that need to be deleted, even in the case where we know in advance that nothing has to be deleted.
  • The FST, in general, is not a data structure that can be easily updated. Therefore, WordsPrefixesFst::execute as well as the handler for TypedChunk::WordDocids are not incremental either.
  • The RTree used for geosearch can be updated incrementally, but it does need to be deserialised in its entirety first. It is hard to see how else it could work, but I'll mention it here anyway.
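
The second bullet above hints at a possible fast path: if the set of deleted prefixes is known to be empty, the full scan can be skipped. Below is a minimal sketch of that idea, not milli's actual code. It uses a `BTreeMap` as a stand-in for the on-disk database, and the names `PrefixDb` and `remove_deleted_prefixes` are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a prefix database: prefix -> document ids.
/// (The real databases live on disk; this is only an illustration.)
type PrefixDb = BTreeMap<String, Vec<u32>>;

/// Removes every entry whose key appears in `deleted_prefixes`.
/// Returns how many entries had to be visited to do so.
fn remove_deleted_prefixes(db: &mut PrefixDb, deleted_prefixes: &BTreeSet<String>) -> usize {
    // Fast path: nothing was deleted, so there is no reason to scan the database.
    if deleted_prefixes.is_empty() {
        return 0;
    }

    // Slow path: visit every entry and keep only those that were not deleted.
    let mut visited = 0;
    db.retain(|prefix, _docids| {
        visited += 1;
        !deleted_prefixes.contains(prefix)
    });
    visited
}
```

With an empty `deleted_prefixes`, the function returns without touching any entry, which is the common case when a single document is added.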

As an example, on the geo_point dataset, adding a single document takes 3.07s.

  • Facets::execute takes about 1.55s (once Optimise facets indexing #590 is implemented; much longer otherwise)
  • WordPrefixPositionDocids::execute() and WordPrefixDocids::execute() together take 0.2s
  • WordsPrefixesFst::execute() takes 0.4s
  • Handling TypedChunk::WordDocids takes 0.73s
  • Handling TypedChunk::GeoPoints takes 0.17s

Altogether, these non-incremental parts take about 3.05s, or roughly 99.3% of the incremental indexing time.
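
The percentage can be checked by summing the timings listed above and dividing by the 3.07s total. The snippet below is only a sanity check of that arithmetic. The constant and function names are made up for illustration, and the numbers are copied from this issue.

```rust
/// Timings in seconds, copied from the list in this issue.
const NON_INCREMENTAL: [(&str, f64); 5] = [
    ("Facets::execute", 1.55),
    ("WordPrefixPositionDocids + WordPrefixDocids", 0.2),
    ("WordsPrefixesFst::execute", 0.4),
    ("TypedChunk::WordDocids", 0.73),
    ("TypedChunk::GeoPoints", 0.17),
];

/// Total time to add a single document on the geo_point dataset, in seconds.
const TOTAL: f64 = 3.07;

/// Returns (sum of the non-incremental parts in seconds, their share of TOTAL in percent).
fn non_incremental_share() -> (f64, f64) {
    let sum: f64 = NON_INCREMENTAL.iter().map(|(_, secs)| secs).sum();
    (sum, sum / TOTAL * 100.0)
}
```

The sum comes out at about 3.05s, which leaves roughly 0.02s for everything else.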

Labels: indexing (related to the documents/settings indexing algorithms), performance (related to the performance in terms of search/indexation speed or RAM/CPU/disk consumption)
