This repository was archived by the owner on Apr 4, 2023. It is now read-only.

Optimise indexing of facet databases #589

@loiclec

Description


When creating the facet_id_string_docids and facet_id_f64_docids databases, we need to perform approximately n * log_b(n) roaring bitmap unions, where n is the number of distinct facet values for a given field across all documents and b is the "level group size". This is because we iterate over level 0 of these databases (which contains all the unique facet values) once for every additional level we want to create.
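To make the cost concrete, here is a simplified in-memory sketch of this construction. It is not the actual milli code (which streams key/value pairs from LMDB); the names `build_levels_naive`, `level0`, and `group_size` are illustrative assumptions.

```rust
use roaring::RoaringBitmap;

/// Simplified model of the current construction: every level re-reads all
/// `n` level-0 bitmaps, so building the ~log_b(n) upper levels costs
/// roughly n * log_b(n) bitmap unions in total.
fn build_levels_naive(level0: &[RoaringBitmap], group_size: usize) -> Vec<Vec<RoaringBitmap>> {
    let mut levels = Vec::new();
    // Number of level-0 entries covered by one group of the level being built.
    let mut span = group_size;
    while span < level0.len() {
        // Full pass over level 0 for this level: ~n unions.
        let level: Vec<RoaringBitmap> = level0
            .chunks(span)
            .map(|chunk| {
                let mut acc = RoaringBitmap::new();
                for bitmap in chunk {
                    acc |= bitmap; // union of all level-0 bitmaps in the group
                }
                acc
            })
            .collect();
        levels.push(level);
        span *= group_size;
    }
    levels
}
```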

For example, in the Indexing geo_point benchmark, we have many documents with unique longitudes and latitudes, which are faceted fields. If there are 1,000,000 unique values, then we need to create 8 levels and thus perform 8,000,000 bitmap unions. This is expensive.

It is possible to use a recursive algorithm instead, which computes level N from the contents of level N-1 only. Doing so would reduce the number of bitmap unions from n * log_b(n) to approximately n + n/(b-1) (n unions for level 1, n/b for level 2, n/b^2 for level 3, and so on).
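A minimal in-memory sketch of that bottom-up construction, again with illustrative names rather than the actual milli/LMDB code, could look like this:

```rust
use roaring::RoaringBitmap;

/// Sketch of the proposed construction: each level is computed from the
/// level directly below it, so the total number of unions is
/// n + n/b + n/b^2 + ... ≈ n + n/(b-1).
fn build_levels_recursive(level0: &[RoaringBitmap], group_size: usize) -> Vec<Vec<RoaringBitmap>> {
    // Union groups of `group_size` consecutive bitmaps from one level.
    fn union_groups(level: &[RoaringBitmap], group_size: usize) -> Vec<RoaringBitmap> {
        level
            .chunks(group_size)
            .map(|chunk| {
                let mut acc = RoaringBitmap::new();
                for bitmap in chunk {
                    acc |= bitmap;
                }
                acc
            })
            .collect()
    }

    let mut levels = Vec::new();
    // Level 1 is built from level 0: n unions.
    let mut current = union_groups(level0, group_size);
    while current.len() > group_size {
        // Level N is built from level N-1 only: n / b^(N-1) unions.
        let next = union_groups(&current, group_size);
        levels.push(current);
        current = next;
    }
    levels.push(current);
    levels
}
```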

This matters because, in datasets with many distinct facet values, a very significant portion of the indexing time is spent computing these bitmaps.

Metadata


    Labels

    indexing: Related to the documents/settings indexing algorithms.
    performance: Related to the performance in terms of search/indexation speed or RAM/CPU/Disk consumption.
