Optimise indexing of facet databases

When creating the `facet_id_string_docids` and `facet_id_f64_docids` databases, we need to perform approximately `n * log_b(n)` roaring bitmap unions, where `n` is the number of different facet values for a certain field in all the documents and `b` is the "level group size". This is because we iterate over the level 0 of these databases (containing all the unique field values) for every additional level we want to create.

For example, in the benchmark `Indexing geo_point`, we have a lot of different documents with unique longitudes and latitudes, which are faceted fields. If there are 1,000,000 of them, then we need to create 8 levels, and thus perform 8,000,000 bitmap unions. This is expensive. 

It is possible to use a recursive algorithm instead which computes level N from the contents of level N-1. Doing so would reduce the number of bitmap unions from `n * log_b(n)` to `n + 1/(b-1) * n` (`n` times for level 1, `n/b` times for level 2, `n/b^2` for level 3, etc.).

Doing so is important because in datasets with a lot of different facet values, a very significant portion of the time is spent computing these bitmaps.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimise indexing of facet databases #589

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Optimise indexing of facet databases #589

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions