-
Notifications
You must be signed in to change notification settings - Fork 83
Optimise indexing of facet databases #589
Description
When creating the facet_id_string_docids and facet_id_f64_docids databases, we need to perform approximately n * log_b(n) roaring bitmap unions, where n is the number of different facet values for a certain field in all the documents and b is the "level group size". This is because we iterate over the level 0 of these databases (containing all the unique field values) for every additional level we want to create.
For example, in the benchmark Indexing geo_point, we have a lot of different documents with unique longitudes and latitudes, which are faceted fields. If there are 1,000,000 of them, then we need to create 8 levels, and thus perform 8,000,000 bitmap unions. This is expensive.
It is possible to use a recursive algorithm instead which computes level N from the contents of level N-1. Doing so would reduce the number of bitmap unions from n * log_b(n) to n + 1/(b-1) * n (n times for level 1, n/b times for level 2, n/b^2 for level 3, etc.).
Doing so is important because in datasets with a lot of different facet values, a very significant portion of the time is spent computing these bitmaps.