Open
Description
This is a general meta issue to capture the intent of leveraging index structures more often in aggregations. Today, we have some simple optimizations that "short circuit" agg execution by consulting the BKD tree (min/max aggs, for example), and there has recently been substantial work to rewrite `date_histogram`s into ranges/filters.
In both cases, these optimizations can greatly accelerate the "hot path" by looking up data in the index, rather than iterating over each document and polling its doc values (DV). We think there are probably a number of such cases where we can accelerate certain scenarios or arrangements of aggs by reusing data in the index.
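To make the min/max short circuit concrete, here is a minimal sketch in Python. It is not Lucene or Elasticsearch code: `ToySegment`, `min_max_by_scanning`, and `min_max_from_index` are hypothetical names, and the sketch assumes every document matches the query and none are deleted, so the bounds recorded by the index structure are the answer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToySegment:
    """Hypothetical stand-in for a segment: per-document values plus the
    global bounds an index structure such as a BKD tree keeps around."""
    doc_values: List[int]
    tree_min: int
    tree_max: int


def make_segment(values: List[int]) -> ToySegment:
    # The bounds are computed once, at "index time".
    return ToySegment(list(values), min(values), max(values))


def min_max_by_scanning(segment: ToySegment) -> Tuple[int, int, int]:
    """Slow path: visit every document and poll its value."""
    lo = hi = None
    visited = 0
    for v in segment.doc_values:
        visited += 1
        lo = v if lo is None or v < lo else lo
        hi = v if hi is None or v > hi else hi
    return lo, hi, visited


def min_max_from_index(segment: ToySegment) -> Tuple[int, int, int]:
    """Short circuit: read the precomputed bounds, visiting no documents.
    Only valid under the assumption that all documents match."""
    return segment.tree_min, segment.tree_max, 0


segment = make_segment([17, 3, 42, 8, 23])
print(min_max_by_scanning(segment))
print(min_max_from_index(segment))
```

Both paths return the same bounds; the third element of each tuple is the number of documents visited, which is where the saving comes from.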
Related:
- Index date field data with lower precision #64662
- Speed up date_histogram without children #63643
- Merge the implementation of `filter` into `filters` so it can share in all the performance improvements on `filters`.
- Merge "filter-by-filter" execution with parent aggregations if possible. This'd give a huge speed up if `filters` is under an agg that can run in filter-by-filter mode. It'd be fairly helpful when two "filter-by-filter" compatible aggs are nested in one another.
- `filters` aggregations on `range` queries without children could use the BKD index to count matches instead of enumerating all matches.
- `cardinality` aggregations on `match_all` queries could build the HLL++ object from the terms dictionary instead of collecting all matches.
- `percentiles` aggregations on `match_all` queries could build the HDR histogram from the BKD tree: for leaf nodes where the min and max value would be on the same bucket, we wouldn't need to collect all individual values one by one.
- Can date_histograms better take advantage of data locality? #90261
- Support dynamic pruning in the `composite` aggregation #88185
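Two of the ideas above (counting `range` matches from the BKD index, and filling a histogram from nodes whose min and max land in the same bucket) can be sketched with a toy tree. This is an illustration under stated assumptions, not Lucene's BKD implementation or API: `Node`, `build`, `count_in_range`, and `histogram` are hypothetical names, the tree covers a single integer dimension, every node is assumed to know its min, max, and point count, and fixed-width buckets stand in for the HDR histogram's real bucket layout.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Toy tree node: bounds and count everywhere, raw values on leaves only."""
    min_value: int
    max_value: int
    count: int
    values: Optional[List[int]] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(sorted_values: List[int], leaf_size: int = 4) -> Node:
    """Build a balanced tree over already-sorted values."""
    if len(sorted_values) <= leaf_size:
        return Node(sorted_values[0], sorted_values[-1],
                    len(sorted_values), values=list(sorted_values))
    mid = len(sorted_values) // 2
    left = build(sorted_values[:mid], leaf_size)
    right = build(sorted_values[mid:], leaf_size)
    return Node(left.min_value, right.max_value,
                left.count + right.count, left=left, right=right)


def count_in_range(node: Node, lo: int, hi: int) -> int:
    """Count values in [lo, hi]; leaf values are read only at the range edges."""
    if node.max_value < lo or node.min_value > hi:
        return 0                      # node entirely outside the range
    if lo <= node.min_value and node.max_value <= hi:
        return node.count             # node entirely inside: use stored count
    if node.values is not None:
        return sum(1 for v in node.values if lo <= v <= hi)
    return count_in_range(node.left, lo, hi) + count_in_range(node.right, lo, hi)


def histogram(node: Node, bucket_width: int,
              out: Optional[Counter] = None) -> Counter:
    """Fill fixed-width buckets, adding a whole node at once when its
    min and max fall into the same bucket."""
    out = Counter() if out is None else out
    lo_bucket = node.min_value // bucket_width
    if lo_bucket == node.max_value // bucket_width:
        out[lo_bucket] += node.count  # no need to look at individual values
    elif node.values is not None:
        for v in node.values:
            out[v // bucket_width] += 1
    else:
        histogram(node.left, bucket_width, out)
        histogram(node.right, bucket_width, out)
    return out


values = sorted([1, 2, 3, 4, 11, 12, 13, 14, 21, 25, 38, 39])
tree = build(values)
print(count_in_range(tree, 3, 21))
print(dict(histogram(tree, 10)))
```

In both functions the individual values are touched only where a node straddles a range or bucket boundary; everything else is answered from per-node metadata, which is the kind of saving the list items above are after.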