
[META] Skip List Based Optimization for Aggregations in OpenSearch #19384

@jainankitk

Description

Motivation

Efficient aggregations are critical for OpenSearch use cases involving analytics, dashboards, and exploratory queries. Recent work such as Filter Rewrite and Multi Range Traversal has shown orders-of-magnitude improvements for specific cases like date_histogram. However, these optimizations are limited to scenarios where the aggregation field aligns closely with query filters and document structures.

This RFC proposes a new optimization leveraging Lucene’s skip index on doc values (introduced in Lucene 10) to accelerate aggregations even when:

  • Filters apply on fields other than the aggregation field
  • Sub-aggregations are involved
  • Segments contain deleted documents

Our approach achieves up to 2x improvements across a broader range of workloads than Multi Range Traversal by enabling efficient doc ID skipping and population counting (popcount) using CPU-level instructions.

Skip Index in Lucene

  • Built on top of doc values, conceptually similar to a skip list.
  • Maintains min/max values per interval at multiple levels (e.g., skipping every 5, 25, 125 documents).
  • Enables efficient skipping of doc IDs when combined with sorted doc values.
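
For reference, below is a minimal sketch (not the actual OpenSearch code) of how the per-segment skip index can be inspected through Lucene 10's DocValuesSkipper API; the field name and the level-0 walk are illustrative assumptions:

```java
import java.io.IOException;
import org.apache.lucene.index.DocValuesSkipper;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.search.DocIdSetIterator;

// Illustrative sketch: walk the level-0 skip intervals of a doc-values field
// and print each interval's doc ID range and min/max values.
final class SkipIndexDump {
  static void dump(LeafReader reader, String field) throws IOException {
    DocValuesSkipper skipper = reader.getDocValuesSkipper(field);
    if (skipper == null) {
      return; // field was not indexed with a skip index
    }
    int doc = 0;
    while (doc < reader.maxDoc()) {
      skipper.advance(doc); // position on the interval containing (or after) doc
      if (skipper.minDocID(0) == DocIdSetIterator.NO_MORE_DOCS) {
        break; // no more intervals in this segment
      }
      System.out.printf("docs [%d..%d] values [%d..%d]%n",
          skipper.minDocID(0), skipper.maxDocID(0),
          skipper.minValue(0), skipper.maxValue(0));
      doc = skipper.maxDocID(0) + 1;
    }
  }
}
```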

Problem Statement

Given a query that:

  • Filters on one field (trip_distance: [0, 50])
  • Aggregates on another (dropoff_datetime in a monthly date_histogram)

Existing Multi Range Traversal cannot optimize this case because the filter field (trip_distance) has no correlation with the aggregation field (dropoff_datetime).

We need a mechanism to bridge filter doc ID sets with sorted aggregation fields efficiently.

Proposed Solution

Core Idea

Overlay the doc ID set iterator (from the filter field) with the skip index of the aggregation field, enabling block-level skipping and efficient counting.

Execution Flow

  1. Query execution produces a doc ID set iterator for the filter (e.g., trip_distance: [0,50]).
  2. Retrieve the skip index for the aggregation field (dropoff_datetime).
  3. Align the doc ID set with skip intervals:
  • If all doc IDs in an interval fall into the same aggregation bucket (e.g., a single month in a date_histogram), skip iteration over individual doc IDs.
  • Instead, count them in bulk using bitset operations.
  4. Use the CPU popcount instruction to efficiently compute the set bits in memory-backed bitsets representing doc IDs (see the sketch after this list).
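
A minimal sketch of this alignment loop, assuming every filtered doc has a value for the aggregation field; bucketFor (e.g., monthly rounding), countInBulk, and collectOne are hypothetical placeholders, with the real logic living in the linked PRs:

```java
import java.io.IOException;
import java.util.function.LongUnaryOperator;
import org.apache.lucene.index.DocValuesSkipper;
import org.apache.lucene.search.DocIdSetIterator;

// Sketch of overlaying the filter's doc ID set iterator with the skip index
// of the aggregation field. Assumes every filtered doc has a doc value for
// the aggregation field.
final class SkipListAlignedCollector {
  void collect(DocIdSetIterator filterDocs, DocValuesSkipper skipper,
               LongUnaryOperator bucketFor) throws IOException {
    int doc = filterDocs.nextDoc();
    while (doc != DocIdSetIterator.NO_MORE_DOCS) {
      skipper.advance(doc); // level-0 skip interval containing this doc
      long minBucket = bucketFor.applyAsLong(skipper.minValue(0));
      long maxBucket = bucketFor.applyAsLong(skipper.maxValue(0));
      int intervalEnd = skipper.maxDocID(0);
      if (minBucket == maxBucket) {
        // Every matching doc in this interval lands in the same bucket:
        // count the matches in [doc, intervalEnd] in bulk (popcount on the
        // filter bitset) instead of iterating them one by one.
        countInBulk(minBucket, doc, intervalEnd);
        doc = filterDocs.advance(intervalEnd + 1);
      } else {
        // Interval spans multiple buckets: fall back to per-doc collection.
        while (doc != DocIdSetIterator.NO_MORE_DOCS && doc <= intervalEnd) {
          collectOne(doc);
          doc = filterDocs.nextDoc();
        }
      }
    }
  }

  void countInBulk(long bucket, int fromDoc, int toDocInclusive) { /* popcount path */ }
  void collectOne(int doc) throws IOException { /* standard per-doc collection */ }
}
```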

Example

  • Doc IDs: 35–76
  • Filter: trip_distance [0,50]
  • Aggregation: dropoff_datetime monthly buckets
  • Observation: all dropoff_datetime values for docs 35–76 fall into the same month bucket
  • Optimization: count all docs 35–76 in one step with popcount instead of iterating per doc.
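
The bulk-count step from the example above could look like the following sketch; Long.bitCount is JIT-compiled to the hardware POPCNT instruction on common platforms, and the range-masking helper here is an illustrative assumption rather than OpenSearch's actual code:

```java
import org.apache.lucene.util.FixedBitSet;

// Count filter matches in the doc ID range [35, 76] with popcount instead of
// iterating the 42 doc IDs individually.
final class RangeCardinality {
  static int count(FixedBitSet filterBits, int fromDoc, int toDocInclusive) {
    long[] words = filterBits.getBits();
    int fromWord = fromDoc >> 6;
    int toWord = toDocInclusive >> 6;
    long fromMask = -1L << (fromDoc & 63);               // keep bits >= fromDoc
    long toMask = -1L >>> (63 - (toDocInclusive & 63));  // keep bits <= toDocInclusive
    if (fromWord == toWord) {
      return Long.bitCount(words[fromWord] & fromMask & toMask);
    }
    int count = Long.bitCount(words[fromWord] & fromMask);
    for (int w = fromWord + 1; w < toWord; w++) {
      count += Long.bitCount(words[w]); // one POPCNT per 64 doc IDs
    }
    count += Long.bitCount(words[toWord] & toMask);
    return count;
  }
}
```

Deletions fit the same path: intersecting the filter bitset with the segment's live docs before counting keeps deleted documents out of the bucket counts, which is why the optimization remains applicable to segments with deletes.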

Evaluation

Prototype Results:

  • Achieved up to 2x improvement across multi-filter + aggregation queries.
  • Benefits are more broadly applicable than Multi Range Traversal (not limited to histogram-only workloads).

Trade-offs:

  • Gains are smaller than the extreme improvements of Multi Range Traversal (100x in the best cases).
  • Effectiveness depends on data distribution and sort order of indexed fields.

Applicability

  • Queries with filters on one field and aggregations on another.
  • Sub-aggregation scenarios.
  • Segments with deleted documents (since bitset iteration + popcount handles deletions).

Development Plan

  • Implement for date histogram / histogram aggregation without any sub-aggregations - Adding logic for histogram aggregation using skiplist #19130
  • Extend the date histogram implementation to work with sub-aggregations as well
  • Extend the aggregation logic to work even when the requested aggregation is on field2 while the index is sorted on [field1, field2] and the query filter is on field1
  • Abstract the date histogram implementation to make it work with other types of bucket aggregations (like numeric/terms, etc.) as well
  • Add bulk collection logic to take advantage of JVM inlining and vectorized execution for scalar computations as part of sub-aggregations or otherwise - [META] Use Lucene bulk collection API to speed up aggregation #19324

Related component

Search:Aggregations, Search:Performance
