[META] Use Lucene bulk collection API to speed up aggregation #19324

## Overview

Some Lucene scorers, such as `DenseConjunctionBulkScorer`, can now pass a `DocIdStream` to the collector for bulk collection.
We want to investigate what changes aggregations need in order to adopt this, and how much speedup it can bring.

Here are some ideas:
- Experiment with the new Lucene API released in 10.3, `LeafCollector#collectRange`
  - We can enable the `DocValuesSkipper` and use its pre-aggregated min and max to speed up the Min and Max aggregations.
  - @asimmahmood1 will try this out. Related issue: #19130 
- Experiment with new Lucene APIs that will probably land in 10.4: `NumericDocValues#longValues` and `DocIdStream#intoArray`
  - Use these APIs in some OpenSearch aggregations and benchmark accordingly
  - Experiment with pushing `NumericDocValues#longValues` down to the codec level in `Lucene90DocValuesProducer`. In theory this suits the dense case (all documents have values), where the underlying storage format can be read sequentially and efficiently in bulk.
- Be aware of the cost of virtual calls and try to reduce it through bulk processing
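
To make the `collectRange` plus skipper idea concrete, here is a minimal, self-contained sketch. It does not use Lucene: `SketchCollector`, `BlockMaxIndex`, and `MaxCollector` are hypothetical stand-ins, and the fixed-size block layout is an assumption made for illustration (the real `DocValuesSkipper` is organized differently). It also assumes a dense, single-valued field and that every doc in the collected range matches. The point is only that a range callback lets a max aggregation consume a pre-aggregated block max instead of reading each value.

```java
import java.util.Arrays;

// All types here are hypothetical stand-ins for illustration, not Lucene classes.
public class RangeMaxSketch {

    /** Simplified collector: per-document collection plus a range variant. */
    interface SketchCollector {
        void collect(int doc);

        /** Every doc in [min, maxExclusive) matches; default is one call per doc. */
        default void collectRange(int min, int maxExclusive) {
            for (int doc = min; doc < maxExclusive; doc++) {
                collect(doc);
            }
        }
    }

    /** Pre-aggregated max per fixed-size block, playing the role of a skipper. */
    static final class BlockMaxIndex {
        final int blockSize;
        final long[] blockMax;

        BlockMaxIndex(long[] values, int blockSize) {
            this.blockSize = blockSize;
            this.blockMax = new long[(values.length + blockSize - 1) / blockSize];
            Arrays.fill(blockMax, Long.MIN_VALUE);
            for (int doc = 0; doc < values.length; doc++) {
                int block = doc / blockSize;
                blockMax[block] = Math.max(blockMax[block], values[doc]);
            }
        }
    }

    /** Max aggregation over a dense field (every doc has exactly one value). */
    static final class MaxCollector implements SketchCollector {
        final long[] values;
        final BlockMaxIndex index; // null means "no pre-aggregated data available"
        long max = Long.MIN_VALUE;
        int valueReads = 0; // counts per-document value lookups

        MaxCollector(long[] values, BlockMaxIndex index) {
            this.values = values;
            this.index = index;
        }

        @Override
        public void collect(int doc) {
            valueReads++;
            max = Math.max(max, values[doc]);
        }

        @Override
        public void collectRange(int min, int maxExclusive) {
            if (index == null) {
                SketchCollector.super.collectRange(min, maxExclusive);
                return;
            }
            int limit = Math.min(maxExclusive, values.length);
            int doc = min;
            while (doc < limit) {
                int block = doc / index.blockSize;
                int blockEnd = Math.min((block + 1) * index.blockSize, values.length);
                if (doc % index.blockSize == 0 && blockEnd <= limit) {
                    // Block fully covered by the range: use its pre-aggregated max.
                    max = Math.max(max, index.blockMax[block]);
                    doc = blockEnd;
                } else {
                    // Partially covered block: fall back to per-document reads.
                    int end = Math.min(blockEnd, limit);
                    while (doc < end) {
                        collect(doc++);
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        long[] values = {3, 9, 1, 4, 7, 2, 8, 5, 6, 0};
        MaxCollector perDoc = new MaxCollector(values, null);
        MaxCollector bulk = new MaxCollector(values, new BlockMaxIndex(values, 4));
        perDoc.collectRange(1, values.length);
        bulk.collectRange(1, values.length);
        System.out.println("per-doc: max=" + perDoc.max + " reads=" + perDoc.valueReads);
        System.out.println("bulk:    max=" + bulk.max + " reads=" + bulk.valueReads);
    }
}
```

Both collectors produce the same max; the difference is how many per-document reads are needed. In a real aggregator the shortcut would only be valid when no sub-aggregation or per-bucket logic needs to see individual documents.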

## Work in Progress

- Hand-picked the related Lucene changes onto a dedicated branch: https://github.com/bowenlan-amzn/lucene/commits/10.2.2-bulkcollect/
- Consumed the Lucene branch and used the new API in the OpenSearch `MaxAggregator`: https://github.com/bowenlan-amzn/OpenSearch/commits/bulkcollection/
- Benchmarked on the `nyc_taxis` workload. The default path takes 33 ms, while the bulk collection path takes 80 ms, so a deep dive is needed to understand this result. We were expecting a ~20% speedup.

## Related Lucene Changes

- Enable collectors to take advantage of pre-aggregated data. [#14401](https://github.com/apache/lucene/pull/14401)
- Add bulk-retrieval API to NumericDocValues. [#15149](https://github.com/apache/lucene/pull/15149)
- Help collectors take advantage of bulk-retrieval of doc values. [#15173](https://github.com/apache/lucene/pull/15173)
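
As a rough illustration of why the bulk-retrieval changes listed above could help, the sketch below contrasts one call per document with one call per batch. It is self-contained and does not use Lucene: `ValueSource`, `valueOf`, `valuesOf`, and `ArrayValueSource` are hypothetical names, and their signatures are assumptions rather than the actual `NumericDocValues` or `DocIdStream` API. The batch size is likewise arbitrary.

```java
public class BulkBufferSketch {

    /** Hypothetical value source; each interface call stands in for a virtual call. */
    interface ValueSource {
        long valueOf(int doc);

        /** Bulk variant: fills out[i] with the value of docs[i] for i < size. */
        default void valuesOf(int size, int[] docs, long[] out) {
            for (int i = 0; i < size; i++) {
                out[i] = valueOf(docs[i]);
            }
        }
    }

    /** Dense array-backed source that answers a whole batch in one call. */
    static final class ArrayValueSource implements ValueSource {
        final long[] values;
        int perDocCalls = 0;
        int bulkCalls = 0;

        ArrayValueSource(long[] values) {
            this.values = values;
        }

        @Override
        public long valueOf(int doc) {
            perDocCalls++;
            return values[doc];
        }

        @Override
        public void valuesOf(int size, int[] docs, long[] out) {
            bulkCalls++;
            // Tight loop over the backing array, no per-document interface call.
            for (int i = 0; i < size; i++) {
                out[i] = values[docs[i]];
            }
        }
    }

    /** Baseline: one interface call per matching document. */
    static long sumPerDoc(ValueSource source, int[] matchingDocs) {
        long sum = 0;
        for (int doc : matchingDocs) {
            sum += source.valueOf(doc);
        }
        return sum;
    }

    /** Bulk: copy doc IDs into a reusable buffer, then fetch values per batch. */
    static long sumBulk(ValueSource source, int[] matchingDocs, int batchSize) {
        int[] docs = new int[batchSize];
        long[] vals = new long[batchSize];
        long sum = 0;
        for (int from = 0; from < matchingDocs.length; from += batchSize) {
            int size = Math.min(batchSize, matchingDocs.length - from);
            System.arraycopy(matchingDocs, from, docs, 0, size);
            source.valuesOf(size, docs, vals);
            for (int i = 0; i < size; i++) {
                sum += vals[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        ArrayValueSource source = new ArrayValueSource(new long[] {5, 1, 4, 2, 3, 9});
        int[] matchingDocs = {0, 2, 3, 5};
        System.out.println("per-doc sum: " + sumPerDoc(source, matchingDocs));
        System.out.println("bulk sum:    " + sumBulk(source, matchingDocs, 3));
        System.out.println("per-doc calls=" + source.perDocCalls + " bulk calls=" + source.bulkCalls);
    }
}
```

Whether batching wins in practice depends on the buffering overhead relative to the calls it removes, so any such change should be benchmarked rather than assumed to be faster.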
