Skip to content

10.2.0

Latest
Compare
Choose a tag to compare
@iverase iverase released this 10 Apr 10:17
· 430 commits to main since this release

Lucene 10.2 includes major search-time performance improvements for a wide variety of queries. This is most notably due to:

  • Improved storage format of doc IDs in BKD trees for faster decoding.
    More vectorization when processing PointRangeQuerys and non-scoring BooleanQuerys.
  • Encoding of dense blocks of postings lists as bit sets instead of FOR-delta. This change also saves a bit of storage.
  • Merging matches of dense conjunctive clauses using bitwise ANDs. This especially helps on postings blocks that are encoded as bit sets.
    Implementing the ACORN-1 algorithm for pre-filtered vector searches.
  • Searches that don't require scores and match many docs should generally see good speedups, depending on how expensive the Collector is. Compared with Lucene 10.1.0, Lucene's nightly benchmarks report the following speedups when counting the number of hits of a the following queries:
    * Disjunctions of term queries: 77% to 4x faster
    * Conjunctions of term queries: 38% to 5x faster
    * Filtered disjunctions of term queries: 2.5x to 4x faster
    * Filtered PointRangeQuery: 3.5x faster
  • And the following speedup when computing top-100 hits:
    * Pre-filtered vector search: 3.5x faster

Changes in Runtime Behavior

  • TieredMergePolicy's default floor segment size was increased from 2MB to 16MB. This is expected to result in slightly slower indexing and about 10 fewer segments per index for applications that flush frequently. This should in-turn help speed up queries that have a high per-segment overhead such as multi-term queries, point queries and vector search.

New Features

  • Added TopDocs#rrf to combine multiple TopDocs instances using reciprocal rank fusion.
  • Added SeededKnnVectorQuery, an optimization to KnnVectorQuery that allows selecting better entry points for vector search using a seed Query.

Improvements

  • RegexpQuery support for unicode case-insensitive characters and ranges.
    Optimizations
  • Java 24 vector API support
  • Efficiency improvements to Automaton and RegExp
  • Faster merging of HNSW graphs which translated in a 25% indexing speedup in Lucene's nightly benchmarks.
  • Conjunctive queries can now skip applying clauses when they have long runs of matching docs, a case which is not uncommon when an index sort is configured.
  • Reduce heap usage during BKD tree merges.