[Feature Request] Use lucene sparse index in opensearch #17710

@animodak7

Is your feature request related to a problem? Please describe

Lucene 10 introduced a sparse index that uses a skip list on top of doc values. OpenSearch should support this feature.
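To illustrate why a skip list over doc values helps range queries, here is a minimal sketch of the idea, not Lucene's implementation. It assumes a single level of fixed-size blocks, each storing the minimum and maximum value of its documents. All names (`SkipIndexSketch`, `Block`, `buildBlocks`, `countInRange`) are hypothetical, and Lucene's actual structure and file format are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy, single-level model of a min/max skip index over a doc-values column.
// Hypothetical names throughout; this is not Lucene's API or file format.
public class SkipIndexSketch {

    // Summary of one block of consecutive doc IDs [startDoc, endDoc) with its value bounds.
    public static final class Block {
        final int startDoc;
        final int endDoc;
        final long min;
        final long max;

        Block(int startDoc, int endDoc, long min, long max) {
            this.startDoc = startDoc;
            this.endDoc = endDoc;
            this.min = min;
            this.max = max;
        }
    }

    // Splits the column into fixed-size blocks and records min/max per block.
    public static List<Block> buildBlocks(long[] values, int blockSize) {
        List<Block> blocks = new ArrayList<>();
        for (int start = 0; start < values.length; start += blockSize) {
            int end = Math.min(start + blockSize, values.length);
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (int doc = start; doc < end; doc++) {
                min = Math.min(min, values[doc]);
                max = Math.max(max, values[doc]);
            }
            blocks.add(new Block(start, end, min, max));
        }
        return blocks;
    }

    // Returns {matching docs, values actually read} for lower <= value <= upper.
    public static int[] countInRange(long[] values, List<Block> blocks, long lower, long upper) {
        int matches = 0;
        int valuesRead = 0;
        for (Block b : blocks) {
            if (b.max < lower || b.min > upper) {
                continue; // block cannot match: skip it without reading any value
            }
            if (b.min >= lower && b.max <= upper) {
                matches += b.endDoc - b.startDoc; // whole block matches: no per-doc check
                continue;
            }
            for (int doc = b.startDoc; doc < b.endDoc; doc++) { // partial overlap: check each doc
                valuesRead++;
                if (values[doc] >= lower && values[doc] <= upper) {
                    matches++;
                }
            }
        }
        return new int[] {matches, valuesRead};
    }

    public static void main(String[] args) {
        long[] values = new long[100];
        for (int doc = 0; doc < values.length; doc++) {
            values[doc] = doc; // values sorted by doc ID, as with a primary index sort
        }
        List<Block> blocks = buildBlocks(values, 10);
        int[] result = countInRange(values, blocks, 25, 54);
        System.out.println("matches=" + result[0] + " valuesRead=" + result[1]);
    }
}
```

In this sketch, the range [25, 54] over 100 sorted values matches 30 documents while reading only 20 individual values, because only the two blocks on the range boundaries need a per-document check. With unsorted values the blocks overlap the range more often, which is consistent with the expectation that the feature benefits the primary sort field most.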

Describe the solution you'd like

Analysis

The analysis was run directly on Lucene using 4 different queries.

| Selectability | Cardinality | IndexQuery | IndexOrDocValQuery | IndexSortSortedNumericDocValuesRangeQuery (SkipList=Disabled) | IndexSortSortedNumericDocValuesRangeQuery (SkipList=Enabled) | SortedNumericDocValuesField.newSlowExactQuery (SkipList=Disabled) | SortedNumericDocValuesField.newSlowExactQuery (SkipList=Enabled) | newSlowExactQuery Disabled/Enabled (x N) | PointsQuery - SkipList | SortedDVQuery - SkipList |
|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 100 | 48.33333 | 85 | 60.33333 | 43 | 141 | 58.3333 | 2.41714 | 26.6667 | 2.00003 |
| 50 | 10000 | 56 | 56 | 70 | 52.33333 | 181 | 58 | 3.12069 | -2 | 12 |
| 50 | 10000000 | 99.3333 | 75.3333 | 49.33333 | 40.66667 | 197.66667 | 37.3333 | 5.29465 | 38 | 12.00003 |
| 20 | 100 | 23.33333 | 27 | 17.66667 | 18.33333 | 176.66667 | 39.66667 | 4.45378 | -12.66667 | -22 |
| 20 | 10000 | 22 | 22.33333 | 17.3333 | 16 | 182.33333 | 25.66667 | 7.1039 | -3.33334 | -8.33337 |
| 20 | 10000000 | 22.33333 | 23 | 17 | 17.3333 | 188.33333 | 30 | 6.27778 | -7 | -13 |
| 10 | 100 | 14.6667 | 14.3333 | 9.66667 | 10.66667 | 130 | 16.333 | 7.95935 | -1.9997 | -6.66633 |
| 10 | 10000 | 14 | 15.66667 | 9 | 9.3333 | 170.6667 | 9 | 18.96297 | 6.66667 | 0 |
| 10 | 10000000 | 14.33333 | 16.66667 | 10 | 9 | 177.66667 | 8.66667 | 20.49999 | 8 | 1.33333 |
| 2 | 100 | 9.33333 | 14 | 6.66667 | 4.33333 | 134 | 10.33333 | 12.96775 | 3.66667 | -3.66666 |
| 2 | 10000 | 7 | 6.66667 | 3.33333 | 3.33333 | 168.33333 | 3.66667 | 45.90905 | 3 | -0.33334 |
| 2 | 10000000 | 8.33333 | 6.33333 | 2.66667 | 4.33333 | 197.66667 | 3.33333 | 59.30006 | 3 | -0.66666 |
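The last three columns are not defined in the issue. Checking the rows suggests they are derived from the measured columns: the "x N" column matches newSlowExactQuery (SkipList=Disabled) divided by newSlowExactQuery (SkipList=Enabled), "PointsQuery - SkipList" matches IndexOrDocValQuery minus newSlowExactQuery (SkipList=Enabled), and "SortedDVQuery - SkipList" matches IndexSortSortedNumericDocValuesRangeQuery (SkipList=Disabled) minus newSlowExactQuery (SkipList=Enabled). The sketch below recomputes them for the first row under that assumption. The class and method names are hypothetical.

```java
import java.util.Locale;

// Recomputes the derived columns of the benchmark table for one row.
// The formulas are inferred from the published numbers, not stated in the issue.
public class DerivedColumns {

    // "x N" column: how many times larger the SkipList=Disabled value is than the SkipList=Enabled value.
    public static double speedup(double skipDisabled, double skipEnabled) {
        return skipDisabled / skipEnabled;
    }

    // Difference columns: baseline query value minus newSlowExactQuery (SkipList=Enabled).
    // A positive result means the skip-list query measured lower than the baseline.
    public static double delta(double baseline, double slowExactSkipEnabled) {
        return baseline - slowExactSkipEnabled;
    }

    public static void main(String[] args) {
        // Row with Selectability 50, Cardinality 100.
        double indexOrDocValQuery = 85;
        double sortedDvSkipDisabled = 60.33333;
        double slowExactSkipDisabled = 141;
        double slowExactSkipEnabled = 58.3333;

        System.out.println(String.format(Locale.ROOT, "x N = %.5f",
                speedup(slowExactSkipDisabled, slowExactSkipEnabled)));
        System.out.println(String.format(Locale.ROOT, "PointsQuery - SkipList = %.4f",
                delta(indexOrDocValQuery, slowExactSkipEnabled)));
        System.out.println(String.format(Locale.ROOT, "SortedDVQuery - SkipList = %.5f",
                delta(sortedDvSkipDisabled, slowExactSkipEnabled)));
    }
}
```

If the values are timings where lower is better, which the issue does not state explicitly, a positive difference favors the skip list and a negative one favors the baseline query.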

Observations

  1. SortedDVQuery vs SkipList: SortedDVQuery performed better in almost all cases, and the performance difference is larger at high query selectability.
  2. PointsQuery vs SkipList: SkipList performs either better than or close to PointsQuery in most cases.
  3. SlowDVQuery vs SkipList: compare the SortedNumericDocValuesField.newSlowExactQuery columns with SkipList enabled and disabled.

We benchmarked with the field as the primary sort, since we expect this feature to be most beneficial in that case. For a secondary sort, the query is slower than BKD according to this, but still much faster than the brute-force approach.

Related component

Indexing

Describe alternatives you've considered

No response

Additional context

No response

Labels

Indexing (Indexing, Bulk Indexing and anything related to indexing), enhancement (Enhancement or improvement to existing feature or request), lucene
