Labels: Indexing, enhancement, lucene
Description
Is your feature request related to a problem? Please describe
Lucene 10 introduced a sparse index that uses a skip list on top of doc values. OpenSearch needs to support this feature.
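To make the idea concrete, here is a minimal conceptual sketch of how per-block min/max metadata lets a range query skip whole runs of documents. It is a simplified, single-level illustration and not Lucene's implementation: the names `Block`, `build_skip_index`, and `range_query`, as well as the block size, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    start: int      # first doc ID covered by this block
    end: int        # one past the last doc ID covered by this block
    min_value: int  # smallest doc value inside the block
    max_value: int  # largest doc value inside the block


def build_skip_index(values: List[int], block_size: int = 4) -> List[Block]:
    """Record min/max of the doc values for each fixed-size run of doc IDs."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(start, start + len(chunk), min(chunk), max(chunk)))
    return blocks


def range_query(values: List[int], blocks: List[Block],
                lower: int, upper: int) -> Tuple[List[int], int]:
    """Return matching doc IDs and the number of doc values actually read."""
    matches, visited = [], 0
    for block in blocks:
        # A block whose [min, max] range misses the query range is skipped whole.
        if block.max_value < lower or block.min_value > upper:
            continue
        for doc_id in range(block.start, block.end):
            visited += 1
            if lower <= values[doc_id] <= upper:
                matches.append(doc_id)
    return matches, visited


# Values indexed by doc ID; sorted here, as they would be under an index sort.
values = [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]
blocks = build_skip_index(values, block_size=4)
matches, visited = range_query(values, blocks, lower=10, upper=12)
print(matches, visited)
```

When the values are sorted, most blocks fall entirely outside the query range, which is why the benefit is expected to be largest on the primary sort field.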
Describe the solution you'd like
Analysis
The analysis was run directly against Lucene using four different queries, with the skip list disabled and enabled where applicable.
| Selectability | Cardinality | IndexQuery | IndexOrDocValQuery | IndexSortSortedNumericDocValuesRangeQuery (SkipList=Disabled) | IndexSortSortedNumericDocValuesRangeQuery (SkipList=Enabled) | SortedNumericDocValuesField.newSlowExactQuery (SkipList=Disabled) | SortedNumericDocValuesField.newSlowExactQuery (SkipList=Enabled) | newSlowExactQuery Disabled/Enabled (x N) | PointsQuery - SkipList | SortedDVQuery - SkipList |
|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 100 | 48.33333 | 85 | 60.33333 | 43 | 141 | 58.3333 | 2.41714 | 26.6667 | 2.00003 |
| 50 | 10000 | 56 | 56 | 70 | 52.33333 | 181 | 58 | 3.12069 | -2 | 12 |
| 50 | 10000000 | 99.3333 | 75.3333 | 49.33333 | 40.66667 | 197.66667 | 37.3333 | 5.29465 | 38 | 12.00003 |
| 20 | 100 | 23.33333 | 27 | 17.66667 | 18.33333 | 176.66667 | 39.66667 | 4.45378 | -12.66667 | -22 |
| 20 | 10000 | 22 | 22.33333 | 17.3333 | 16 | 182.33333 | 25.66667 | 7.1039 | -3.33334 | -8.33337 |
| 20 | 10000000 | 22.33333 | 23 | 17 | 17.3333 | 188.33333 | 30 | 6.27778 | -7 | -13 |
| 10 | 100 | 14.6667 | 14.3333 | 9.66667 | 10.66667 | 130 | 16.333 | 7.95935 | -1.9997 | -6.66633 |
| 10 | 10000 | 14 | 15.66667 | 9 | 9.3333 | 170.6667 | 9 | 18.96297 | 6.66667 | 0 |
| 10 | 10000000 | 14.33333 | 16.66667 | 10 | 9 | 177.66667 | 8.66667 | 20.49999 | 8 | 1.33333 |
| 2 | 100 | 9.33333 | 14 | 6.66667 | 4.33333 | 134 | 10.33333 | 12.96775 | 3.66667 | -3.66666 |
| 2 | 10000 | 7 | 6.66667 | 3.33333 | 3.33333 | 168.33333 | 3.66667 | 45.90905 | 3 | -0.33334 |
| 2 | 10000000 | 8.33333 | 6.33333 | 2.66667 | 4.33333 | 197.66667 | 3.33333 | 59.30006 | 3 | -0.66666 |
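The last three columns appear to be derived from the measured ones. As a sketch, and assuming (based only on the arithmetic in the rows) that "PointsQuery" refers to the IndexOrDocValQuery column, "SortedDVQuery" to the IndexSortSortedNumericDocValuesRangeQuery column with SkipList=Disabled, and "SkipList" to the newSlowExactQuery column with SkipList=Enabled, they can be recomputed as follows. The function and key names are hypothetical.

```python
def derived_columns(index_or_dv, sorted_dv_disabled, slow_disabled, slow_enabled):
    """Recompute the three derived columns from four measured columns of one row."""
    return {
        # newSlowExactQuery: SkipList=Disabled divided by SkipList=Enabled
        "disabled_over_enabled": slow_disabled / slow_enabled,
        # Assumed mapping: IndexOrDocValQuery minus newSlowExactQuery (Enabled)
        "points_minus_skiplist": index_or_dv - slow_enabled,
        # Assumed mapping: IndexSortSorted...RangeQuery (Disabled) minus newSlowExactQuery (Enabled)
        "sorted_dv_minus_skiplist": sorted_dv_disabled - slow_enabled,
    }


# Row with Selectability 50 and Cardinality 10000, values copied from the table.
row = derived_columns(index_or_dv=56, sorted_dv_disabled=70,
                      slow_disabled=181, slow_enabled=58)
print(row)
```

Under this reading, a negative value in either of the last two columns means the skip-list query recorded the larger number for that row.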
Observations
- SortedDVQuery vs SkipList: SortedDVQuery performed better in almost all cases, and the performance difference is larger at high query selectability.
- PointsQuery vs SkipList: SkipList performed better than or close to PointsQuery in most cases.
- SlowDVQuery vs SkipList: compare the two SortedNumericDocValuesField.newSlowExactQuery columns (SkipList=Disabled and SkipList=Enabled).
We benchmarked on the primary sort field, since we expect this feature to be most beneficial in that case. For a secondary sort, the query is slower than BKD according to this, but still much faster than the brute-force approach.
Related component
Indexing
Describe alternatives you've considered
No response
Additional context
No response