Description
Please describe the end goal of this project
As a follow-up to the skiplist meta issues #19384 and #17967, this issue tracks the introduction of skiplist starting in OpenSearch 3.2.
There are a few points to consider:
- Should skiplist be enabled by default?
  This depends on the index size impact. Enabling skiplist on just the timestamp field shows a 0.5% increase in bytes, which is minimal: [Feature Request] Use lucene sparse index in opensearch #17710 (comment). Another experiment with skiplist enabled for all numeric fields in the big5 benchmark shows a 63% increase in index size (22 GB to 36 GB): Enable Skiplist Optimization for SortedNumericDocValuesField.newSlowExactQuery in OpenSearch #18751.
  Since skiplist is most useful on (mostly) sorted fields, and to provide the targeted benefit with the least indexing impact, it is enabled by default only on a date field named @timestamp or on the primary sort field.
- What is the performance impact?
  Some numbers based on synthetic data are in [Feature Request] Use lucene sparse index in opensearch #17710. The big5 and http_logs benchmarks are in Enable Skiplist Optimization for SortedNumericDocValuesField.newSlowExactQuery in OpenSearch #18751 [Summary TBD]
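The default rule described above can be sketched as a small predicate. This is a minimal illustration in Python, not OpenSearch's mapper code. The function name `skip_list_enabled_by_default` and its parameters are hypothetical, and two things are assumed rather than stated in this issue: that an explicit `skip_list` value in the mapping overrides the default, and that the "primary sort field" is the first entry of the index sort.

```python
from typing import Optional, Sequence


def skip_list_enabled_by_default(
    field_name: str,
    field_type: str,
    index_sort_fields: Sequence[str] = (),
    explicit_setting: Optional[bool] = None,
) -> bool:
    """Hypothetical helper mirroring the default described in this issue."""
    # Assumption: an explicit skip_list value in the mapping wins over the default.
    if explicit_setting is not None:
        return explicit_setting
    # Default on for a date field named @timestamp.
    if field_type == "date" and field_name == "@timestamp":
        return True
    # Default on for the primary index sort field (assumed to be the first entry).
    return bool(index_sort_fields) and index_sort_fields[0] == field_name
```

With this rule, a `long` field named `price` gets no skiplist unless it is the first index sort field, which keeps the index size overhead close to the 0.5% case rather than the 63% case.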
Based on the above, the plan for 3.2 is to:
- ✅ Run benchmarks and find speed up in existing queries: Enable Skiplist Optimization for SortedNumericDocValuesField.newSlowExactQuery in OpenSearch #18751
- ✅ Add skiplist mapping option with default false: [SparseIndex] Modify FieldMappers to enable SkipList #17965
Plan for 3.3
- ✅ Add skip_list param for date, scaled float and token count fields #19142
- ✅ Adding logic for histogram aggregation using skiplist #19130
- ✅ DateHistogram support sub aggregation: Add sub aggregation support for histogram aggregation using skiplist #19438
- ✅ Enable skip_list for @timestamp field or index sort field by default #19480
- ✅ Fix @timestamp upgrade issue by adding a version check on skip_list param (3.3) #19671
- ✅ Add nyc_taxis date_histogram_calendar_interval_with_filter operation opensearch-benchmark-workloads#697
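For orientation, an explicit `skip_list` mapping and a date histogram request with a sub aggregation might look like the following. This is a sketch built as plain Python dictionaries. The `latency` field, the scaling factor, and the interval are made up, and the placement of `skip_list` inside the field definition is inferred from the PR titles above rather than checked against the released documentation.

```python
import json

# Hypothetical index mapping: skip_list set explicitly on a date field and a
# scaled_float field (both field types are named in #19142).
mapping = {
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date", "skip_list": True},
            "latency": {
                "type": "scaled_float",
                "scaling_factor": 100,
                "skip_list": True,
            },
        }
    }
}

# A date_histogram with a sub aggregation, the query shape targeted by
# #19130 and #19438. Field names and interval are illustrative only.
search_body = {
    "size": 0,
    "aggs": {
        "per_day": {
            "date_histogram": {"field": "@timestamp", "calendar_interval": "day"},
            "aggs": {"avg_latency": {"avg": {"field": "latency"}}},
        }
    },
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

The printed JSON bodies could be sent as the index creation and search requests respectively.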
Plan for 3.4
- Combining filter rewrite and skip list to optimize sub aggregation #19573
- Add skip_list logic to auto date histogram, confirm with big5's range-auto-date-histo-with-metrics, range-auto-date-histo
- Use skiplist for min and max aggregation
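To illustrate why a skiplist can help range filtering and min/max aggregations, here is a toy model in Python. It assumes a single-level structure that records the minimum and maximum value for each fixed-size block of documents. The class and function names are hypothetical, the block size is arbitrary, and this is a simplified picture rather than the actual Lucene or OpenSearch data structure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    start: int      # index of the first document in the block
    end: int        # index one past the last document in the block
    min_value: int
    max_value: int


def build_blocks(values: List[int], block_size: int = 4) -> List[Block]:
    """Summarize doc values into per-block min/max entries."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(start, start + len(chunk), min(chunk), max(chunk)))
    return blocks


def global_min_max(blocks: List[Block]) -> Tuple[int, int]:
    """Min and max over all docs, read from the block summaries only."""
    return min(b.min_value for b in blocks), max(b.max_value for b in blocks)


def count_in_range(values: List[int], blocks: List[Block], lo: int, hi: int) -> int:
    """Count docs with lo <= value <= hi, reading values only in partial blocks."""
    count = 0
    for b in blocks:
        if b.max_value < lo or b.min_value > hi:
            continue                      # block cannot match: skip it entirely
        if lo <= b.min_value and b.max_value <= hi:
            count += b.end - b.start      # every doc matches: no per-doc check
        else:
            count += sum(lo <= v <= hi for v in values[b.start:b.end])
    return count


# Mostly sorted values, similar in shape to a timestamp field.
values = [1, 2, 3, 5, 8, 9, 11, 12, 14, 15, 18, 20]
blocks = build_blocks(values)
```

In this model, `global_min_max(blocks)` never touches individual values, and `count_in_range` reads them only for blocks that straddle a range boundary. On sorted or mostly sorted data, few blocks straddle a boundary, which matches the observation that skiplist is most useful on sorted fields.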
Follow up
- Add bwc tests to make sure there's no upgrade failure: Missing bwc test for indexing data for index created with older version #19682