[META] Skiplist plan for 3.x #18882

@asimmahmood1

Please describe the end goal of this project

As a follow-up to the skiplist meta issues #19384 and #17967, this issue tracks the introduction of skiplist in OpenSearch 3.2.

There are a few points to consider:

  1. Should skiplist be enabled by default?
    This depends on the index size impact. Enabling skiplist on just the timestamp field shows a 0.5% increase in bytes, which is minimal (see the comment on [Feature Request] Use lucene sparse index in opensearch #17710). Another experiment, with skiplist enabled for all numeric fields in the big5 benchmark, shows a 63% increase, from 22 GB to 36 GB (see Enable Skiplist Optimization for SortedNumericDocValuesField.newSlowExactQuery in OpenSearch #18751).
    Since skiplist is most useful on (mostly) sorted fields, and to provide the targeted benefit with the least indexing impact, it is enabled by default only on a date field named @timestamp or on the primary sort field.

  2. What is the performance impact?
    Some numbers, measured on synthetic data, are in [Feature Request] Use lucene sparse index in opensearch #17710. The big5 and http_logs benchmark results are in Enable Skiplist Optimization for SortedNumericDocValuesField.newSlowExactQuery in OpenSearch #18751 [Summary TBD]
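
To illustrate why a skiplist pays off mainly on (mostly) sorted fields, here is a minimal sketch. It is not Lucene's implementation: it assumes a single level of fixed-size blocks that each store only a min and a max, and `BLOCK_SIZE`, `build_block_summaries`, and `blocks_to_scan` are hypothetical names chosen for this example.

```python
import random

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def build_block_summaries(values, block_size=BLOCK_SIZE):
    """Return one (min, max) summary per fixed-size block of doc values."""
    return [
        (min(values[i:i + block_size]), max(values[i:i + block_size]))
        for i in range(0, len(values), block_size)
    ]


def blocks_to_scan(summaries, lo, hi):
    """Count blocks whose [min, max] overlaps the query range [lo, hi]."""
    return sum(1 for bmin, bmax in summaries if bmax >= lo and bmin <= hi)


timestamps = list(range(100_000))       # sorted, like an append-only log
shuffled = timestamps[:]
random.Random(0).shuffle(shuffled)      # same values, random doc order

total_blocks = len(build_block_summaries(timestamps))
sorted_scan = blocks_to_scan(build_block_summaries(timestamps), 10_000, 20_000)
shuffled_scan = blocks_to_scan(build_block_summaries(shuffled), 10_000, 20_000)

print(f"blocks: {total_blocks}, sorted scan: {sorted_scan}, shuffled scan: {shuffled_scan}")
```

On the sorted input only the few blocks that overlap the range need to be read, while on the shuffled input nearly every block overlaps it, so the summaries cost space without saving work. That is the trade-off behind enabling the option by default only on @timestamp or the primary sort field.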

Based on the above, the plan for 3.2 is to:

  1. ✅ Run benchmarks and find speedups in existing queries: Enable Skiplist Optimization for SortedNumericDocValuesField.newSlowExactQuery in OpenSearch #18751
  2. ✅ Add skiplist mapping option with default false: [SparseIndex] Modify FieldMappers to enable SkipList #17965

Plan for 3.3

  1. ✅ Add skip_list param for date, scaled float and token count fields #19142
  2. ✅ Adding logic for histogram aggregation using skiplist #19130
  3. ✅ DateHistogram support sub aggregation: Add sub aggregation support for histogram aggregation using skiplist #19438
  4. ✅ Enable skip_list for @timestamp field or index sort field by default #19480
  5. ✅ Fix @timestamp upgrade issue by adding a version check on skip_list param (3.3) #19671
  6. ✅ Add nyc_taxis date_histogram_calendar_interval_with_filter operation opensearch-benchmark-workloads#697
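
As a rough illustration of the `skip_list` mapping parameter and the default described in this issue, the sketch below builds an index body with an explicit setting and models the default rule. The `skip_list` parameter name comes from the PRs listed above. The helper names, the `event_time` field, and the exact version cutoff in `default_skip_list` are assumptions made for this example and are not the actual OpenSearch code.

```python
import json


def date_field_mapping(skip_list):
    """Mapping fragment for a date field with an explicit skip_list setting."""
    return {"type": "date", "skip_list": skip_list}


def default_skip_list(field_name, index_sort_fields, created_version):
    """Hypothetical model of the default rule; not the real implementation.

    Assumption: indices created before 3.3.0 keep the old default (off),
    which is one plausible reading of the version check in #19671.
    """
    if created_version < (3, 3, 0):
        return False
    primary_sort = index_sort_fields[0] if index_sort_fields else None
    return field_name == "@timestamp" or field_name == primary_sort


# Explicit opt-in and opt-out on two date fields (event_time is a made-up field).
index_body = {
    "mappings": {
        "properties": {
            "@timestamp": date_field_mapping(skip_list=True),
            "event_time": date_field_mapping(skip_list=False),
        }
    }
}

print(json.dumps(index_body, indent=2))
```

An explicit `skip_list` value in the mapping would take precedence over whatever the default rule returns; the rule only matters when the parameter is omitted.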

Plan for 3.4

  1. Combining filter rewrite and skip list to optimize sub aggregation #19573
  2. Add skip_list logic to auto date histogram, and confirm with big5's range-auto-date-history-with-metrics and range-auto-date-history operations
  3. Use skiplist for min and max aggregation
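
The aggregation items above can be pictured with a simplified model. It assumes per-block min and max summaries, a match-all query, and no deleted documents. `summarize`, `histogram`, and `min_max` are hypothetical names for this sketch and do not reflect the actual aggregator code.

```python
from collections import Counter

BLOCK_SIZE = 4  # tiny hypothetical block size so the example is easy to trace


def summarize(values, block_size=BLOCK_SIZE):
    """Split values into blocks and record (start, end, min, max) for each."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append((start, start + len(chunk), min(chunk), max(chunk)))
    return blocks


def histogram(values, blocks, interval):
    """Count docs per bucket, bulk-counting blocks that fit in one bucket."""
    counts = Counter()
    visited = 0  # number of individual values that had to be read
    for start, end, bmin, bmax in blocks:
        if bmin // interval == bmax // interval:
            # Whole block falls into one bucket: add its doc count at once.
            counts[bmin // interval * interval] += end - start
        else:
            # Block straddles a bucket boundary: fall back to per-value work.
            for v in values[start:end]:
                counts[v // interval * interval] += 1
                visited += 1
    return dict(counts), visited


def min_max(blocks):
    """Global min and max from block summaries only (match-all case)."""
    return min(b[2] for b in blocks), max(b[3] for b in blocks)


values = [1, 2, 3, 5, 6, 8, 9, 11, 12, 14, 15, 17]
blocks = summarize(values)
counts, visited = histogram(values, blocks, interval=10)
lowest, highest = min_max(blocks)
```

Only the middle block, which straddles the boundary at 10, is read value by value; the other two are counted in bulk, and the min and max come from the summaries alone. With a filter or sub aggregation in play, the per-document path is needed more often, which is what the filter rewrite item targets.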

Follow up
