-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Add sort optimization with after from Lucene #64292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sort optimization with after from Lucene #64292
Conversation
Pinging @elastic/es-search (:Search/Search) |
Lucene 8.7 introduces numeric sort optimization directly in comparators. This means we don't need to have it in ES. This removes the sort optimization code in ES, and the only thing that is needed is setCanUsePoints on sortField. This also will introduce sort optimization with search_after, as Lucene directly suports sort optimization with search_after.
f796c4e
to
8a4c342
Compare
Rally benchmarking results for geonames (number of docs: 11M, many segments): baseline: master branch; contender: this PR Overall very good results: sort:
sort with after:
|
http_logs results with bulk_indexing_clients:1 Multiple segments:
Single segment:
|
Sort reader leaves in ContextIndexSearcher for numeric sort optimizaiton.
This reverts commit ca72e0c.
Here are the benchmarks for sorted-reader: baseline: master branch; contender: sorted-reader branch geonames:Overall very good results: sort:
sort with after:
http_logs:Multiple segments:
Single segment:
@jimczi It looks like your approach with a sorted MultiReader worked well, and we have speedups both for desc and asc sort. We can go ahead with this sorted_reader then. |
We took another approach for this, hence closing. |
@jimczi Do you have any other issue/PR to follow improvements related to this topic ? |
@jimczi I am reviving this PR with merging the latest master here, can you please continue the review when you have time. Here are the latest benchmarks for Here are the results of the 99th percentile service time. Surprisingly, this time I also see 17% regression on desc_sort_with_after_timestamp. I will investigate what could be a reason for it. +| Task | Baseline | Contender | Diff | Unit |
+| desc_sort_timestamp | 79.6757 | 66.9085 | -12.7672 | ms |
+| asc_sort_timestamp | 4.51976 | 3.58597 | -0.93378 | ms |
-| desc_sort_with_after_timestamp | 748.949 | 878.057 | 129.108 | ms |
+| asc_sort_with_after_timestamp | 754.128 | 459.698 | -294.43 | ms |
+| desc-sort-timestamp-after-force-merge-1-seg | 1485.43 | 985.803 | -499.632 | ms |
+| asc-sort-timestamp-after-force-merge-1-seg | 4.17577 | 3.23124 | -0.94453 | ms |
+| desc-sort-with-after-timestamp-after-force-merge-1-seg | 866.316 | 519.599 | -346.717 | ms |
+| asc-sort-with-after-timestamp-after-force-merge-1-seg | 790.108 | 604.906 | -185.202 | ms | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM !
Lucene 8.7 introduces numeric sort optimization directly in comparators. This means we don't need to have it in ES. This removes the sort optimization code in ES, and the only thing that is needed is setCanUsePoints on sortField. This also will introduce sort optimization with search_after, as Lucene directly supports sort optimization with search_after. As previously, we enable sort optimization only when there is Long sort on Long or Date fields. There could be a regression on desc sort, for example on @timestamp desc field. In this case, we suggest to convert these indices to data stream indices, as datatastream indices have their desc sort on @timestamp field optimized.
Lucene 8.7 introduces numeric sort optimization directly in comparators. This means we don't need to have it in ES. This removes the sort optimization code in ES, and the only thing that is needed is setCanUsePoints on sortField. This also will introduce sort optimization with search_after, as Lucene directly supports sort optimization with search_after. As previously, we enable sort optimization only when there is Long sort on Long or Date fields. There could be a regression on desc sort, for example on @timestamp desc field. In this case, we suggest to convert these indices to data stream indices, as datatastream indices have their desc sort on @timestamp field optimized Backport for #64292
Lucene 8.7 introduces numeric sort optimization directly in comparators.
This means we don't need to have it in ES.
This removes the sort optimization code in ES, and
the only thing that is needed is setCanUsePoints on sortField.
This also will introduce sort optimization with search_after,
as Lucene directly supports sort optimization with search_after.
As previously, we enable sort optimization only when there is
Long sort on Long or Date fields.
There could be a regression on desc sort, for example on
@timestamp desc field. In this case, we suggest to
convert these indices to data stream indices, as
datatastream indices have their desc sort on
@timestamp field optimized