-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CachingLeafSlicesSupplier to compute the LeafSlices for concurrent segment search #12374
Conversation
@javanna Can you help to look into this PR ? Would be great if we can merge it in before release branch cut off. |
lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java
Outdated
Show resolved
Hide resolved
hi @sohami thanks for opening this PR! I have been considering a different approach, in the attempt of not adding additional constructors to IndexSearcher, as it already has quite some. I am thinking that pre-computing the slices in the constructor may not be so important. How about removing that and computing the slices each time they are retrieved? There's only three methods where that is used between searchAfter and search. If needed, subclasses can still pre-compute and cache the slices if that's considered important. This way, subclasses are free to override how slices are computed, and the change is a little less invasive. |
@javanna Thanks for the suggestion. I would think the base implementation will want to avoid computing it on every usage specially for the immutable indices where IndexSearcher is suggested to be reused. If it is not cached then for each search it will compute the slices which seems redundant and also deviating from current baseline. If we want to cache the computed slices in base class but remove it from getting initialized from constructor, then essentially we are making I think if the concern is to avoid adding yet another constructor then other way could be to make the |
@javanna I am also thinking in regards to too many constructor, can we see if deprecating the ones with IndexReader as inputs and keeping the ones with IndexReaderContext over the time will make sense. For the new one as well we can avoid adding one with IndexReader if we go down that path vs non final variable. |
I'm not entirely happy with this new constructor either, because now there are two ways to customize slicing: either with this new constructor or by overriding the For reference, this 20-years-old comment on If there is reluctance to computing slices on every call to |
+1 to not having two ways to customize slicing.
@jpountz I agree. I think the basic problem here is that the protected method is called from the constructor of the parent class, which really limits the flexibility and can lead to unexpected behavior. The simplest approach is to just compute the slices on every call to |
Agreed to this however the
Thanks for sharing the context @jpountz
With Also wondering what is the disadvantage in making |
No, Lucene doesn't have such a utility. +1 to hand rolling the memoization logic. I don't think it needs to be fancy in terms of concurrency, IndexSearcher is expected to spend much more time scoring hits than computing slices, so making it synchronized should be good enough?
I am happy either way. Caching feels like it has less potential for being perceived as a regression by some users. One approach could be to do the caching approach now which is a bit safer, and then move to the approach that you're suggesting on the
I don't like this approach because it exposes internals of IndexSearcher than I'd like users to not touch or have to know about. For instance if we made it protected and non-final, and later made a change to compute slices lazily instead of eagerly in the constructor, this could be perceived as a breaking change as there would be cases when leafSlices would now be |
public LeafSlice[] get() { | ||
if (executor == null) { | ||
return null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make more sense to have to have the executor == null
check in IndexSearcher#getSlices
than here so that this class focuses solely on caching logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it this way because IndexSearcher
can access leafSlicesSupplier
without using getSlices
. So everywhere it will need to perform explicit check for executor being non-null
too for such usages. With current mechanism non null check for executor will not be needed for any future usages as well. This is also keeping the current behavior to return null
slices when executor
is null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an alternative, could you move this check into the provider
lambda? It still feels wrong to me that CachingLeafSupplier
has to check the executor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpountz I have moved it to getSlices
method and using that everywhere as you suggested earlier. If executor
is null
then initializing the supplier as null
in that case. Please let me know if this looks good.
@jpountz Gentle reminder for the review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Can you add a CHANGES entry under 9.8?
…LeafSlices used with concurrent segment search. It uses the protected method `slices` by default to compute the slices which can be overriden by the sub classes of IndexSearcher
Added. I have also rebased the changes. Thanks all for your time and guidance. |
…LeafSlices used with concurrent segment (apache#12374) search. It uses the protected method `slices` by default to compute the slices which can be overriden by the sub classes of IndexSearcher
I have nominated and maintainers have agreed to invite Sorabh Hamirwasia(@sohami) to be a co-maintainer. Sorabh has kindly accepted. Sorabh has led the design and implementation of multiple features like query cancellation, concurrent segment search for aggregations and snapshot inter-operability w/remote storage in OpenSearch. Some significant issues and PRs authored by Sorabh for this effort are as follows: Feature proposals Concurrent segment search for aggregations : #6798 Searchable Remote Index : #2900 Implementations Concurrent segment search for aggregations: #7514 Lucene changes to leaf slices for concurrent search: apache/lucene#12374 Moving concurrent search to core : #7203 Query cancellation support : #986 In total, Sorabh has authored 14 PRs going back to Aug 2021. He also frequently reviews contributions from others and has reviewed nearly 20 PRs in the same time frame. I believe Sorabh will be a valuable addition as a maintainer of OpenSearch and will continue to contribute to the success of the project going forward. Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
I have nominated and maintainers have agreed to invite Sorabh Hamirwasia(@sohami) to be a co-maintainer. Sorabh has kindly accepted. Sorabh has led the design and implementation of multiple features like query cancellation, concurrent segment search for aggregations and snapshot inter-operability w/remote storage in OpenSearch. Some significant issues and PRs authored by Sorabh for this effort are as follows: Feature proposals Concurrent segment search for aggregations : #6798 Searchable Remote Index : #2900 Implementations Concurrent segment search for aggregations: #7514 Lucene changes to leaf slices for concurrent search: apache/lucene#12374 Moving concurrent search to core : #7203 Query cancellation support : #986 In total, Sorabh has authored 14 PRs going back to Aug 2021. He also frequently reviews contributions from others and has reviewed nearly 20 PRs in the same time frame. I believe Sorabh will be a valuable addition as a maintainer of OpenSearch and will continue to contribute to the success of the project going forward. Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> (cherry picked from commit 7769682) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
I have nominated and maintainers have agreed to invite Sorabh Hamirwasia(@sohami) to be a co-maintainer. Sorabh has kindly accepted. Sorabh has led the design and implementation of multiple features like query cancellation, concurrent segment search for aggregations and snapshot inter-operability w/remote storage in OpenSearch. Some significant issues and PRs authored by Sorabh for this effort are as follows: Feature proposals Concurrent segment search for aggregations : #6798 Searchable Remote Index : #2900 Implementations Concurrent segment search for aggregations: #7514 Lucene changes to leaf slices for concurrent search: apache/lucene#12374 Moving concurrent search to core : #7203 Query cancellation support : #986 In total, Sorabh has authored 14 PRs going back to Aug 2021. He also frequently reviews contributions from others and has reviewed nearly 20 PRs in the same time frame. I believe Sorabh will be a valuable addition as a maintainer of OpenSearch and will continue to contribute to the success of the project going forward. (cherry picked from commit 7769682) Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
I have nominated and maintainers have agreed to invite Sorabh Hamirwasia(@sohami) to be a co-maintainer. Sorabh has kindly accepted. Sorabh has led the design and implementation of multiple features like query cancellation, concurrent segment search for aggregations and snapshot inter-operability w/remote storage in OpenSearch. Some significant issues and PRs authored by Sorabh for this effort are as follows: Feature proposals Concurrent segment search for aggregations : opensearch-project#6798 Searchable Remote Index : opensearch-project#2900 Implementations Concurrent segment search for aggregations: opensearch-project#7514 Lucene changes to leaf slices for concurrent search: apache/lucene#12374 Moving concurrent search to core : opensearch-project#7203 Query cancellation support : opensearch-project#986 In total, Sorabh has authored 14 PRs going back to Aug 2021. He also frequently reviews contributions from others and has reviewed nearly 20 PRs in the same time frame. I believe Sorabh will be a valuable addition as a maintainer of OpenSearch and will continue to contribute to the success of the project going forward. Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
I have nominated and maintainers have agreed to invite Sorabh Hamirwasia(@sohami) to be a co-maintainer. Sorabh has kindly accepted. Sorabh has led the design and implementation of multiple features like query cancellation, concurrent segment search for aggregations and snapshot inter-operability w/remote storage in OpenSearch. Some significant issues and PRs authored by Sorabh for this effort are as follows: Feature proposals Concurrent segment search for aggregations : opensearch-project#6798 Searchable Remote Index : opensearch-project#2900 Implementations Concurrent segment search for aggregations: opensearch-project#7514 Lucene changes to leaf slices for concurrent search: apache/lucene#12374 Moving concurrent search to core : opensearch-project#7203 Query cancellation support : opensearch-project#986 In total, Sorabh has authored 14 PRs going back to Aug 2021. He also frequently reviews contributions from others and has reviewed nearly 20 PRs in the same time frame. I believe Sorabh will be a valuable addition as a maintainer of OpenSearch and will continue to contribute to the success of the project going forward. Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
I have nominated and maintainers have agreed to invite Sorabh Hamirwasia(@sohami) to be a co-maintainer. Sorabh has kindly accepted. Sorabh has led the design and implementation of multiple features like query cancellation, concurrent segment search for aggregations and snapshot inter-operability w/remote storage in OpenSearch. Some significant issues and PRs authored by Sorabh for this effort are as follows: Feature proposals Concurrent segment search for aggregations : opensearch-project#6798 Searchable Remote Index : opensearch-project#2900 Implementations Concurrent segment search for aggregations: opensearch-project#7514 Lucene changes to leaf slices for concurrent search: apache/lucene#12374 Moving concurrent search to core : opensearch-project#7203 Query cancellation support : opensearch-project#986 In total, Sorabh has authored 14 PRs going back to Aug 2021. He also frequently reviews contributions from others and has reviewed nearly 20 PRs in the same time frame. I believe Sorabh will be a valuable addition as a maintainer of OpenSearch and will continue to contribute to the success of the project going forward. Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> Signed-off-by: Ivan Brusic <ivan.brusic@flocksafety.com>
I have nominated and maintainers have agreed to invite Sorabh Hamirwasia(@sohami) to be a co-maintainer. Sorabh has kindly accepted. Sorabh has led the design and implementation of multiple features like query cancellation, concurrent segment search for aggregations and snapshot inter-operability w/remote storage in OpenSearch. Some significant issues and PRs authored by Sorabh for this effort are as follows: Feature proposals Concurrent segment search for aggregations : opensearch-project#6798 Searchable Remote Index : opensearch-project#2900 Implementations Concurrent segment search for aggregations: opensearch-project#7514 Lucene changes to leaf slices for concurrent search: apache/lucene#12374 Moving concurrent search to core : opensearch-project#7203 Query cancellation support : opensearch-project#986 In total, Sorabh has authored 14 PRs going back to Aug 2021. He also frequently reviews contributions from others and has reviewed nearly 20 PRs in the same time frame. I believe Sorabh will be a valuable addition as a maintainer of OpenSearch and will continue to contribute to the success of the project going forward. Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Description
Add a constructor which takes in the computed slices from extensions and uses that for running the search concurrently on provided executor. This is based on the discussion on the issue #12347