Concurrent Searching #1286

Closed
anasalkouz opened this issue Sep 23, 2021 · 9 comments · Fixed by #1500

Comments

@anasalkouz
Member

anasalkouz commented Sep 23, 2021

Is your feature request related to a problem? Please describe.
Since at least Apache Lucene 6.x, there has been an experimental low-level API that allows search execution to be parallelized across segments [3]. As of Apache Lucene 8.10.1, the latest release, the API is still marked as experimental (see [1]). Community feedback on this feature has been positive so far (see [2]), and there is a good chance that parallelizing the search over segments would bring performance benefits for certain kinds of indices.

[1] https://lucene.apache.org/core/8_10_1/core/org/apache/lucene/search/IndexSearcher.html#search-org.apache.lucene.search.Query-org.apache.lucene.search.CollectorManager-
[2] https://engineeringblog.yelp.com/2021/09/nrtsearch-yelps-fast-scalable-and-cost-effective-search-engine.html
[3] https://blog.mikemccandless.com/2019/10/concurrent-query-execution-in-apache.html

Describe the solution you'd like
Since the API is experimental, the essential parts are that the feature should be controlled by a setting and should have a dedicated, configurable thread pool:

  • "search.allow_concurrent_segment_search", default value is false
  • "index_searcher" thread pool (default number of threads == number of cores)
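
As a rough illustration of how the two knobs above could fit together, here is a minimal sketch in plain Java. The class and method names are hypothetical and are not OpenSearch APIs. It only models the proposed defaults (flag off, one thread per core) and assumes that "no executor" means the search runs sequentially, which matches how Lucene's IndexSearcher treats a null executor.

```java
import java.util.Optional;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ConcurrentSearchSettingsSketch {

    // Models the proposed default of "search.allow_concurrent_segment_search".
    static final boolean DEFAULT_ALLOW_CONCURRENT_SEGMENT_SEARCH = false;

    // Models the proposed default size of the "index_searcher" pool: one thread per core.
    static int defaultIndexSearcherThreads() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Hypothetical helper: returns a dedicated pool only when the flag is on.
    // An empty Optional stands for "search the segments sequentially".
    static Optional<ExecutorService> indexSearcherExecutor(boolean allowConcurrent, int threads) {
        if (!allowConcurrent) {
            return Optional.empty();
        }
        return Optional.of(Executors.newFixedThreadPool(threads));
    }

    public static void main(String[] args) {
        Optional<ExecutorService> executor = indexSearcherExecutor(
            DEFAULT_ALLOW_CONCURRENT_SEGMENT_SEARCH, defaultIndexSearcherThreads());
        // With the default settings no pool is created.
        System.out.println(executor.isPresent());
        executor.ifPresent(ExecutorService::shutdown);
    }
}
```

Keeping the pool separate from the regular search pool would let its size be tuned independently of how many search requests are running at once.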

The change, although quite complex, is mostly isolated in the QueryPhase and QueryCollectorContext (and surrounding classes).

Describe alternatives you've considered
N/A

Additional context
Currently, the search implementation assumes a sequential flow: the results are accumulated by individual collectors (backed by collector contexts) and post-processed at the end. It has to be changed to use CollectorManagers and reducers to assemble the final query results.
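
To make the collector-manager idea concrete, here is a minimal sketch in plain Java that does not depend on Lucene. The method names newCollector and reduce mirror Lucene's CollectorManager interface, but every class below is a hypothetical stand-in: a "segment" is just a list of strings and a "hit" is a substring match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CollectorManagerSketch {

    /** Hypothetical per-segment collector: counts documents containing a term. */
    static final class CountingCollector {
        int count;

        void collect(String doc, String term) {
            if (doc.contains(term)) {
                count++;
            }
        }
    }

    /** Hypothetical manager: hands out one collector per segment and reduces them. */
    static final class CountingCollectorManager {
        CountingCollector newCollector() {
            return new CountingCollector();
        }

        int reduce(List<CountingCollector> collectors) {
            int total = 0;
            for (CountingCollector c : collectors) {
                total += c.count;
            }
            return total;
        }
    }

    /** Searches each segment on the executor, then reduces the per-segment results. */
    static int search(List<List<String>> segments, String term, ExecutorService executor)
            throws InterruptedException, ExecutionException {
        CountingCollectorManager manager = new CountingCollectorManager();
        List<Future<CountingCollector>> futures = new ArrayList<>();
        for (List<String> segment : segments) {
            CountingCollector collector = manager.newCollector();
            futures.add(executor.submit(() -> {
                for (String doc : segment) {
                    collector.collect(doc, term);
                }
                return collector;
            }));
        }
        List<CountingCollector> finished = new ArrayList<>();
        for (Future<CountingCollector> future : futures) {
            finished.add(future.get()); // waits for that segment to complete
        }
        return manager.reduce(finished);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<List<String>> segments = List.of(
                List.of("red fox", "blue fox"),
                List.of("red hen"),
                List.of("green frog", "red frog", "red fox"));
            System.out.println(search(segments, "red", executor)); // prints 4
        } finally {
            executor.shutdown();
        }
    }
}
```

Each collector is touched by only one task, so no locking is needed during collection. The only shared step is the reduce call, which runs after all segments have finished.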

The impediments: early termination and time-bounded search are exception-driven. This is difficult to replicate as-is, because when the exception is thrown the flow is interrupted and the reducers are not available.
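
The sketch below illustrates the problem and one possible way around it, in plain Java with hypothetical names throughout. It is not how OpenSearch or Lucene handle this. The assumption is that if the termination exception is caught inside each per-segment task rather than propagated, the collector still reaches the reduce step along with a flag recording that it stopped early.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class EarlyTerminationSketch {

    /** Hypothetical signal standing in for an exception-driven early exit. */
    static final class EarlyTerminationException extends RuntimeException {
        EarlyTerminationException() {
            super("hit limit reached");
        }
    }

    /** Hypothetical per-segment collector that stops after `limit` hits (limit >= 1). */
    static final class LimitedCollector {
        final int limit;
        int hits;
        boolean terminatedEarly;

        LimitedCollector(int limit) {
            this.limit = limit;
        }

        void collect(String doc, String term) {
            if (doc.contains(term) && ++hits >= limit) {
                throw new EarlyTerminationException();
            }
        }
    }

    /** Reduced result across all segments. */
    static final class Result {
        final int hits;
        final boolean terminatedEarly;

        Result(int hits, boolean terminatedEarly) {
            this.hits = hits;
            this.terminatedEarly = terminatedEarly;
        }
    }

    static Result search(List<List<String>> segments, String term, int perSegmentLimit,
                         ExecutorService executor)
            throws InterruptedException, ExecutionException {
        List<Future<LimitedCollector>> futures = new ArrayList<>();
        for (List<String> segment : segments) {
            futures.add(executor.submit(() -> {
                LimitedCollector collector = new LimitedCollector(perSegmentLimit);
                try {
                    for (String doc : segment) {
                        collector.collect(doc, term);
                    }
                } catch (EarlyTerminationException e) {
                    // Caught inside the task, so the collector still reaches the reducer.
                    collector.terminatedEarly = true;
                }
                return collector;
            }));
        }
        // Reduce step: sum the hits and remember whether any segment stopped early.
        int hits = 0;
        boolean terminatedEarly = false;
        for (Future<LimitedCollector> future : futures) {
            LimitedCollector collector = future.get();
            hits += collector.hits;
            terminatedEarly |= collector.terminatedEarly;
        }
        return new Result(hits, terminatedEarly);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<List<String>> segments = List.of(
                List.of("a x", "a y", "a z"),
                List.of("a q"));
            Result result = search(segments, "a", 2, executor);
            System.out.println(result.hits + " " + result.terminatedEarly); // prints 3 true
        } finally {
            executor.shutdown();
        }
    }
}
```

If the exception were allowed to escape the task instead, future.get() would throw an ExecutionException and the loop that performs the reduction would never finish, which is the interruption described above.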

It would make sense to create benchmarks that compare sequential and parallel segment search and show when each is useful. Once that evidence is collected, the engine itself could provide hints at runtime recommending that the feature be switched on or off (probably on a per-index basis).

@mitalawachat

Same as #1145?

@anasalkouz
Member Author

> Same as #1145?

Issue #1145 has no details, so I am not sure whether they are duplicates. If they are, feel free to close #1145, since this issue has more details.

@reta
Collaborator

reta commented Oct 25, 2021

@anasalkouz I am about to start prototyping the use of IndexSearcher with parallel execution. Do you mind if I take this issue? Thank you.

@anasalkouz
Member Author

@reta, sure, go ahead. Looking forward to seeing the prototype.

@reta
Collaborator

reta commented Nov 3, 2021


@anasalkouz
Member Author

Moving content of the last comment to the body of the issue

@reta
Collaborator

reta commented Mar 24, 2022

Concurrent search has been implemented as a sandbox plugin named concurrent-search, which can be installed using the regular procedure:

./bin/opensearch-plugin install <path>/sandbox/plugins/concurrent-search/build/distributions/concurrent-search-2.0.0-SNAPSHOT.zip      

Once the plugin is installed, the search phase uses concurrent search over the Lucene segments (see #2585 and #2586).

@jed326
Collaborator

jed326 commented Mar 10, 2023

Hey @reta, I see there was mention of a search.allow_concurrent_segment_search setting to dynamically enable/disable concurrent segment search. Was this implemented, or was it dropped in favor of shipping as a sandbox plugin?

@reta
Collaborator

reta commented Mar 11, 2023

> Hey @reta, I see there was mention of a search.allow_concurrent_segment_search setting to dynamically enable/disable concurrent segment search. Was this implemented, or was it dropped in favor of shipping as a sandbox plugin?

Hey @jed326, that is correct: it was dropped in favor of shipping as a sandbox plugin. Thank you.
