Concurrent Searching #1286
Comments
Same as #1145?
@anasalkouz I am about to start prototyping the usage of
@reta, sure, go ahead. Looking forward to seeing the prototype.
Moving the content of the last comment to the body of the issue.
The Concurrent Searching has been implemented as a sandbox plugin under the name
Once installed, the search phase uses concurrent search over the Lucene segments (please see #2585 and #2586).
Hey @reta, I see there was mention of a
Hey @jed326, correct: it was dropped in favor of shipping as a sandbox plugin. Thank you.
Is your feature request related to a problem? Please describe.
Since at least Apache Lucene 6.x, there has been an experimental low-level API that allows parallelizing search execution across segments [3]. As of Apache Lucene 8.10.1 (the latest release at the time of writing), the API is still marked as experimental (please see [1]). Community feedback on this feature has been positive so far (please see [2]), so there is a good chance that, for certain kinds of indices, parallelizing the search over segments could bring performance benefits.
[1] https://lucene.apache.org/core/8_10_1/core/org/apache/lucene/search/IndexSearcher.html#search-org.apache.lucene.search.Query-org.apache.lucene.search.CollectorManager-
[2] https://engineeringblog.yelp.com/2021/09/nrtsearch-yelps-fast-scalable-and-cost-effective-search-engine.html
[3] https://blog.mikemccandless.com/2019/10/concurrent-query-execution-in-apache.html
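In Lucene, the API from [1] is used by creating an `IndexSearcher` with an `Executor` and calling `search(Query, CollectorManager)`. The sketch below does not use Lucene. It is a plain-Java model of the same idea, built under assumptions: each segment is represented as an array of precomputed scores, and the class and method names are illustrative. Each segment is searched as an independent task on a thread pool, and the per-segment top hits are then merged in a single reduce step.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SegmentParallelSearch {
    // A hit is identified by its segment and its segment-local doc id.
    static final class Hit {
        final int segment;
        final int doc;
        final double score;
        Hit(int segment, int doc, double score) { this.segment = segment; this.doc = doc; this.score = score; }
        @Override public String toString() { return segment + ":" + doc + "=" + score; }
    }

    // Highest score first; ties broken by (segment, doc) so results are deterministic.
    static final Comparator<Hit> ORDER = Comparator.comparingDouble((Hit h) -> h.score).reversed()
            .thenComparingInt(h -> h.segment).thenComparingInt(h -> h.doc);

    // "Search" one segment: score every doc and keep the top k.
    static List<Hit> searchSegment(int seg, double[] scores, int k) {
        List<Hit> hits = new ArrayList<>();
        for (int doc = 0; doc < scores.length; doc++) hits.add(new Hit(seg, doc, scores[doc]));
        hits.sort(ORDER);
        return new ArrayList<>(hits.subList(0, Math.min(k, hits.size())));
    }

    // Reduce step: merge the per-segment top-k lists into the global top k.
    static List<Hit> reduce(List<List<Hit>> perSegment, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> part : perSegment) all.addAll(part);
        all.sort(ORDER);
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    // Today's model: segments are visited one after another on the calling thread.
    static List<Hit> searchSequential(double[][] segments, int k) {
        List<List<Hit>> parts = new ArrayList<>();
        for (int s = 0; s < segments.length; s++) parts.add(searchSegment(s, segments[s], k));
        return reduce(parts, k);
    }

    // Concurrent model: one task per segment on the pool, then the same reduce.
    static List<Hit> searchConcurrent(double[][] segments, int k, ExecutorService pool) throws Exception {
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (int s = 0; s < segments.length; s++) {
            final int seg = s;
            futures.add(pool.submit(() -> searchSegment(seg, segments[seg], k)));
        }
        List<List<Hit>> parts = new ArrayList<>();
        for (Future<List<Hit>> f : futures) parts.add(f.get());
        return reduce(parts, k);
    }

    public static void main(String[] args) throws Exception {
        double[][] segments = { {0.1, 0.9, 0.4}, {0.8, 0.2}, {0.7, 0.95, 0.3} };
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(searchConcurrent(segments, 3, pool)); // prints [2:1=0.95, 0:1=0.9, 1:0=0.8]
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the reduce step is identical in both paths, the concurrent search returns exactly the same hits as the sequential one. Only the scheduling of the per-segment work differs.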
Describe the solution you'd like
In essence, since the API is experimental, the feature should be guarded by a setting and run on a dedicated, configurable thread pool.
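A minimal sketch of such a gate is shown below. It assumes a simple key/value settings map. The setting keys, the class name and the defaults are hypothetical and are not OpenSearch's actual settings. The feature is off by default. When it is switched on, it gets its own fixed-size pool with named daemon threads, so its work stays separate from other executors.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentSearchSettings {
    // Hypothetical setting keys, for illustration only.
    static final String ENABLED_KEY = "example.concurrent_search.enabled";
    static final String POOL_SIZE_KEY = "example.concurrent_search.pool_size";

    final boolean enabled;
    final int poolSize;

    ConcurrentSearchSettings(Map<String, String> settings) {
        // Off by default because the underlying Lucene API is experimental.
        this.enabled = Boolean.parseBoolean(settings.getOrDefault(ENABLED_KEY, "false"));
        int defaultSize = Math.max(1, Runtime.getRuntime().availableProcessors());
        this.poolSize = Integer.parseInt(settings.getOrDefault(POOL_SIZE_KEY, String.valueOf(defaultSize)));
        if (poolSize < 1) throw new IllegalArgumentException(POOL_SIZE_KEY + " must be >= 1");
    }

    // Builds the dedicated pool only when the feature is on; empty means "search sequentially".
    Optional<ExecutorService> createPool() {
        if (!enabled) return Optional.empty();
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, "concurrent-search-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        // Fixed-size pool: poolSize core threads, unbounded work queue.
        return Optional.of(new ThreadPoolExecutor(poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(), factory));
    }
}
```

A caller would check `createPool()` once at startup and fall back to the existing sequential path when it returns an empty `Optional`.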
The change, although quite complex, is mostly isolated in `QueryPhase` and `QueryCollectorContext` (and surrounding classes).
Describe alternatives you've considered
N/A
Additional context
Currently, the search implementation assumes a sequential flow: results are accumulated by individual collectors (backed by collector contexts) and post-processed at the end. This has to be changed to use `CollectorManager`s and reducers to assemble the final query results.
The impediments: early termination and time-bounded search are exception-driven. This is difficult to replicate as-is, because the exception interrupts the flow and the reducers are never reached.
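Lucene's `CollectorManager` has two methods: `newCollector()`, which creates one collector per slice, and `reduce(Collection<C>)`, which combines those collectors. Lucene signals early termination by throwing, for example `CollectionTerminatedException`. The plain-Java sketch below copies the shape of that interface but does not use Lucene. It shows one possible alternative, which is an assumption and not something this issue prescribes: each collector checks a budget shared across slices and reports that it stopped, rather than throwing. Because nothing is thrown, `reduce` always runs and can mark the result as partial. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class CooperativeTermination {
    // Per-slice collector; returning false asks the caller to stop feeding this slice.
    interface SliceCollector { boolean collect(int doc); }

    // Same shape as Lucene's CollectorManager: one collector per slice, then a reduce over all of them.
    interface Manager<C extends SliceCollector, R> {
        C newCollector();
        R reduce(Collection<C> collectors);
    }

    static final class CountResult {
        final int count;
        final boolean terminatedEarly;
        CountResult(int count, boolean terminatedEarly) { this.count = count; this.terminatedEarly = terminatedEarly; }
    }

    // Counts hits until a budget shared by all slices runs out, then flags the stop instead of throwing.
    static final class CountingCollector implements SliceCollector {
        private final AtomicInteger budget;
        int count;
        boolean terminatedEarly;
        CountingCollector(AtomicInteger budget) { this.budget = budget; }

        @Override public boolean collect(int doc) {
            if (budget.getAndDecrement() <= 0) { terminatedEarly = true; return false; }
            count++;
            return true;
        }
    }

    static final class CountingManager implements Manager<CountingCollector, CountResult> {
        private final AtomicInteger budget;
        CountingManager(int maxHits) { this.budget = new AtomicInteger(maxHits); }

        @Override public CountingCollector newCollector() { return new CountingCollector(budget); }

        @Override public CountResult reduce(Collection<CountingCollector> collectors) {
            int total = 0;
            boolean early = false;
            for (CountingCollector c : collectors) { total += c.count; early |= c.terminatedEarly; }
            return new CountResult(total, early);
        }
    }

    // Runs each slice (a list of matching doc ids) on the pool; reduce runs even after early termination.
    static <C extends SliceCollector, R> R search(List<int[]> slices, Manager<C, R> manager,
                                                  ExecutorService pool) throws Exception {
        List<C> collectors = new ArrayList<>();
        List<Future<?>> futures = new ArrayList<>();
        for (int[] slice : slices) {
            C collector = manager.newCollector();
            collectors.add(collector);
            futures.add(pool.submit(() -> {
                for (int doc : slice) {
                    if (!collector.collect(doc)) break; // stop this slice without unwinding via an exception
                }
            }));
        }
        for (Future<?> f : futures) f.get(); // wait for every slice; also makes collector state visible here
        return manager.reduce(collectors);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<int[]> slices = List.of(new int[] {1, 2, 3}, new int[] {4, 5}, new int[] {6, 7, 8, 9});
            CountResult r = search(slices, new CountingManager(5), pool);
            System.out.println(r.count + " hits, terminatedEarly=" + r.terminatedEarly); // 5 hits, terminatedEarly=true
        } finally {
            pool.shutdown();
        }
    }
}
```

A time-bounded search could follow the same pattern by checking a shared deadline instead of a hit budget.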
It would make sense to come up with benchmarks comparing sequential and parallel segment search, to establish when each of them is useful. Once such evidence is collected, the engine itself could provide hints at runtime recommending switching the feature on or off (probably on a per-index basis).
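The skeleton below is a rough sketch of such a comparison. The per-segment "search" is a synthetic CPU-bound loop that stands in for real query work. The speedup threshold used for the runtime hint is a hypothetical heuristic, not a measured recommendation. The harness first checks that both paths produce identical results, and only then compares median wall-clock times. For trustworthy numbers, a proper harness such as JMH, run against real indices, would be preferable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SegmentSearchBenchmark {
    // Synthetic stand-in for per-segment query work: a deterministic sum over the segment's docs.
    static long searchSegment(int docs, int seed) {
        long acc = 0;
        for (int d = 0; d < docs; d++) acc += (d * 31L + seed) % 7;
        return acc;
    }

    static long sequential(int[] segmentSizes) {
        long total = 0;
        for (int s = 0; s < segmentSizes.length; s++) total += searchSegment(segmentSizes[s], s);
        return total;
    }

    static long concurrent(int[] segmentSizes, ExecutorService pool) throws Exception {
        List<Future<Long>> parts = new ArrayList<>();
        for (int s = 0; s < segmentSizes.length; s++) {
            final int seg = s;
            parts.add(pool.submit(() -> searchSegment(segmentSizes[seg], seg)));
        }
        long total = 0;
        for (Future<Long> f : parts) total += f.get(); // reduce: sum the per-segment results
        return total;
    }

    // Median wall-clock time of `runs` executions, in nanoseconds.
    static long medianNanos(Callable<Long> task, int runs) throws Exception {
        long[] times = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.call();
            times[i] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[runs / 2];
    }

    // Hypothetical runtime hint: recommend concurrency only if it wins by at least `minSpeedup`.
    static boolean recommendConcurrent(long sequentialNanos, long concurrentNanos, double minSpeedup) {
        return concurrentNanos > 0 && (double) sequentialNanos / concurrentNanos >= minSpeedup;
    }

    public static void main(String[] args) throws Exception {
        int[] segments = {2_000_000, 1_500_000, 3_000_000, 500_000};
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            if (sequential(segments) != concurrent(segments, pool)) throw new IllegalStateException("results differ");
            long seq = medianNanos(() -> sequential(segments), 7);
            long par = medianNanos(() -> concurrent(segments, pool), 7);
            System.out.printf("sequential=%dus concurrent=%dus recommend=%b%n",
                    seq / 1000, par / 1000, recommendConcurrent(seq, par, 1.2));
        } finally {
            pool.shutdown();
        }
    }
}
```

Timings will vary with hardware, segment sizes and JIT warm-up, so no output is shown here.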