-
Notifications
You must be signed in to change notification settings - Fork 105
Introducing HybridQueryCollectorContextSpec to improve the hybrid search performance #1534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
|
Please start reviewing only when it is open for review. Thanks |
|
PR description? |
|
@yuye-aws PR is in draft state. Once it is ready for review I will add PR description. |
…QueryCollectorContextSpecImpl
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
…QueryCollectorContextSpecImpl
|
No Integ tests have been added in this PR because it does not contain any new feature. For all hybrid search features, the integ tests are already in place therefore all should pass in this enhancement. Moreover, the BWC tests are also in place. |
…QueryCollectorContextSpecImpl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great improvement and coming out of the hack of AggregationProcessor. Did initial review
src/main/java/org/opensearch/neuralsearch/search/collector/HybridQueryCollectorContextSpec.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/search/collector/HybridQueryCollectorContextSpec.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/search/query/HybridCollectorManager.java
Outdated
Show resolved
Hide resolved
| filteringWeight, | ||
| searchContext | ||
| ); | ||
| return new HybridCollectorManager( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we handling concurrent searches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, please put this info into PR description or code level comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add it in code level comments.
| this.searchContext = builder.searchContext; | ||
| this.topDocsMerger = new TopDocsMerger(searchContext.sort()); | ||
| this.trackTotalHitsUpTo = searchContext.trackTotalHitsUpTo(); | ||
| this.isSortEnabled = searchContext.sort() != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have getters here to check if sort is enabled, collapse etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is @Getter added at the top of the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed it
...main/java/org/opensearch/neuralsearch/search/query/util/HybridSearchCollectorResultUtil.java
Outdated
Show resolved
Hide resolved
| searchContext.from(0); | ||
| } | ||
|
|
||
| Weight filteringWeight = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we remove filteredWeights and moved back to default post filters, did we verify with this change can we switch back other features of hybrid search to default as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We also made aggregations switch to default. Rest all sorting, collapse, pagination have custom logic specific to Hybrid query.
-
HybridAggregationProcessor is simply delegating the call to core aggs processor. We can even remove it if we find a way to execute a peice of code which is currently running in its post process. The updateQueryResult method.
-
HybridQueryPhaseSearcher also can be removed if we find a way to inject a query directly from the plugin. For example when hyrbid query comes with index alias filters then it is wrapped in boolean query. At the plugin end we extract the core hybrid part from the boolean query and inject it in the QueryPhaseSearcher. Already discussed this with @reta as currently there is no way of doing it. Therefore, we have to keep hybridQueryPhaseSearchert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add more unit tests, at a glance we're missing tests for HybridQueryCollectorContextSpec and HybridQueryCollectorContextSpecFactory.
Also please make sure you executed full set of tests for aggregations using test_aggs param, those are skipped by default when you do integTest gradle command. Example commend ./gradlew ':integTest' -Dtest_aggs=true --tests "org.opensearch.neuralsearch.query.aggregation.*IT"
src/main/java/org/opensearch/neuralsearch/search/query/HybridAggregationProcessor.java
Outdated
Show resolved
Hide resolved
| filteringWeight, | ||
| searchContext | ||
| ); | ||
| return new HybridCollectorManager( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, please put this info into PR description or code level comments
|
…ctorContextSpecFactory Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming along
src/main/java/org/opensearch/neuralsearch/search/query/HybridCollectorManager.java
Outdated
Show resolved
Hide resolved
| this.searchContext = builder.searchContext; | ||
| this.topDocsMerger = new TopDocsMerger(searchContext.sort()); | ||
| this.trackTotalHitsUpTo = searchContext.trackTotalHitsUpTo(); | ||
| this.isSortEnabled = searchContext.sort() != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed it
| } | ||
|
|
||
| @SneakyThrows | ||
| public void testCollectorManager_whenHybridQueryAndNotConcurrentSearch_thenSuccessful() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have rely on HybridQueryCollectorContextSpec to handle concurrent searches, how can we test concurrent searches in hybrid search? Through ITs probably
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we rely on ConcurrentQueryPhaseSearcher
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good, please conduct manual testing for possible regression (like missing search hits) and how exactly this affects response times and throughput
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Can we have benchmark numbers before merging this in?
@owaiskazi19 We are merging this in feature branch. Before merging in main branch I will benchmark the final product. |
|
I will wait for one more maintainer approval. @bzhangam , @heemin32 @junqiu-lei Can one of you take a look as you have most context about this. Thanks |
|
Looks good to me. Thanks for the improvement and also reorganize the codes. |
f2d02b3
into
opensearch-project:feature/querycollectocontextspec_in_hybrid_query
…rch performance (opensearch-project#1534) * QueryCollectorContextSpec implemenation for hybrid query Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * HybridQueryCollectContextSpec implemenation Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Optimizations in Hybrid Query Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Reformat files and add javadoc Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Add changelog Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Update Changelog Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Add HybridSearchCollectorResultUtilTests Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Add unit test of HybridQueryCollectorContextSpec and HybridQueryCollectorContextSpecFactory Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Add isHybridQuery check in postProcess of HybridAggregationProcessor Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Fix unit tests Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Modify comments to use release branch code Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> --------- Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
…rch performance (opensearch-project#1534) * QueryCollectorContextSpec implemenation for hybrid query Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * HybridQueryCollectContextSpec implemenation Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Optimizations in Hybrid Query Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Reformat files and add javadoc Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Add changelog Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Update Changelog Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Add HybridSearchCollectorResultUtilTests Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Add unit test of HybridQueryCollectorContextSpec and HybridQueryCollectorContextSpecFactory Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Add isHybridQuery check in postProcess of HybridAggregationProcessor Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Fix unit tests Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Modify comments to use release branch code Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> --------- Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
…rch performance (#1576) * Introducing HybridQueryCollectorContextSpec to improve the hybrid search performance (#1534) * QueryCollectorContextSpec implemenation for hybrid query Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * HybridQueryCollectContextSpec implemenation Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Optimizations in Hybrid Query Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Reformat files and add javadoc Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Add changelog Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Update Changelog Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Add HybridSearchCollectorResultUtilTests Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Add unit test of HybridQueryCollectorContextSpec and HybridQueryCollectorContextSpecFactory Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Add isHybridQuery check in postProcess of HybridAggregationProcessor Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Fix unit tests Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Modify comments to use release branch code Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> --------- Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Optimize HybridAggregationProcessor postProcess Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> --------- Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Description
This PR introduces following changes to improve search performance
Implement QueryCollectorContextSpec in Hybrid Query . It will help to directly inject hybrid search collectors in the search.
a. Earlier we relied on HybridQueryAggregationProcessor.preprocess for creating collectorManager.
We also removed empty collector context like DefaultQueryPhaseSearcherWithEmptyContext and ConcurrentQueryPhaseWithEmptyContext from the HybridQueryPhaseSearcher to avoid execution of topDocsCollector. [This was creating additional overhead]
We reduced dependency from HybridQueryAggregationProcessor by removing collectorManager logic from there. It was hacky approach. Now, we only need it to set the size in updateQueryResult during single shard scenario because in that case the fetch phase would run before normalization phase.
We removed creation of filteringWeight from the HybridCollectorManager when post filters are applied. We switched back to the default implementation coming from QueryPhase. Therefore, we no longer need FilteredCollector creation at neural-search end . OpenSearch will take care of it and wrap it in MultiCollector.
We created a dedicated class HybridSearchCollectorResultsUtil for hybrid collector results processing.
It helps to remove multiple collapse, sorting checks.
Class is dedicated for a collector result. So any change specific to result can be done there and it will reflect everywhere.
Related Issues
Resolves
18278
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.