Skip to content

Conversation

@vibrantvarun
Copy link
Member

@vibrantvarun vibrantvarun commented Aug 29, 2025

Description

This PR introduces following changes to improve search performance

  1. Implement QueryCollectorContextSpec in Hybrid Query . It will help to directly inject hybrid search collectors in the search.
    a. Earlier we relied on HybridQueryAggregationProcessor.preprocess for creating collectorManager.

  2. We also removed empty collector context like DefaultQueryPhaseSearcherWithEmptyContext and ConcurrentQueryPhaseWithEmptyContext from the HybridQueryPhaseSearcher to avoid execution of topDocsCollector. [This was creating additional overhead]

  3. We reduced dependency from HybridQueryAggregationProcessor by removing collectorManager logic from there. It was hacky approach. Now, we only need it to set the size in updateQueryResult during single shard scenario because in that case the fetch phase would run before normalization phase.

  4. We removed creation of filteringWeight from the HybridCollectorManager when post filters are applied. We switched back to the default implementation coming from QueryPhase. Therefore, we no longer need FilteredCollector creation at neural-search end . OpenSearch will take care of it and wrap it in MultiCollector.

  5. We created a dedicated class HybridSearchCollectorResultsUtil for hybrid collector results processing.
    It helps to remove multiple collapse, sorting checks.
    Class is dedicated for a collector result. So any change specific to result can be done there and it will reflect everywhere.

Related Issues

Resolves
18278

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
@vibrantvarun vibrantvarun changed the base branch from main to feature/querycollectocontextspec_in_hybrid_query August 29, 2025 06:54
@vibrantvarun vibrantvarun changed the title Query collector context spec impl Introducing HybridQueryCollectorContext to improve the hybrid search performance Aug 29, 2025
@vibrantvarun
Copy link
Member Author

Please start reviewing only when it is open for review. Thanks

@yuye-aws
Copy link
Member

PR description?

@vibrantvarun
Copy link
Member Author

@yuye-aws PR is in draft state. Once it is ready for review I will add PR description.

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
@vibrantvarun
Copy link
Member Author

vibrantvarun commented Sep 2, 2025

No Integ tests have been added in this PR because it does not contain any new feature. For all hybrid search features, the integ tests are already in place therefore all should pass in this enhancement. Moreover, the BWC tests are also in place.

Copy link
Member

@ryanbogan ryanbogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@owaiskazi19 owaiskazi19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvement and coming out of the hack of AggregationProcessor. Did initial review

filteringWeight,
searchContext
);
return new HybridCollectorManager(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we handling concurrent searches?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, please put this info into PR description or code level comments

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add it in code level comments.

this.searchContext = builder.searchContext;
this.topDocsMerger = new TopDocsMerger(searchContext.sort());
this.trackTotalHitsUpTo = searchContext.trackTotalHitsUpTo();
this.isSortEnabled = searchContext.sort() != null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have getters here to check if sort is enabled, collapse etc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is @Getter added at the top of the class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed it

searchContext.from(0);
}

Weight filteringWeight = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we remove filteredWeights and moved back to default post filters, did we verify with this change can we switch back other features of hybrid search to default as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We also made aggregations switch to default. Rest all sorting, collapse, pagination have custom logic specific to Hybrid query.

  1. HybridAggregationProcessor is simply delegating the call to core aggs processor. We can even remove it if we find a way to execute a peice of code which is currently running in its post process. The updateQueryResult method.

  2. HybridQueryPhaseSearcher also can be removed if we find a way to inject a query directly from the plugin. For example when hyrbid query comes with index alias filters then it is wrapped in boolean query. At the plugin end we extract the core hybrid part from the boolean query and inject it in the QueryPhaseSearcher. Already discussed this with @reta as currently there is no way of doing it. Therefore, we have to keep hybridQueryPhaseSearchert.

Copy link
Member

@martin-gaievski martin-gaievski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add more unit tests, at a glance we're missing tests for HybridQueryCollectorContextSpec and HybridQueryCollectorContextSpecFactory.
Also please make sure you executed full set of tests for aggregations using test_aggs param, those are skipped by default when you do integTest gradle command. Example commend ./gradlew ':integTest' -Dtest_aggs=true --tests "org.opensearch.neuralsearch.query.aggregation.*IT"

filteringWeight,
searchContext
);
return new HybridCollectorManager(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, please put this info into PR description or code level comments

@vibrantvarun
Copy link
Member Author

@martin-gaievski

Also please make sure you executed full set of tests for aggregations using test_aggs param, those are skipped by default when you do integTest gradle command. Example commend ./gradlew ':integTest' -Dtest_aggs=true --tests "org.opensearch.neuralsearch.query.aggregation.*IT"

 ./gradlew ':integTest' -Dtest_aggs=true --tests "org.opensearch.neuralsearch.query.aggregation.*IT"
WARNING: A restricted method in java.lang.System has been called
WARNING: java.lang.System::load has been called by net.rubygrapefruit.platform.internal.NativeLibraryLoader in an unnamed module (file:/Users/varunudr/.gradle/wrapper/dists/gradle-8.14.3-all/10utluxaxniiv4wxiphsi49nj/gradle-8.14.3/lib/native-platform-0.22-milestone-28.jar)
WARNING: Use --enable-native-access=ALL-UNNAMED to avoid a warning for callers in this module
WARNING: Restricted methods will be blocked in a future release unless native access is enabled

Starting a Gradle Daemon (subsequent builds will be faster)
=======================================
OpenSearch Build Hamster says Hello!
  Gradle Version        : 8.14.3
  OS Info               : Mac OS X 15.6.1 (aarch64)
  JDK Version           : 24 (Amazon Corretto JDK)
  JAVA_HOME             : /Library/Java/JavaVirtualMachines/amazon-corretto-24.jdk/Contents/Home
  Random Testing Seed   : 2F3C5D8DBB9754F6
  Crypto Standard       : any-supported
=======================================

> Task :compileJava
Note: /Users/varunudr/search/neural-search/src/main/java/org/opensearch/neuralsearch/search/query/util/HybridSearchCollectorResultUtil.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :compileTestJava
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :integTest
WARNING: A terminally deprecated method in sun.misc.Unsafe has been called
WARNING: sun.misc.Unsafe::objectFieldOffset has been called by net.bytebuddy.dynamic.loading.ClassInjector$UsingUnsafe$Dispatcher$CreationAction
WARNING: Please consider reporting this to the maintainers of class net.bytebuddy.dynamic.loading.ClassInjector$UsingUnsafe$Dispatcher$CreationAction
WARNING: sun.misc.Unsafe::objectFieldOffset will be removed in a future release

WARNING: A restricted method in java.lang.System has been called
WARNING: java.lang.System::load has been called by com.sun.jna.Native in an unnamed module (file:/Users/varunudr/.gradle/caches/modules-2/files-2.1/net.java.dev.jna/jna/5.16.0/ebea09f91dc9f7048099f963fb8d6f919f0a4d9c/jna-5.16.0.jar)
WARNING: Use --enable-native-access=ALL-UNNAMED to avoid a warning for callers in this module
WARNING: Restricted methods will be blocked in a future release unless native access is enabled


[Incubating] Problems report is available at: file:///Users/varunudr/search/neural-search/build/reports/problems/problems-report.html

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.14.3/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD SUCCESSFUL in 1m 1s
23 actionable tasks: 12 executed, 11 up-to-date

…ctorContextSpecFactory

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Copy link
Member

@owaiskazi19 owaiskazi19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming along

this.searchContext = builder.searchContext;
this.topDocsMerger = new TopDocsMerger(searchContext.sort());
this.trackTotalHitsUpTo = searchContext.trackTotalHitsUpTo();
this.isSortEnabled = searchContext.sort() != null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed it

}

@SneakyThrows
public void testCollectorManager_whenHybridQueryAndNotConcurrentSearch_thenSuccessful() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have rely on HybridQueryCollectorContextSpec to handle concurrent searches, how can we test concurrent searches in hybrid search? Through ITs probably

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we rely on ConcurrentQueryPhaseSearcher

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
@vibrantvarun vibrantvarun changed the title Introducing HybridQueryCollectorContext to improve the hybrid search performance Introducing HybridQueryCollectorContextSpec to improve the hybrid search performance Sep 4, 2025
Copy link
Member

@martin-gaievski martin-gaievski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, please conduct manual testing for possible regression (like missing search hits) and how exactly this affects response times and throughput

Copy link
Member

@owaiskazi19 owaiskazi19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Can we have benchmark numbers before merging this in?

@vibrantvarun
Copy link
Member Author

Looks good to me. Can we have benchmark numbers before merging this in?

@owaiskazi19 We are merging this in feature branch. Before merging in main branch I will benchmark the final product.

@vibrantvarun
Copy link
Member Author

I will wait for one more maintainer approval. @bzhangam , @heemin32 @junqiu-lei Can one of you take a look as you have most context about this.

Thanks

@bzhangam
Copy link
Collaborator

bzhangam commented Sep 4, 2025

Looks good to me. Thanks for the improvement and also reorganize the codes.

@vibrantvarun vibrantvarun merged commit f2d02b3 into opensearch-project:feature/querycollectocontextspec_in_hybrid_query Sep 4, 2025
59 checks passed
vibrantvarun added a commit to vibrantvarun/neural-search that referenced this pull request Sep 8, 2025
…rch performance (opensearch-project#1534)

* QueryCollectorContextSpec implemenation for hybrid query

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* HybridQueryCollectContextSpec implemenation

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Optimizations in Hybrid Query

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Reformat files and add javadoc

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Add changelog

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Update Changelog

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Add HybridSearchCollectorResultUtilTests

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Add unit test of HybridQueryCollectorContextSpec and HybridQueryCollectorContextSpecFactory

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Add isHybridQuery check in postProcess of HybridAggregationProcessor

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Fix unit tests

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Modify comments to use release branch code

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

---------

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
vibrantvarun added a commit to vibrantvarun/neural-search that referenced this pull request Sep 22, 2025
…rch performance (opensearch-project#1534)

* QueryCollectorContextSpec implemenation for hybrid query

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* HybridQueryCollectContextSpec implemenation

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Optimizations in Hybrid Query

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Reformat files and add javadoc

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Add changelog

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Update Changelog

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Add HybridSearchCollectorResultUtilTests

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Add unit test of HybridQueryCollectorContextSpec and HybridQueryCollectorContextSpecFactory

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Add isHybridQuery check in postProcess of HybridAggregationProcessor

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Fix unit tests

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Modify comments to use release branch code

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

---------

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
heemin32 pushed a commit that referenced this pull request Sep 23, 2025
…rch performance (#1576)

* Introducing HybridQueryCollectorContextSpec to improve the hybrid search performance (#1534)

* QueryCollectorContextSpec implemenation for hybrid query

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* HybridQueryCollectContextSpec implemenation

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Optimizations in Hybrid Query

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Reformat files and add javadoc

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Add changelog

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Update Changelog

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Add HybridSearchCollectorResultUtilTests

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Add unit test of HybridQueryCollectorContextSpec and HybridQueryCollectorContextSpecFactory

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Add isHybridQuery check in postProcess of HybridAggregationProcessor

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Fix unit tests

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Modify comments to use release branch code

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

---------

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

* Optimize HybridAggregationProcessor postProcess

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>

---------

Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants