Skip to content

Conversation

@atris
Copy link
Contributor

@atris atris commented Aug 8, 2025

This commit reintroduces the QueryPhaseListener functionality that was reverted
in #18913 due to performance regressions, now with optimizations that eliminate
the overhead.

Changes:

  • Add QueryPhaseListener interface for plugins to hook into query phase
  • Add AbstractQueryPhaseSearcher implementing template pattern for listeners
  • Modify QueryPhaseSearcher interface to support listener registration
  • Add comprehensive tests for listener functionality

Performance optimizations:

  • Fast-path in DefaultQueryPhaseSearcher bypasses template pattern when no listeners
  • Direct hasQueryPhaseListeners() check avoids virtual method calls
  • Single listener optimization avoids iterator creation
  • Zero overhead for the common case (no listeners registered)

The original implementation caused 10-15% regression in neural-search because it
forced all searches through the template pattern even without listeners. This
optimized version maintains the extension functionality while eliminating overhead
for plugins that extend DefaultQueryPhaseSearcher without using listeners.

Fixes #17593
Resolves performance regression from #18913

@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc labels Aug 8, 2025
@atris
Copy link
Contributor Author

atris commented Aug 8, 2025

@owaiskazi19 @andrross Please review

@github-actions
Copy link
Contributor

github-actions bot commented Aug 8, 2025

❌ Gradle check result for ad942ca: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@atris atris force-pushed the absolutely_new_runtime_extensions branch from ad942ca to bd4c107 Compare August 8, 2025 18:57
  This commit reintroduces the QueryPhaseListener functionality that was reverted
  in opensearch-project#18913 due to performance regressions, now with optimizations that eliminate
  the overhead.

  Changes:
  - Add QueryPhaseListener interface for plugins to hook into query phase
  - Add AbstractQueryPhaseSearcher implementing template pattern for listeners
  - Modify QueryPhaseSearcher interface to support listener registration
  - Add comprehensive tests for listener functionality

  Performance optimizations:
  - Fast-path in DefaultQueryPhaseSearcher bypasses template pattern when no listeners
  - Direct hasQueryPhaseListeners() check avoids virtual method calls
  - Single listener optimization avoids iterator creation
  - Zero overhead for the common case (no listeners registered)

  The original implementation caused 10-15% regression in neural-search because it
  forced all searches through the template pattern even without listeners. This
  optimized version maintains the extension functionality while eliminating overhead
  for plugins that extend DefaultQueryPhaseSearcher without using listeners.

  Fixes opensearch-project#17593
  Resolves performance regression from opensearch-project#18913

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
@atris atris force-pushed the absolutely_new_runtime_extensions branch from bd4c107 to 526897c Compare August 8, 2025 19:10
@github-actions
Copy link
Contributor

github-actions bot commented Aug 8, 2025

❌ Gradle check result for 526897c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@atris atris closed this Aug 9, 2025
@atris atris reopened this Aug 9, 2025
@atris
Copy link
Contributor Author

atris commented Aug 9, 2025

Flaky tests #16576

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2025

✅ Gradle check result for 526897c: SUCCESS

@codecov
Copy link

codecov bot commented Aug 9, 2025

Codecov Report

❌ Patch coverage is 30.30303% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.83%. Comparing base (9f13e37) to head (526897c).
⚠️ Report is 259 commits behind head on main.

Files with missing lines Patch % Lines
...earch/search/query/AbstractQueryPhaseSearcher.java 25.00% 19 Missing and 2 partials ⚠️
...n/java/org/opensearch/search/query/QueryPhase.java 33.33% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18994      +/-   ##
============================================
- Coverage     72.88%   72.83%   -0.06%     
+ Complexity    69327    69326       -1     
============================================
  Files          5643     5646       +3     
  Lines        318720   318819      +99     
  Branches      46113    46129      +16     
============================================
- Hits         232294   232198      -96     
- Misses        67595    67828     +233     
+ Partials      18831    18793      -38     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@vibrantvarun
Copy link
Member

The QueryCollectorContextSpec was introduced earlier to inject QueryCollectorContextSpecs from the plugins PR #18637 . Currently, the DefaultQueryPhaseSearcher is been extended by ConcurrentQueryPhaseSearcher. There is already a PR which is in draft to remove this HybridCollectorManager dependency creation via HybridAggregationProcessor. This will also remove HybridQueryPhaseSearcher need because we can simply reuse defaultQP and ConcurrentQP for our searches. We have seen 13% performance improvement by doing that in POC and in the final draft of the PR we expect more. opensearch-project/neural-search#1424.

If we introduce QueryPhaseListeners, then I think we might loose the integrity of the QueryPhase. Because currently opensearch supports two type of searches: Concurrent and nonconcurrent. But the concurrent search relies for topDocsCollector on DefaultQP. If we introduce before and after QP listeners then user can change results by writing a custom logic in after QPlistener. This can break logic in HybridQuery.

cc: @martin-gaievski @owaiskazi19

@atris
Copy link
Contributor Author

atris commented Aug 11, 2025

The QueryCollectorContextSpec was introduced earlier to inject QueryCollectorContextSpecs from the plugins PR #18637 . Currently, the DefaultQueryPhaseSearcher is been extended by ConcurrentQueryPhaseSearcher. There is already a PR which is in draft to remove this HybridCollectorManager dependency creation via HybridAggregationProcessor. This will also remove HybridQueryPhaseSearcher need because we can simply reuse defaultQP and ConcurrentQP for our searches. We have seen 13% performance improvement by doing that in POC and in the final draft of the PR we expect more. opensearch-project/neural-search#1424.

If we introduce QueryPhaseListeners, then I think we might loose the integrity of the QueryPhase. Because currently opensearch supports two type of searches: Concurrent and nonconcurrent. But the concurrent search relies for topDocsCollector on DefaultQP. If we introduce before and after QP listeners then user can change results by writing a custom logic in after QPlistener. This can break logic in HybridQuery.

cc: @martin-gaievski @owaiskazi19

Thank you for raising these concerns. Let me address them:

  1. The QueryPhaseListener interface passes SearchContext but doesn't provide methods to modify search results. The afterCollection method is called after results are already collected and processed. While a
    malicious plugin could theoretically use reflection or other means to modify state, this is true for any plugin extension point.

  2. The implementation includes fast-path optimization that completely bypasses the listener logic when no listeners are registered. This means the 13% performance improvement won't be affected unless a plugin explicitly registers listeners.

  3. The listener hooks are placed at the outermost level of query execution, not interfering with the internal collector chain or concurrent search logic. They act more as observation points and post facto operations rather than modification points.

  4. The primary use case is for plugins that need to track query execution metrics, implement custom caching strategies, or perform reranking or provide reranking signals - not to modify search results.

Would it help if we:

  • Add explicit documentation that listeners should not modify search state?
  • Make the interface name more explicit like PostQueryPhase to clarify intent?
  • Add safeguards to prevent result modification?

The goal is to provide extension points without compromising the integrity of search execution.

@vibrantvarun
Copy link
Member

I will evaluate it more. But one Quick QQ: Why do we need this listeners? I mean is there any use case of it currently in implementing any functionality?

* This method allows for fast-path optimization to skip listener logic entirely.
* @return true if there are listeners, false otherwise
*/
default boolean hasQueryPhaseListeners() {
Copy link
Member

@martin-gaievski martin-gaievski Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we need this method, we can conclude on presence of listeners based on the size of the list of QueryPhaseListener objects. You can change

if (!hasQueryPhaseListeners()) {

to

if (Objects.notNull(queryPhaseListeners()) == false && queryPhaseListeners().isEmpty() == false)

@martin-gaievski
Copy link
Member

The original implementation caused 10-15% regression in neural-search because it
forced all searches through the template pattern even without listeners. This
optimized version maintains the extension functionality while eliminating overhead
for plugins that extend DefaultQueryPhaseSearcher without using listeners.

Are you sure about this being a main factor, is there data the support this? The workload we run while checking for regressions - it has vast majority of hybrid queries.
It's good optimization overall, but for that scenario I suspect some other factors played key role.

@atris
Copy link
Contributor Author

atris commented Aug 13, 2025

The original implementation caused 10-15% regression in neural-search because it
forced all searches through the template pattern even without listeners. This
optimized version maintains the extension functionality while eliminating overhead
for plugins that extend DefaultQueryPhaseSearcher without using listeners.

Are you sure about this being a main factor, is there data the support this? The workload we run while checking for regressions - it has vast majority of hybrid queries. It's good optimization overall, but for that scenario I suspect some other factors played key role.

Sorry for the delay.

The template pattern overhead is real but limited in scope. QueryPhaseBeforeAfterTests.java:35-60 shows measurable overhead in tight loops and fast-path eliminates virtual calls when hasQueryPhaseListeners() returns false. Since Neural-search extends DefaultQueryPhaseSearcher without listeners, hence It would benefits from this

However, for hybrid queries specifically, I agree that this could not be the singular source of regression. The objective of this PR is to minimise the reproducible lag and then revisit it once the numbers for neural-search come in with this change.

But yes, this alone won't fix all hybrid query performance issues. Would need profiling on actual hybrid workloads to identify other bottlenecks.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added stalled Issues that have stalled and removed stalled Issues that have stalled labels Sep 12, 2025
@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added stalled Issues that have stalled and removed stalled Issues that have stalled labels Oct 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Add extension points for pre/post collecting scores in QueryPhase

3 participants