Adding aggregations in hybrid query #630

martin-gaievski
Member

Description

Adds aggregation support to the hybrid query.
The implementation is based on the design RFC. A large part of the implementation was done in #624; this PR mainly:

  • changes how total hits are calculated: the value is now the number of unique documents across all sub-queries. In the current implementation it is the maximum of the hit counts of the individual sub-queries
  • adds base integration tests for 1) metric, bucket, and pipeline aggregations and 2) runs with and without concurrent search (as there was an overlap in the initial implementation of hybrid query)
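
The difference between the two total-hits calculations can be illustrated with a minimal, self-contained sketch. It is not the plugin's code: the class `TotalHitsSketch`, both method names, and the representation of each sub-query's hits as a list of integer document IDs are assumptions made only for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the neural-search plugin.
public class TotalHitsSketch {

    // Earlier approach per the description: the largest hit count of any single sub-query.
    static int maxOfSubQueryHits(List<List<Integer>> docIdsPerSubQuery) {
        int max = 0;
        for (List<Integer> docIds : docIdsPerSubQuery) {
            max = Math.max(max, docIds.size());
        }
        return max;
    }

    // Approach described in this PR: the number of unique documents across all sub-queries.
    static int uniqueDocsAcrossSubQueries(List<List<Integer>> docIdsPerSubQuery) {
        Set<Integer> unique = new HashSet<>();
        for (List<Integer> docIds : docIdsPerSubQuery) {
            unique.addAll(docIds); // a document matched by several sub-queries is counted once
        }
        return unique.size();
    }

    public static void main(String[] args) {
        // Three sub-queries; document 3 is matched by the first two.
        List<List<Integer>> hits = List.of(List.of(1, 2, 3), List.of(3, 4), List.of(5));
        System.out.println("max of sub-queries: " + maxOfSubQueryHits(hits));
        System.out.println("unique documents:   " + uniqueDocsAcrossSubQueries(hits));
    }
}
```

With this sample input the maximum over sub-queries is 3, while the unique-document count is 5, because the document shared by two sub-queries is counted only once.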

Issues Resolved

#509

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@martin-gaievski martin-gaievski added the Features, backport 2.x, and v2.13.0 labels Mar 12, 2024
@martin-gaievski martin-gaievski force-pushed the aggregations_in_hybrid_query branch 2 times, most recently from 57491e1 to 235761d Compare March 12, 2024 00:21
@@ -8,19 +8,17 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
### Enhancements
### Bug Fixes
- Fix async actions are left in neural_sparse query ([#438](https://github.com/opensearch-project/neural-search/pull/438))
- Fixed exception for case when Hybrid query being wrapped into bool query ([#490](https://github.com/opensearch-project/neural-search/pull/490))
- Hybrid query and nested type fields ([#498](https://github.com/opensearch-project/neural-search/pull/498))
Member Author

These PRs were released in 2.12; this is cleanup.

- Fix typo for sparse encoding processor factory([#578](https://github.com/opensearch-project/neural-search/pull/578))
- Add non-null check for queryBuilder in NeuralQueryEnricherProcessor ([#615](https://github.com/opensearch-project/neural-search/pull/615))
### Infrastructure
### Documentation
### Maintenance
- Added support for jdk-21 ([#500](https://github.com/opensearch-project/neural-search/pull/500)))
Member Author

@martin-gaievski martin-gaievski Mar 12, 2024

Same as two lines above: this was part of 2.12.

codecov bot commented Mar 12, 2024

Codecov Report

Attention: Patch coverage is 80.00000%, with 4 lines in your changes missing coverage. Please review.

Project coverage is 82.57%. Comparing base (c9cdcc1) to head (b785198).

| Files | Patch % | Lines |
| .../opensearch/neuralsearch/util/HybridQueryUtil.java | 71.42% | 0 Missing and 4 partials ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #630      +/-   ##
============================================
- Coverage     82.70%   82.57%   -0.13%     
- Complexity      650      656       +6     
============================================
  Files            51       52       +1     
  Lines          2053     2055       +2     
  Branches        329      328       -1     
============================================
- Hits           1698     1697       -1     
  Misses          212      212              
- Partials        143      146       +3     

@martin-gaievski martin-gaievski force-pushed the aggregations_in_hybrid_query branch 3 times, most recently from fe4dfae to 09852a1 Compare March 12, 2024 00:53
@@ -63,7 +63,7 @@ public void testCombination_whenMultipleSubqueriesResultsAndDefaultMethod_thenSc
assertNotNull(queryTopDocs);
assertEquals(3, queryTopDocs.size());

- assertEquals(3, queryTopDocs.get(0).getScoreDocs().size());
+ assertEquals(5, queryTopDocs.get(0).getScoreDocs().size());
Member Author

This is needed because of the change in the way we count total hits.

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Collaborator

@navneet1v navneet1v left a comment

Minor comments. Overall code looks good to me.

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
@martin-gaievski martin-gaievski merged commit f04c058 into opensearch-project:main Mar 12, 2024
58 of 60 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-630-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f04c058fc5ab193342c583cf820cd6cb72be42ea
# Push it to GitHub
git push --set-upstream origin backport/backport-630-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-630-to-2.x.

@vibrantvarun
Member

LGTM

martin-gaievski added a commit to martin-gaievski/neural-search that referenced this pull request Mar 12, 2024
* Adding aggregations in hybrid query

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
(cherry picked from commit f04c058)
martin-gaievski added a commit that referenced this pull request Mar 12, 2024
* Adding aggregations in hybrid query

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
(cherry picked from commit f04c058)