-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] Add rescoreContext for neuralsparse query's two-phase speed up. #695
Closed
+2,285
−31
Closed
Changes from 30 commits
Commits
Show all changes
37 commits
Select commit
Hold shift + click to select a range
746458d
Add rescoreContext for neuralsparse query's two-phase speed up.
55eeb31
enhancements: support neural sparse query by tokens
zhichao-aws 826eabe
add empty string logics for streams
zhichao-aws 7824216
fix fromXContent error case logic
zhichao-aws a8267bb
add ut
zhichao-aws 835929f
ut it
zhichao-aws fb771cd
updates for comments
zhichao-aws 3027c3b
add settings & parameters
zhichao-aws 3bea3ea
change int to float
zhichao-aws 3f9c076
hashcode & equals
zhichao-aws 03d24cc
nit
zhichao-aws c95ef12
merge query_by_tokens
zhichao-aws 485668f
Set windowSizeExpansion in twoPhase.
211d508
add ut
zhichao-aws 63340cb
Merge pull request #2 from zhichao-aws/rescore-zhichao-branch
conggguan d19132a
Add default boundary for neuralsparse twophase settings, and compleme…
607ff2d
Change some name of functiion to provide a clearer and more precise d…
0aa1a44
Add description of neural sparse two-phase enhancement in CHANGLOG.md.
17d7916
merge main branch & resolve conflicts (#3)
zhichao-aws 86a11dd
Fix conflict and optimize some function.
a511b0a
Fix the forbiddenApisMain check.
df0c9fc
Simplify a function name.
2f1183e
Add some java documentation.
0ac9245
Change the comment and logic of rewrite and visit.
1d28441
Return the original all token query if there has no low score token.
d37d73e
Merge remote-tracking branch 'origin' into rescore-context
aec35b6
Add a version check to disable two-phase.
6857ef2
Some update for comments.
31c349e
Merge branch 'main' into rescore-context
conggguan feb9606
Add some IT, optimized the get tokens logic in NeuralSparseQueryBuilder.
69c3b0d
Replace some NeuralSparseIT test case's model inference with fixed qu…
3a9da75
Merge remote-tracking branch 'origin' into rescore-context
37646cb
Refactor NeuralSparseTwoPhaseUtil.
cacbdab
Change the version check to current to pass bwc test.
0f089cb
Change the parameter's to private, change the way that two-phase supp…
57966f6
Use index setting for two phase.
5ede643
Merge branch 'opensearch-project:main' into rescore-context
conggguan File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
114 changes: 114 additions & 0 deletions
114
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQuery.java
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,114 @@ | ||
/* | ||
* Copyright OpenSearch Contributors | ||
* SPDX-License-Identifier: Apache-2.0 | ||
*/ | ||
package org.opensearch.neuralsearch.query; | ||
|
||
import java.io.IOException; | ||
import java.util.Objects; | ||
|
||
import lombok.AllArgsConstructor; | ||
import lombok.Getter; | ||
import lombok.NonNull; | ||
import org.apache.lucene.search.IndexSearcher; | ||
import org.apache.lucene.search.Query; | ||
import org.apache.lucene.search.QueryVisitor; | ||
import org.apache.lucene.search.ScoreMode; | ||
import org.apache.lucene.search.Weight; | ||
|
||
/** | ||
* Implementation of Query interface for type NeuralSparseQuery when TwoPhaseNeuralSparse Enabled. | ||
* Initialized, it currentQuery include all tokenQuery. After call setCurrentQueryToHighScoreTokenQuery, | ||
* it will perform highScoreTokenQuery. | ||
*/ | ||
@AllArgsConstructor | ||
@Getter | ||
@NonNull | ||
public final class NeuralSparseQuery extends Query { | ||
|
||
private Query currentQuery; | ||
private final Query highScoreTokenQuery; | ||
private final Query lowScoreTokenQuery; | ||
private final Float rescoreWindowSizeExpansion; | ||
|
||
/** | ||
* | ||
* @param field The field of this query. | ||
* @return String of NeuralSparseQuery object. | ||
*/ | ||
@Override | ||
public String toString(String field) { | ||
return "NeuralSparseQuery(" | ||
martin-gaievski marked this conversation as resolved.
Show resolved
Hide resolved
|
||
+ currentQuery.toString(field) | ||
+ "," | ||
+ highScoreTokenQuery.toString(field) | ||
+ ", " | ||
+ lowScoreTokenQuery.toString(field) | ||
+ ")"; | ||
|
||
} | ||
|
||
public Query rewrite(IndexSearcher indexSearcher) throws IOException { | ||
Query rewrittenCurrentQuery = currentQuery.rewrite(indexSearcher); | ||
Query rewrittenFirstStepQuery = highScoreTokenQuery.rewrite(indexSearcher); | ||
Query rewrittenSecondPhaseQuery = lowScoreTokenQuery.rewrite(indexSearcher); | ||
if (rewrittenCurrentQuery == currentQuery | ||
&& rewrittenFirstStepQuery == highScoreTokenQuery | ||
&& rewrittenSecondPhaseQuery == lowScoreTokenQuery) { | ||
return this; | ||
} | ||
return new NeuralSparseQuery(rewrittenCurrentQuery, rewrittenFirstStepQuery, rewrittenSecondPhaseQuery, rescoreWindowSizeExpansion); | ||
} | ||
|
||
/** | ||
* This interface is let the lucene to visit this query. | ||
* Briefly, the query to be performed always be a subset of current query. | ||
* So in this function use currentQuery.visit(). | ||
* @param queryVisitor a QueryVisitor to be called by each query in the tree | ||
*/ | ||
@Override | ||
public void visit(QueryVisitor queryVisitor) { | ||
currentQuery.visit(queryVisitor); | ||
} | ||
|
||
@Override | ||
public boolean equals(Object other) { | ||
return sameClassAs(other) && equalsTo(getClass().cast(other)); | ||
} | ||
|
||
private boolean equalsTo(NeuralSparseQuery other) { | ||
return Objects.equals(currentQuery, other.currentQuery) | ||
&& Objects.equals(highScoreTokenQuery, other.highScoreTokenQuery) | ||
&& Objects.equals(lowScoreTokenQuery, other.lowScoreTokenQuery) | ||
&& Objects.equals(rescoreWindowSizeExpansion, other.rescoreWindowSizeExpansion); | ||
} | ||
|
||
@Override | ||
public int hashCode() { | ||
int h = classHash(); | ||
h = 31 * h + Objects.hashCode(highScoreTokenQuery) + Objects.hashCode(lowScoreTokenQuery) + Objects.hashCode(currentQuery) + Objects | ||
.hashCode(rescoreWindowSizeExpansion); | ||
return h; | ||
} | ||
|
||
/** | ||
* This function is always performed after setCurrentQueryToHighScoreTokenQuery. And determine which query's weight to score docs. | ||
* @param searcher The searcher that execute the neural_sparse query. | ||
* @param scoreMode How the produced scorers will be consumed. | ||
* @param boost The boost that is propagated by the parent queries. | ||
* @return The weight of currentQuery. | ||
* @throws IOException If creteWeight failed. | ||
*/ | ||
@Override | ||
public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) throws IOException { | ||
return currentQuery.createWeight(searcher, scoreMode, boost); | ||
} | ||
|
||
/** | ||
* Before call this function, the currentQuery of this object is a BooleanQuery include all tokens FeatureFiledQuery. | ||
* After call this function, the currentQuery of this object change to a BooleanQuery include high score tokens FeatureFiledQuery. | ||
*/ | ||
public void setCurrentQueryToHighScoreTokenQuery() { | ||
this.currentQuery = highScoreTokenQuery; | ||
} | ||
} |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
earlier NeuralSparseQuery was getting executed via RankFeature Query. Now we are changing to some other query clause? How we are ensuring that the changes are backward compatible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For earlier node, it perform the original query and logic, for new node, it perform the updated logic and build a NeuralSpraseQuery.
Besides, we add version check, only enabled two-phase if it's on or after 2.14.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to have BWC test for higher confidence, can you add it @conggguan? I think there was same ask from Varun in this PR