Neural sparse query two-phase search processor's bwc test #777
Merged
zhichao-aws merged 30 commits into opensearch-project:main from conggguan:search-pipeline-bwc on Jul 9, 2024
Changes from 24 commits
Commits
3d2bd18 Poc of pipeline (conggguan)
b8ef828 Complete some settings for two phase pipeline. (conggguan)
f678e93 Change the implement of two-phase from QueryBuilderVistor to custom p… (conggguan)
9a1d52c Add It and fix some bug on the state of multy same neuralsparsequeryb… (conggguan)
3bb10fe Simplify some logic, and correct some format. (conggguan)
dbc4269 Optimize some format. (conggguan)
a93c8cd Merge branch 'opensearch-project:main' into search-pipeline (conggguan)
f190834 Add some test case. (conggguan)
5ee07d1 Optimize some logic for zhichao-aws's comments. (conggguan)
a9adb72 Merge branch 'main' into search-pipeline (conggguan)
0f5eab9 Optimize a line without application. (conggguan)
25edb27 Add some comments, remove some redundant lines, fix some format. (conggguan)
61cac40 Remove a redundant null check, fix a if format. (conggguan)
83abb31 Fix a typo for a comment, camelcase format for some variable. (conggguan)
a53966c Add some comments to illustrate the influence of the modify on 2-phas… (conggguan)
eb17594 Add restart and rolling upgrade bwc test for neural sparse two phase … (conggguan)
18e3e65 Merge branch 'opensearch-project:main' into search-pipeline-bwc (conggguan)
248dfb4 Spotless on qa. (conggguan)
e362373 Update change log for two-phase BWC test. (conggguan)
347e42e Remove redundant lines of two-phase BWC test. (conggguan)
641152f Merge branch 'bwc-copy' into search-pipeline-bwc (conggguan)
801d96a Merge from main. (conggguan)
55544cb Add changelog. (conggguan)
c901832 Merge branch 'opensearch-project:main' into search-pipeline-bwc (conggguan)
93e957f Merge branch 'opensearch-project:main' into search-pipeline-bwc (conggguan)
b9cfdb6 Add the PR link and number for the CHANGELOG.md. (conggguan)
363bd18 Merge branch 'opensearch-project:main' into search-pipeline-bwc (conggguan)
820cbac [Fix] NeuralSparseTwoPhaseProcessorIT created wrong ingest pipeline, … (conggguan)
aa42c07 Merge branch 'opensearch-project:main' into search-pipeline-bwc (conggguan)
0540635 Merge branch 'main' into search-pipeline-bwc (conggguan)
64 changes: 64 additions & 0 deletions
...pgrade/src/test/java/org/opensearch/neuralsearch/bwc/NeuralSparseTwoPhaseProcessorIT.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
/* | ||
* Copyright OpenSearch Contributors | ||
* SPDX-License-Identifier: Apache-2.0 | ||
*/ | ||
package org.opensearch.neuralsearch.bwc; | ||
|
||
import org.opensearch.common.settings.Settings; | ||
import org.opensearch.neuralsearch.query.NeuralSparseQueryBuilder; | ||
import org.opensearch.neuralsearch.util.TestUtils; | ||
|
||
import java.nio.file.Files; | ||
import java.nio.file.Path; | ||
|
||
import static org.opensearch.neuralsearch.util.TestUtils.NODES_BWC_CLUSTER; | ||
import static org.opensearch.neuralsearch.util.TestUtils.TEXT_EMBEDDING_PROCESSOR; | ||
|
||
public class NeuralSparseTwoPhaseProcessorIT extends AbstractRestartUpgradeRestTestCase { | ||
|
||
private static final String NEURAL_SPARSE_INGEST_PIPELINE_NAME = "nstp-nlp-ingest-pipeline-dense"; | ||
private static final String NEURAL_SPARSE_TWO_PHASE_SEARCH_PIPELINE_NAME = "nstp-nlp-two-phase-search-pipeline-sparse"; | ||
private static final String TEST_ENCODING_FIELD = "passage_embedding"; | ||
private static final String TEST_TEXT_FIELD = "passage_text"; | ||
private static final String TEXT_1 = "Hello world a b"; | ||
|
||
public void testNeuralSparseQueryTwoPhaseProcessor_NeuralSearch_E2EFlow() throws Exception { | ||
waitForClusterHealthGreen(NODES_BWC_CLUSTER); | ||
NeuralSparseQueryBuilder neuralSparseQueryBuilder = new NeuralSparseQueryBuilder().fieldName(TEST_ENCODING_FIELD).queryText(TEXT_1); | ||
if (isRunningAgainstOldCluster()) { | ||
String modelId = uploadSparseEncodingModel(); | ||
loadModel(modelId); | ||
neuralSparseQueryBuilder.modelId(modelId); | ||
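// NOTE: at this revision the test appears to pair a sparse encoding model with the text-embedding (dense) ingest pipeline helper; | |
// see the later commit "[Fix] NeuralSparseTwoPhaseProcessorIT created wrong ingest pipeline, …" in this PR. | |
createPipelineProcessor(modelId, NEURAL_SPARSE_INGEST_PIPELINE_NAME); | |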
createIndexWithConfiguration( | ||
getIndexNameForTest(), | ||
Files.readString(Path.of(classLoader.getResource("processor/IndexMappingMultipleShard.json").toURI())), | ||
NEURAL_SPARSE_INGEST_PIPELINE_NAME | ||
); | ||
addDocument(getIndexNameForTest(), "0", TEST_TEXT_FIELD, TEXT_1, null, null); | ||
createNeuralSparseTwoPhaseSearchProcessor(NEURAL_SPARSE_TWO_PHASE_SEARCH_PIPELINE_NAME); | ||
updateIndexSettings( | ||
getIndexNameForTest(), | ||
Settings.builder().put("index.search.default_pipeline", NEURAL_SPARSE_TWO_PHASE_SEARCH_PIPELINE_NAME) | ||
); | ||
Object resultWith2PhasePipeline = search(getIndexNameForTest(), neuralSparseQueryBuilder, 1).get("hits"); | ||
assertNotNull(resultWith2PhasePipeline); | ||
} else { | ||
String modelId = null; | ||
try { | ||
modelId = TestUtils.getModelId(getIngestionPipeline(NEURAL_SPARSE_INGEST_PIPELINE_NAME), TEXT_EMBEDDING_PROCESSOR); | ||
loadModel(modelId); | ||
neuralSparseQueryBuilder.modelId(modelId); | ||
Object resultWith2PhasePipeline = search(getIndexNameForTest(), neuralSparseQueryBuilder, 1).get("hits"); | ||
assertNotNull(resultWith2PhasePipeline); | ||
} finally { | ||
wipeOfTestResources( | ||
getIndexNameForTest(), | ||
NEURAL_SPARSE_INGEST_PIPELINE_NAME, | ||
modelId, | ||
NEURAL_SPARSE_TWO_PHASE_SEARCH_PIPELINE_NAME | ||
); | ||
} | ||
} | ||
} | ||
} |
16 changes: 16 additions & 0 deletions
...tart-upgrade/src/test/resources/processor/NeuralSparseTwoPhaseProcessorConfiguration.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"request_processors": [ | ||
{ | ||
"neural_sparse_two_phase_processor": { | ||
"tag": "neural-sparse", | ||
"description": "This processor is making two-phase rescorer.", | ||
"enabled": true, | ||
"two_phase_parameter": { | ||
"prune_ratio": %f, | ||
"expansion_rate": %f, | ||
"max_window_size": %d | ||
} | ||
} | ||
} | ||
] | ||
} |
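The `%f` and `%d` placeholders suggest that this JSON file is a format template rather than a literal pipeline body, presumably filled in by the `createNeuralSparseTwoPhaseSearchProcessor` helper before the search pipeline is created. The helper's implementation is not part of this diff, so the following is only a minimal sketch of how such a template could be rendered. The class name `TwoPhasePipelineTemplateSketch`, the `render` method, and the parameter values are illustrative assumptions, not code from the PR.

```java
import java.util.Locale;

public class TwoPhasePipelineTemplateSketch {

    // Same content as NeuralSparseTwoPhaseProcessorConfiguration.json, inlined to keep the sketch self-contained.
    static final String TEMPLATE = String.join(
        "\n",
        "{",
        "  \"request_processors\": [",
        "    {",
        "      \"neural_sparse_two_phase_processor\": {",
        "        \"tag\": \"neural-sparse\",",
        "        \"description\": \"This processor is making two-phase rescorer.\",",
        "        \"enabled\": true,",
        "        \"two_phase_parameter\": {",
        "          \"prune_ratio\": %f,",
        "          \"expansion_rate\": %f,",
        "          \"max_window_size\": %d",
        "        }",
        "      }",
        "    }",
        "  ]",
        "}"
    );

    // Hypothetical helper: substitutes the three placeholders in order.
    // Locale.ROOT keeps the decimal separator a '.' so the result stays valid JSON.
    static String render(double pruneRatio, double expansionRate, int maxWindowSize) {
        return String.format(Locale.ROOT, TEMPLATE, pruneRatio, expansionRate, maxWindowSize);
    }

    public static void main(String[] args) {
        // Illustrative values only; the values used by the BWC tests are not shown in this diff.
        System.out.println(render(0.4, 5.0, 10000));
    }
}
```

Formatting with an explicit locale matters here: under a locale that uses a decimal comma, `%f` would produce a value such as `0,400000`, which is not a valid JSON number.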
78 changes: 78 additions & 0 deletions
...pgrade/src/test/java/org/opensearch/neuralsearch/bwc/NeuralSparseTwoPhaseProcessorIT.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
/* | ||
* Copyright OpenSearch Contributors | ||
* SPDX-License-Identifier: Apache-2.0 | ||
*/ | ||
package org.opensearch.neuralsearch.bwc; | ||
|
||
import org.opensearch.common.settings.Settings; | ||
import org.opensearch.neuralsearch.query.NeuralSparseQueryBuilder; | ||
import org.opensearch.neuralsearch.util.TestUtils; | ||
|
||
import java.nio.file.Files; | ||
import java.nio.file.Path; | ||
import java.util.List; | ||
|
||
import static org.opensearch.neuralsearch.util.TestUtils.NODES_BWC_CLUSTER; | ||
import static org.opensearch.neuralsearch.util.TestUtils.SPARSE_ENCODING_PROCESSOR; | ||
|
||
public class NeuralSparseTwoPhaseProcessorIT extends AbstractRollingUpgradeTestCase { | ||
// Add a prefix to avoid conflicts with other IT classes, since resources are not wiped after the first round | |
private static final String SPARSE_INGEST_PIPELINE_NAME = "nstp-nlp-ingest-pipeline-sparse"; | ||
private static final String SPARSE_SEARCH_TWO_PHASE_PIPELINE_NAME = "nstp-nlp-two-phase-search-pipeline-sparse"; | ||
private static final String TEST_ENCODING_FIELD = "passage_embedding"; | ||
private static final String TEST_TEXT_FIELD = "passage_text"; | ||
private static final String TEXT_1 = "Hello world a b"; | ||
private String sparseModelId = ""; | ||
|
||
// Tests that NeuralSparseTwoPhaseProcessor supports the two-phase speed-up of neural_sparse queries. | |
// The feature was introduced in 2.15. | |
public void testNeuralSparseTwoPhaseProcessorIT_NeuralSparseSearch_E2EFlow() throws Exception { | ||
waitForClusterHealthGreen(NODES_BWC_CLUSTER); | ||
// The model_id is set on the query builder once the id has been obtained | |
NeuralSparseQueryBuilder neuralSparseQueryBuilder = new NeuralSparseQueryBuilder().fieldName(TEST_ENCODING_FIELD).queryText(TEXT_1); | ||
|
||
switch (getClusterType()) { | ||
case OLD: | ||
sparseModelId = uploadSparseEncodingModel(); | ||
loadModel(sparseModelId); | ||
neuralSparseQueryBuilder.modelId(sparseModelId); | ||
createPipelineForSparseEncodingProcessor(sparseModelId, SPARSE_INGEST_PIPELINE_NAME); | ||
createIndexWithConfiguration( | ||
getIndexNameForTest(), | ||
Files.readString(Path.of(classLoader.getResource("processor/SparseIndexMappings.json").toURI())), | ||
SPARSE_INGEST_PIPELINE_NAME | ||
); | ||
addSparseEncodingDoc(getIndexNameForTest(), "0", List.of(), List.of(), List.of(TEST_TEXT_FIELD), List.of(TEXT_1)); | ||
createNeuralSparseTwoPhaseSearchProcessor(SPARSE_SEARCH_TWO_PHASE_PIPELINE_NAME); | ||
updateIndexSettings( | ||
getIndexNameForTest(), | ||
Settings.builder().put("index.search.default_pipeline", SPARSE_SEARCH_TWO_PHASE_PIPELINE_NAME) | ||
); | ||
assertNotNull(search(getIndexNameForTest(), neuralSparseQueryBuilder, 1).get("hits")); | ||
break; | ||
case MIXED: | ||
sparseModelId = TestUtils.getModelId(getIngestionPipeline(SPARSE_INGEST_PIPELINE_NAME), SPARSE_ENCODING_PROCESSOR); | ||
loadModel(sparseModelId); | ||
neuralSparseQueryBuilder.modelId(sparseModelId); | ||
assertNotNull(search(getIndexNameForTest(), neuralSparseQueryBuilder, 1).get("hits")); | ||
break; | ||
case UPGRADED: | ||
try { | ||
sparseModelId = TestUtils.getModelId(getIngestionPipeline(SPARSE_INGEST_PIPELINE_NAME), SPARSE_ENCODING_PROCESSOR); | ||
loadModel(sparseModelId); | ||
neuralSparseQueryBuilder.modelId(sparseModelId); | ||
assertNotNull(search(getIndexNameForTest(), neuralSparseQueryBuilder, 1).get("hits")); | ||
} finally { | ||
wipeOfTestResources( | ||
getIndexNameForTest(), | ||
SPARSE_INGEST_PIPELINE_NAME, | ||
sparseModelId, | ||
SPARSE_SEARCH_TWO_PHASE_PIPELINE_NAME | ||
); | ||
} | ||
break; | ||
default: | ||
throw new IllegalStateException("Unexpected value: " + getClusterType()); | ||
} | ||
} | ||
} |
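The rolling-upgrade test above runs once per cluster state: resources are created only in `OLD`, reused in `MIXED`, and reused and then wiped in `UPGRADED`, where `try`/`finally` ensures cleanup even if the search assertion fails. The sketch below illustrates that control flow in isolation. The class `RollingUpgradePhaseSketch`, its `ClusterType` enum, and the step labels are hypothetical stand-ins rather than the test framework's real types, and unlike the real test it records a failure instead of rethrowing it so that the resulting steps can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

public class RollingUpgradePhaseSketch {

    // Hypothetical stand-in for the cluster states the test switches on.
    enum ClusterType { OLD, MIXED, UPGRADED }

    // Returns the ordered steps performed for one phase.
    // `search` may throw a RuntimeException to simulate a failing search or assertion.
    static List<String> runPhase(ClusterType type, Runnable search) {
        List<String> steps = new ArrayList<>();
        switch (type) {
            case OLD:
                // First round: everything is created and deliberately left in place.
                steps.add("create model, ingest pipeline, index and search pipeline");
                search.run();
                steps.add("search");
                break;
            case MIXED:
                // Middle rounds: reuse the resources created on the old cluster.
                steps.add("look up model id from ingest pipeline");
                search.run();
                steps.add("search");
                break;
            case UPGRADED:
                try {
                    steps.add("look up model id from ingest pipeline");
                    search.run();
                    steps.add("search");
                } catch (RuntimeException e) {
                    // Sketch only: the real test lets the exception propagate.
                    steps.add("search failed: " + e.getMessage());
                } finally {
                    // Cleanup runs whether or not the search succeeded.
                    steps.add("wipe resources");
                }
                break;
            default:
                throw new IllegalStateException("Unexpected value: " + type);
        }
        return steps;
    }

    public static void main(String[] args) {
        for (ClusterType type : ClusterType.values()) {
            System.out.println(type + " -> " + runPhase(type, () -> {}));
        }
    }
}
```

Because nothing is wiped until the final round, the pipeline and index names need to be unique across IT classes, which is what the `nstp-` prefix on the constants is for.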
16 changes: 16 additions & 0 deletions
...ling-upgrade/src/test/resources/processor/NeuralSparseTwoPhaseProcessorConfiguration.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"request_processors": [ | ||
{ | ||
"neural_sparse_two_phase_processor": { | ||
"tag": "neural-sparse", | ||
"description": "This processor is making two-phase rescorer.", | ||
"enabled": true, | ||
"two_phase_parameter": { | ||
"prune_ratio": %f, | ||
"expansion_rate": %f, | ||
"max_window_size": %d | ||
} | ||
} | ||
} | ||
] | ||
} |
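The diff does not explain what the three `two_phase_parameter` fields do. As I understand the processor's documentation, `prune_ratio` sets the threshold (the maximum token weight multiplied by the ratio) that separates high-weight tokens, used in the first phase, from low-weight tokens, used when rescoring, while `expansion_rate` scales the query size to get the number of documents rescored in the second phase. The sketch below illustrates that arithmetic under those assumptions. `TwoPhaseParameterSketch` and its methods are hypothetical, the `>=` comparison at the threshold is a guess, and how `max_window_size` is enforced is deliberately left out because it is not shown in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TwoPhaseParameterSketch {

    // Keeps the tokens whose weight reaches maxWeight * pruneRatio (assumed semantics of prune_ratio).
    static Map<String, Double> highWeightTokens(Map<String, Double> tokenWeights, double pruneRatio) {
        double maxWeight = 0.0;
        for (double weight : tokenWeights.values()) {
            maxWeight = Math.max(maxWeight, weight);
        }
        double threshold = maxWeight * pruneRatio;
        Map<String, Double> high = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : tokenWeights.entrySet()) {
            if (entry.getValue() >= threshold) {
                high.put(entry.getKey(), entry.getValue());
            }
        }
        return high;
    }

    // Number of documents rescored in the second phase (assumed semantics of expansion_rate).
    static int rescoreWindow(int querySize, double expansionRate) {
        return (int) (querySize * expansionRate);
    }

    public static void main(String[] args) {
        // Made-up token weights for the test text "Hello world a b".
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("hello", 2.0);
        weights.put("world", 1.0);
        weights.put("a", 0.5);
        weights.put("b", 0.1);
        System.out.println(highWeightTokens(weights, 0.4).keySet());
        System.out.println(rescoreWindow(10, 5.0));
    }
}
```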
Please make a proper changelog entry; the PR number and link are missing.
Okay, I have added it to the changelog.