Change text embedding processor to async mode for better isolation #27

Merged: 2 commits into opensearch-project:main, Oct 27, 2022

Conversation

zane-neo (Collaborator) commented Oct 24, 2022

Description

Changed the text embedding processor to async mode when handling user input.
Previously, since inference runs asynchronously and only the execute(IngestDocument) method was overridden, TextEmbeddingProcessor needed a blocking approach to make sure document enrichment completed before indexing happened. The drawback is that this blocking happens in the write thread pool, which has only availableProcessors + 1 threads, so it could impact non-text-embedding indexing.
This change switches to async mode by overriding the execute(IngestDocument, BiConsumer) method, so threads in the write thread pool are no longer blocked, giving better isolation for non-text-embedding indexing.
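
For illustration, here is a minimal sketch of the new override, assembled from the diff excerpts quoted later in this review; the exact class context and the vector-setting helper name are assumptions, not the merged code:

// Sketch of the TextEmbeddingProcessor#execute override. Assumes imports of
// org.opensearch.ingest.IngestDocument, org.opensearch.action.ActionListener,
// java.util.Map and java.util.function.BiConsumer.
@Override
public void execute(IngestDocument ingestDocument, BiConsumer<IngestDocument, Exception> handler) {
    try {
        // Both calls below can throw (e.g. IllegalArgumentException from validation),
        // so failures are reported through the handler instead of propagating upward.
        validateEmbeddingFieldsValue(ingestDocument);
        Map<String, Object> knnMap = buildMapWithKnnKeyAndOriginalValue(ingestDocument);
        // Non-blocking inference: the write thread returns immediately and the
        // ingest pipeline resumes when the listener fires.
        mlCommonsClientAccessor.inferenceSentences(this.modelId, createInferenceList(knnMap), ActionListener.wrap(vectors -> {
            setVectorFieldsToDocument(ingestDocument, knnMap, vectors); // assumed helper name
            handler.accept(ingestDocument, null);
        }, e -> handler.accept(null, e)));
    } catch (Exception e) {
        handler.accept(null, e);
    }
}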

Issues Resolved

An enhancement without a linked issue.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@zane-neo zane-neo requested a review from a team October 24, 2022 04:38
@zane-neo zane-neo changed the title Async text embedding Change text embedding processor to async mode for better isolation Oct 24, 2022
jmazanec15 (Member) commented:

@zane-neo Please add a brief description to the PR and/or link the issue this relates to.

navneet1v (Collaborator) commented:

@zane-neo Can you please add some more details, like what the flow will be when a client sends a bulk request? Is anything changing?

It's good that we are moving towards async; I just want to understand the flow.

navneet1v (Collaborator) commented:

@zane-neo Please resolve the conflicts, and can we also check why the checks are failing? As of the last commit all the checks were passing. You might want to rebase.

return ingestDocument;
}

public void execute(IngestDocument ingestDocument, BiConsumer<IngestDocument, Exception> handler) {
navneet1v (Collaborator):

Is this an override function?

zane-neo (Collaborator, Author):

Yes, this is an override function. Added the @Override annotation and also added proper Javadoc on this method.

Comment on lines +95 to 105
} catch (Exception e) {
handler.accept(null, e);
}
navneet1v (Collaborator):

Which function is throwing the exception that we are catching here?

zane-neo (Collaborator, Author):

The underlying validateEmbeddingFieldsValue is throwing IllegalArgumentException.

navneet1v (Collaborator):

If the exception is coming from that function only, put the try/catch around that function only and take the predict API call out of the try block.
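
For clarity, the suggestion amounts to something like this hypothetical rearrangement (using the same assumed names as the sketch in the description; this is not the merged code):

@Override
public void execute(IngestDocument ingestDocument, BiConsumer<IngestDocument, Exception> handler) {
    // Narrow try: catch only around validation, inference call left outside it.
    try {
        validateEmbeddingFieldsValue(ingestDocument);
    } catch (IllegalArgumentException e) {
        handler.accept(null, e);
        return;
    }
    Map<String, Object> knnMap = buildMapWithKnnKeyAndOriginalValue(ingestDocument);
    mlCommonsClientAccessor.inferenceSentences(this.modelId, createInferenceList(knnMap), ActionListener.wrap(vectors -> {
        setVectorFieldsToDocument(ingestDocument, knnMap, vectors); // assumed helper name
        handler.accept(ingestDocument, null);
    }, e -> handler.accept(null, e)));
}

As the next reply explains, this was rejected because the predict call itself can also throw synchronously.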

zane-neo (Collaborator, Author):

Checked again: not only validateEmbeddingFieldsValue throws an exception, but mlClient.predict does too, and listener.onFailure is not invoked when the exception arises. The exception could propagate to execute(IngestDocument, BiConsumer); if we don't catch it there, it will propagate further to upper methods and could impact the pipeline execution, which is what we don't want. The parent method also uses a try/catch to catch all exceptions; see https://github.com/opensearch-project/OpenSearch/blob/8c9ca4e858e6333265080972cf57809dbc086208/server/src/main/java/org/opensearch/ingest/Processor.java#L64.

Signed-off-by: Zan Niu <zaniu@amazon.com>
jmazanec15 (Member) left a comment:

A few minor comments, but overall it looks good.

}

/**
* When received a bulk indexing request, the pipeline will be executed in the <a href="https://github.com/opensearch-project/OpenSearch/blob/8fda187bb459757164cc80c91ca305d274ca2b53/server/src/main/java/org/opensearch/action/bulk/TransportBulkAction.java#L226">doInternalExecute</a> method
jmazanec15 (Member):

minor: Why is commit 8fda187bb459757164cc80c91ca305d274ca2b53 in the link?

zane-neo (Collaborator, Author):

Removed this.

throw new RuntimeException("Text embedding processor failed with exception", e);
validateEmbeddingFieldsValue(ingestDocument);
Map<String, Object> knnMap = buildMapWithKnnKeyAndOriginalValue(ingestDocument);
mlCommonsClientAccessor.inferenceSentences(this.modelId, createInferenceList(knnMap), ActionListener.wrap(x -> {
jmazanec15 (Member):

nit: can we keep the name "vectors" instead of "x"?

zane-neo (Collaborator, Author):

Sure, changed to vectors.

* @param ingestDocument
* @param handler
*/
@Override
jmazanec15 (Member):

In the future, could we refactor this method to batch calls to the model so we can improve throughput?

navneet1v (Collaborator):

@jmazanec15 what kind of batching are you suggesting here? Batching the writes, or batching the inference calls?

zane-neo (Collaborator, Author) commented Oct 26, 2022:

It's doable, but there are two things we need to consider.

  1. Effort: changing to batching takes more effort; we need to choose a proper threshold, either a batch size or a linger time, and decide whether to expose these settings to the user.
  2. Benefit: in a cluster, every pair of nodes keeps a TCP connection alive, so there is no connection overhead, and the network time is relatively low compared with CPU time. Based on the performance testing, the bottleneck is CPU rather than network I/O, so batching may not increase performance dramatically.

jmazanec15 (Member):

@zane-neo That makes sense. It might make sense to create a proof-of-concept test in the future to see what the benefit would be.

@navneet1v the idea would be to batch the inference calls. For the async action, instead of calling inference directly, submit the update to some kind of queue that then creates one large request for multiple docs and calls the handlers once it completes (see the sketch below).
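
A hypothetical sketch of that queue-based batching (not part of this PR; all names are invented for illustration, and it assumes Java 16+ for records):

import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

final class InferenceBatcher<DOC> {
    // One queued document plus the callback that resumes its ingest pipeline.
    private record Pending<DOC>(DOC doc, BiConsumer<DOC, Exception> handler) {}

    private final int batchSize;
    private final List<Pending<DOC>> pending = new ArrayList<>();

    InferenceBatcher(int batchSize) {
        this.batchSize = batchSize;
    }

    // Called by the async processor instead of invoking inference per document.
    synchronized void submit(DOC doc, BiConsumer<DOC, Exception> handler) {
        pending.add(new Pending<>(doc, handler));
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Sends one large request for every queued document, then invokes each
    // handler so every document's pipeline can resume.
    private void flush() {
        List<Pending<DOC>> batch = new ArrayList<>(pending);
        pending.clear();
        // ... issue a single combined inference call for `batch` here, then
        // complete each handler (accept(null, e) instead on failure):
        for (Pending<DOC> p : batch) {
            p.handler.accept(p.doc, null);
        }
    }
}

A real implementation would also need a linger-time flush so a partial batch is not stuck waiting, which is part of the effort zane-neo weighs above.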

navneet1v (Collaborator), replying to the above:

I am not 100% convinced by the idea of batching. The reason is that batching the data and making one call helps when the ML node has few threads and multiple requests are competing for them (the API call threads).
Secondly, to create the batch we need to put the data in a synchronized queue, which will further slow down the processing.
Third, batches create a problem where a single input sentence can delay the processing of all the documents.

We can discuss this further, but before even doing the POC we should first do a deep dive on the ML-Commons code to know whether batching can really help or not.

Comment on lines 87 to 92
/**
* When received a bulk indexing request, the pipeline will be executed in the <a href="https://github.com/opensearch-project/OpenSearch/blob/8fda187bb459757164cc80c91ca305d274ca2b53/server/src/main/java/org/opensearch/action/bulk/TransportBulkAction.java#L226">doInternalExecute</a> method
* Before the pipeline execution, the pipeline will be marked as resolved (means executed), and then this overriding method will be invoked when executing the text embedding processor.
* After the inference completes, the handler will invoke the doInternalExecute method again to run actual write operation.
* @param ingestDocument
* @param handler
navneet1v (Collaborator):

Improve the documentation to explain what this function is doing, not what the whole pipeline is doing, and try to remove the commit IDs from the Javadoc; point to the right function using the Javadoc @link tag.

zane-neo (Collaborator, Author):

Sure, added the pipeline execution explanation in the method body.

Signed-off-by: Zan Niu <zaniu@amazon.com>
jmazanec15 (Member) left a comment:

Failure https://github.com/opensearch-project/neural-search/actions/runs/3328176376/jobs/5504257477 is unrelated to this PR. It appears to be related to an ml-commons permission issue. Approving.


@zane-neo zane-neo merged commit d538ad1 into opensearch-project:main Oct 27, 2022
@zane-neo zane-neo added the 'backport 2.x' label Oct 27, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 27, 2022
* Change text embedding processor to async mode

Signed-off-by: Zan Niu <zaniu@amazon.com>

* Address review comments

Signed-off-by: Zan Niu <zaniu@amazon.com>

Signed-off-by: Zan Niu <zaniu@amazon.com>
(cherry picked from commit d538ad1)
@zane-neo zane-neo added and removed the 'backport 2.x' label Oct 28, 2022
opensearch-trigger-bot (Contributor) commented:

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-27-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 d538ad14679216eb91095143d4237ce58376b790
# Push it to GitHub
git push --set-upstream origin backport/backport-27-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-27-to-2.x.

@jmazanec15 jmazanec15 added and removed the 'backport 2.x' label Oct 29, 2022
opensearch-trigger-bot (Contributor) commented:

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

(Same manual backport instructions as above.)

opensearch-trigger-bot (Contributor) commented:

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

(Same manual backport instructions as above.)

@navneet1v navneet1v added the 'v2.4.0' label, and added and removed the 'backport 2.x' label Oct 29, 2022
opensearch-trigger-bot (Contributor) commented:

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

(Same manual backport instructions as above.)

navneet1v pushed a commit to navneet1v/neural-search that referenced this pull request Oct 29, 2022
…pensearch-project#27)

* Change text embedding processor to async mode

Signed-off-by: Zan Niu <zaniu@amazon.com>
Signed-off-by: Navneet Verma <navneev@amazon.com>
navneet1v added a commit that referenced this pull request Oct 29, 2022
…) (#46)

Signed-off-by: Zan Niu <zaniu@amazon.com>
Signed-off-by: Navneet Verma <navneev@amazon.com>
@jmazanec15 jmazanec15 added the 'Enhancements' label Nov 3, 2022