Fix: text chunking processor ingestion bug on multi-node cluster #713

Merged
merged 10 commits into from
May 1, 2024

Conversation

yuye-aws
Member

@yuye-aws yuye-aws commented Apr 27, 2024

Description

On a multi-node cluster, the text chunking processor fails with a "no such index" error when the index is configured with fewer shards than the cluster has nodes. In that case some nodes hold no shard of the index, so they have no local shard information for it. When the processor runs on such a node and looks up the max token count setting, indicesService cannot find the index:

IndexService indexService = indicesService.indexServiceSafe(indexMetadata.getIndex());
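
To illustrate the idea behind the fix, here is a minimal, library-free sketch. It assumes the setting is resolved from index settings that every node can see (for example, the index metadata in the cluster state) instead of from the node-local indices service, and that it falls back to the default when the setting is absent. The class `MaxTokenCountLookup`, its `resolve` method, and the map-based model of the metadata are hypothetical and are not taken from this PR; `toInt` is a stand-in written to behave like `NumberUtils.toInt(String, int)` from Apache Commons Lang, which the commit list mentions. The key `index.analyze.max_token_count` and its default of 10000 are the standard index setting.

```java
import java.util.Map;

/** Hypothetical model of the lookup; not the plugin's actual code. */
final class MaxTokenCountLookup {
    static final String MAX_TOKEN_COUNT_KEY = "index.analyze.max_token_count";
    static final int DEFAULT_MAX_TOKEN_COUNT = 10_000;

    private MaxTokenCountLookup() {}

    /** Stand-in for NumberUtils.toInt: returns the default for null or non-numeric input. */
    public static int toInt(String value, int defaultValue) {
        if (value == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    /**
     * Resolves the max token count for an index.
     *
     * @param clusterIndexSettings assumed model of cluster-wide metadata:
     *                             index name -> raw settings of that index
     * @param indexName            target index; may be null or unknown
     */
    public static int resolve(Map<String, Map<String, String>> clusterIndexSettings, String indexName) {
        // Immutable maps reject null keys, so handle a missing index name first.
        if (indexName == null) {
            return DEFAULT_MAX_TOKEN_COUNT;
        }
        Map<String, String> settings = clusterIndexSettings.get(indexName);
        if (settings == null) {
            // Unknown index: use the default instead of throwing.
            return DEFAULT_MAX_TOKEN_COUNT;
        }
        // Index exists but may not set the key explicitly.
        return toInt(settings.get(MAX_TOKEN_COUNT_KEY), DEFAULT_MAX_TOKEN_COUNT);
    }
}
```

Because the lookup never depends on whether the current node hosts a shard, it returns the same answer on every node, which is the property the failing code path lacked.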

Issues Resolved

Fix ingestion bug on multi-node cluster

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@yuye-aws
Member Author

Hi maintainers. This PR fixes a bug in the text chunking processor. Please attach the backport 2.x and backport 2.13 labels to it.

@yuye-aws yuye-aws marked this pull request as draft April 27, 2024 11:48
@yuye-aws
Member Author

yuye-aws commented Apr 27, 2024

This PR is still a work in progress. Before it can be merged, it must satisfy the following condition:

  • All integration tests pass on a three-node cluster.

@yuye-aws
Member Author

@model-collapse @zane-neo This PR is ready for review now. Please merge it once all CI workflows pass.

@yuye-aws yuye-aws changed the title Fix: multi node text chunking processor index bug Fix: text chunking processor ingestion bug on multi-node cluster Apr 28, 2024
@zhichao-aws zhichao-aws added the backport 2.x Label will add auto workflow to backport PR to 2.x branch label Apr 28, 2024
vibrantvarun
vibrantvarun previously approved these changes Apr 28, 2024
@chishui
Contributor

chishui commented Apr 28, 2024

Shall we add an IT to cover this "configured shard number is less than the number of nodes" scenario? Can be done in a separate issue and PR.

@zhichao-aws
Member

> Shall we add an IT to cover this "configured shard number is less than the number of nodes" scenario? Can be done in a separate issue and PR.

The current CI runs all ITs on a single node. I think we can enhance the CI framework by adding a build with -PnumNodes=3, which would help us catch bugs in distributed scenarios at an early stage.

Signed-off-by: yuye-aws <yuyezhu@amazon.com>
Collaborator

@model-collapse model-collapse left a comment

All concerns addressed.

@vibrantvarun
Member

We can't merge the PR until the BWC tests pass.

@navneet1v
Collaborator

navneet1v commented Apr 30, 2024

@model-collapse GH workflows are failing. Let's ensure GH Actions are successful before approving PRs.

@vibrantvarun
Member

Even the Gradle checks are failing, @yuye-aws.

@vibrantvarun
Member

"Model not deployed yet" error coming from ml-commons opensearch-project/ml-commons#2382

Signed-off-by: yuye-aws <yuyezhu@amazon.com>
@zane-neo
Collaborator

zane-neo commented May 1, 2024

> "Model not deployed yet" error coming from ml-commons opensearch-project/ml-commons#2382

This is a separate issue related to the ml-commons main branch, and we'll track it in a new issue. We'll merge this PR for now because it fixes a critical issue that could impact customers.

@zane-neo zane-neo merged commit 2d42408 into opensearch-project:main May 1, 2024
29 of 71 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 1, 2024
* fix multi node text chunking processor index bug

Signed-off-by: yuye-aws <yuyezhu@amazon.com>

* add change log

Signed-off-by: yuye-aws <yuyezhu@amazon.com>

* bug fix: no max token count setting in index

Signed-off-by: yuye-aws <yuyezhu@amazon.com>

* make program faster without creating index settings object

Signed-off-by: yuye-aws <yuyezhu@amazon.com>

* add comment

Signed-off-by: yuye-aws <yuyezhu@amazon.com>

* fix comment

Signed-off-by: yuye-aws <yuyezhu@amazon.com>

* resolve code review

Signed-off-by: yuye-aws <yuyezhu@amazon.com>

* simplify the code given toInt in NumberUtils

Signed-off-by: yuye-aws <yuyezhu@amazon.com>

* resolve code review comments

Signed-off-by: yuye-aws <yuyezhu@amazon.com>

---------

Signed-off-by: yuye-aws <yuyezhu@amazon.com>
(cherry picked from commit 2d42408)
@opensearch-trigger-bot
Contributor

The backport to 2.13 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.13 2.13
# Navigate to the new working tree
cd .worktrees/backport-2.13
# Create a new branch
git switch --create backport/backport-713-to-2.13
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 2d42408c70e01b95825744bea0182ff361090a4e
# Push it to GitHub
git push --set-upstream origin backport/backport-713-to-2.13
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.13

Then, create a pull request where the base branch is 2.13 and the compare/head branch is backport/backport-713-to-2.13.

@zane-neo zane-neo added the backport 2.14 Backport PR to 2.14 branch label May 1, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 1, 2024
yuye-aws added a commit to yuye-aws/neural-search that referenced this pull request May 1, 2024
…nsearch-project#713)

zane-neo pushed a commit that referenced this pull request May 1, 2024
… (#725)

zane-neo pushed a commit that referenced this pull request May 1, 2024
… (#724)

vibrantvarun pushed a commit that referenced this pull request May 1, 2024
… (#723)

@yuye-aws yuye-aws deleted the Fix/ChunkingMultiNode branch May 6, 2024 07:50
Labels
  • backport 2.x (Label will add auto workflow to backport PR to 2.x branch)
  • backport 2.13
  • backport 2.14 (Backport PR to 2.14 branch)
7 participants