[Bug]: There will be more than one task that state is started when bulk load when there is only one datanode #19563

Closed
1 task done
zhuwenxing opened this issue Sep 29, 2022 · 15 comments
Labels: kind/bug (issues or changes related to a bug), stale (indicates no updates for 30 days), triage/accepted (indicates an issue or PR is ready to be actively worked on)

Comments

@zhuwenxing
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20220928-adeaac4f
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus==2.2.0.dev36
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

image

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

test-milvus-bulk-load-09-29-17-45.zip

Anything else?

No response

@zhuwenxing zhuwenxing added kind/bug Issues or changes related to a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 29, 2022
@zhuwenxing
Contributor Author

/assign @soothing-rain
/unassign @yanliang567

@soothing-rain
Contributor

Should be fixed by #19561

@soothing-rain
Contributor

/unassign
/assign @zhuwenxing

@zhuwenxing
Contributor Author

/assign @soothing-rain

@soothing-rain
Contributor

pymilvus==2.2.0.dev38
Milvus version: master-20221008-5b4038e9

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/bulk_load_test/detail/bulk_load_test/19/pipeline

image

log:

artifacts-bulk-load-test-19-pytest-logs (1).tar.gz

artifacts-bulk-load-test-19-server-logs (2).tar.gz

But there are two DataNode services, right?

@zhuwenxing
Contributor Author

zhuwenxing commented Oct 8, 2022

But there are two DataNode services, right?

Yes, so it is as expected!

@zhuwenxing
Contributor Author

/unassign @soothing-rain

@zhuwenxing zhuwenxing changed the title [Bug]: There will be more than one task that state is started when bulk load [Bug]: There will be more than one task that state is started when bulk load when there is only one datanode Oct 8, 2022
@zhuwenxing
Contributor Author

Milvus version: master-20221008-7e620a7a

It still happens in the DataNode pod-kill chaos test.
log:
artifacts-bulk-load-test-6-server-logs.tar.gz
artifacts-bulk-load-test-6-pytest-logs.tar.gz

Before chaos, it works well: there is only one working task when the number of DataNodes is 1.
image

During chaos, it becomes abnormal.
image

After chaos, the number of working tasks is still wrong.
image
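
The check behind these screenshots can be sketched roughly as follows. This is only an illustration, not the actual test code: the task records, the state names other than STARTED, and the helpers `count_started` and `check_started_tasks` are all hypothetical. It also assumes that each DataNode works on at most one bulk load task at a time, as the discussion in this thread implies.

```python
from collections import Counter


def count_started(tasks):
    """Return how many task records are in the STARTED state."""
    return Counter(t["state"] for t in tasks)["STARTED"]


def check_started_tasks(tasks, datanode_count):
    """True if the number of STARTED tasks does not exceed the DataNode count.

    Assumption: one working task per DataNode at most.
    """
    return count_started(tasks) <= datanode_count


# Hypothetical snapshots of the task list (field names are made up).
before_chaos = [
    {"id": 1, "state": "STARTED"},
    {"id": 2, "state": "PENDING"},
]
after_chaos = [
    {"id": 1, "state": "STARTED"},  # left behind by the killed DataNode
    {"id": 2, "state": "STARTED"},  # picked up by the replacement DataNode
    {"id": 3, "state": "PENDING"},
]

print(check_started_tasks(before_chaos, datanode_count=1))
print(check_started_tasks(after_chaos, datanode_count=1))
```

With one DataNode, the first snapshot passes the check and the second does not, which matches what the screenshots report.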

@zhuwenxing
Contributor Author

/assign @soothing-rain

@soothing-rain
Contributor

OK this might make sense in chaos testing.

If a DataNode gets killed while working on a bulk load task, that task will stay in the STARTED state and will eventually be removed when it expires. In other words, these are "dead" bulk load tasks in the STARTED state.

Also, these "dead" tasks will not impact future bulk load tasks, as new tasks will be assigned to new DataNodes.
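
To make the behavior described above concrete, here is a toy model of it. It is a sketch under assumptions, not Milvus code: the `BulkLoadTask` record, the `ttl_seconds` value, and the helpers `remove_expired` and `pick_free_node` are hypothetical, and expiry is assumed to be measured from the task's start time.

```python
from dataclasses import dataclass


@dataclass
class BulkLoadTask:
    task_id: int
    node_id: int       # DataNode the task was assigned to
    state: str         # e.g. "STARTED"
    started_at: float  # start time in seconds (hypothetical field)


def remove_expired(tasks, now, ttl_seconds):
    """Drop STARTED tasks whose age has reached the (assumed) expiry time."""
    return [
        t for t in tasks
        if not (t.state == "STARTED" and now - t.started_at >= ttl_seconds)
    ]


def pick_free_node(alive_nodes, tasks):
    """Return an alive DataNode with no STARTED task, or None if all are busy."""
    busy = {t.node_id for t in tasks if t.state == "STARTED"}
    for node_id in alive_nodes:
        if node_id not in busy:
            return node_id
    return None


# DataNode 1 was killed while task 1 was STARTED; DataNode 2 replaced it.
tasks = [BulkLoadTask(task_id=1, node_id=1, state="STARTED", started_at=0.0)]

# The dead task does not block scheduling on the new node.
print(pick_free_node(alive_nodes=[2], tasks=tasks))

# The dead task stays visible until it expires.
print(len(remove_expired(tasks, now=10.0, ttl_seconds=60.0)))
print(len(remove_expired(tasks, now=60.0, ttl_seconds=60.0)))
```

In this model the dead task never blocks new work, but it keeps showing up as STARTED until the expiry time passes, which is the state the chaos test observed.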

@zhuwenxing
Contributor Author

Although they are "dead" STARTED tasks, it would be better to clean them up; otherwise, they will confuse users.

So I think it is still a bug that needs to be fixed, but the priority is not that high.
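
One possible shape for such a cleanup, as a rough sketch only: mark STARTED tasks as failed once their DataNode is no longer alive, instead of waiting for them to expire. The `fail_orphaned_tasks` helper, the task fields, and the "FAILED" state name are hypothetical, and this is not necessarily how the fix would be implemented.

```python
def fail_orphaned_tasks(tasks, alive_node_ids):
    """Return a new task list where STARTED tasks on dead nodes are marked FAILED.

    The input list and its records are left unmodified.
    """
    alive = set(alive_node_ids)
    cleaned = []
    for task in tasks:
        if task["state"] == "STARTED" and task["node_id"] not in alive:
            # Copy the record so the caller's data is not mutated.
            task = {**task, "state": "FAILED", "reason": "DataNode no longer alive"}
        cleaned.append(task)
    return cleaned


# Hypothetical task list after DataNode 1 was killed and DataNode 2 took over.
tasks = [
    {"id": 1, "node_id": 1, "state": "STARTED"},
    {"id": 2, "node_id": 2, "state": "STARTED"},
]

for task in fail_orphaned_tasks(tasks, alive_node_ids=[2]):
    print(task["id"], task["state"])
```

After this pass only the task on the live DataNode is still reported as STARTED, so the number of working tasks matches the number of DataNodes again.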

@zhuwenxing zhuwenxing added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 9, 2022
@xiaofan-luan xiaofan-luan added this to the 2.2 milestone Oct 11, 2022
@zhuwenxing
Contributor Author

/unassign

@soothing-rain
Contributor

Although they are "dead" STARTED tasks, it would be better to clean them up; otherwise, they will confuse users.

So I think it is still a bug that needs to be fixed, but the priority is not that high.

Sure, I created a new issue for this one:
#19777

We can close 19563 for now.

/assign @zhuwenxing

@stale

stale bot commented Nov 13, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale indicates no updates for 30 days label Nov 13, 2022
@stale stale bot closed this as completed Nov 20, 2022