[Bug]: Minio is not ready even though it is running after Milvus installation #18409

Closed
1 task done
zhuwenxing opened this issue Jul 26, 2022 · 10 comments
Labels
kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.


@zhuwenxing
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20220725-128c66f3
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2): 2.1.0dev103
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Flush hangs for 180 s and then times out.

14:41:12  [2022-07-26 06:38:05 - INFO - ci_test]: assert create collection: 0.05521559715270996, init_entities: 0 (test_e2e.py:24)
14:41:12  [2022-07-26 06:38:05 - DEBUG - ci_test]: (api_request)  : [Collection.insert] args: [[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,......, kwargs: {'timeout': 20} (api_request.py:56)
14:41:12  [2022-07-26 06:38:05 - DEBUG - ci_test]: (api_response) : (insert count: 3000, delete count: 0, upsert count: 0, timestamp: 434849050913931265, success count: 3000, err count: 0)  (api_request.py:31)
14:41:12  [2022-07-26 06:38:05 - INFO - ci_test]: [test][2022-07-26T06:38:05Z] [0.31949790s] e2e__JuPfH2Zy insert -> (insert count: 3000, delete count: 0, upsert count: 0, timestamp: 434849050913931265, success count: 3000, err count: 0) (wrapper.py:30)
14:41:12  [2022-07-26 06:38:05 - INFO - ci_test]: assert insert: 0.31969380378723145 (test_e2e.py:31)
14:41:12  [2022-07-26 06:41:05 - ERROR - pymilvus.decorators]: Unexcepted error: [flush], , <Time: {'RPC start': '2022-07-26 06:38:05.595908', 'Exception': '2022-07-26 06:41:05.596010'}> (decorators.py:107)
14:41:12  [2022-07-26 06:41:05 - ERROR - ci_test]: flush timeout error:  (collection_wrapper.py:136)
14:41:12  [2022-07-26 06:41:05 - INFO - ci_test]: [test][2022-07-26T06:38:05Z] [180.00088442s] e2e__JuPfH2Zy flush -> None (wrapper.py:30)
14:41:12  ------------- generated html file: file:///tmp/ci_logs/report.html -------------
14:41:12  =========================== short test summary info ============================
14:41:12  FAILED ../testcases/test_e2e.py::TestE2e::test_milvus_default - assert False
14:41:12  ======================== 1 failed in 180.58s (0:03:00) =========================

Expected Behavior

All test cases pass.

Steps To Reproduce

See https://qa-jenkins.zilliz.cc/job/chaos-test/194/

It failed before any chaos was injected.

Milvus Log

artifacts-pulsar-pod-kill-194-pytest-logs.tar.gz

artifacts-pulsar-pod-kill-194-server-logs (1).tar.gz

Anything else?

It failed before any chaos was injected, so the Milvus instance was freshly deployed.

@zhuwenxing zhuwenxing added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 26, 2022
@xiaofan-luan
Collaborator

/assign @wayblink please help investigate.

@xiaofan-luan xiaofan-luan assigned wayblink and unassigned yanliang567 Jul 26, 2022
@xiaofan-luan xiaofan-luan added this to the 2.1 GA milestone Jul 26, 2022
@wayblink
Contributor

wayblink commented Aug 8, 2022

[2022/07/26 06:37:33.993 +00:00] [WARN] [roles/healthz_handler.go:100] ["component is unhealthy"] [state=Abnormal]
[2022/07/26 06:37:33.994 +00:00] [DEBUG] [datanode/data_node.go:523] ["DataNode current state"] [State=Abnormal]
[2022/07/26 06:37:33.994 +00:00] [WARN] [roles/healthz_handler.go:100] ["component is unhealthy"] [state=Abnormal]
[2022/07/26 06:37:35.702 +00:00] [WARN] [client/client.go:103] ["RootCoordClient mess key not exist"] [key=rootcoord]
[2022/07/26 06:37:35.702 +00:00] [ERROR] [grpcclient/client.go:140] ["failed to get client address"] [error="find no available rootcoord, check rootcoord state"]

The DataNode state is Abnormal because RootCoord is still initializing; this is unrelated to the flush. The issue has only happened once so far, so we will watch whether it recurs.

@zhuwenxing
Contributor Author

@wayblink please help take a look; it happened again.

@wayblink
Contributor

> @wayblink please help take a look; it happened again.

Got it.

@wayblink
Contributor

Key message:
[2022/08/22 13:34:10.548 +00:00] [WARN] [storage/minio_chunk_manager.go:152] ["failed to put object"] [path=file/insert_log/435467123048516929/435467123048516930/435467123166019586/103/435467123834486790] [error="Resource requested is unreadable, please reduce your request rate"]

Milvus keeps trying to flush data to MinIO, but MinIO keeps returning the error "Resource requested is unreadable, please reduce your request rate".
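To confirm from the server logs that the DataNode is stuck in this retry loop, one option is to count the "failed to put object" warnings grouped by their error text. This is a minimal triage sketch, assuming the bracketed log layout shown above (`[time] [LEVEL] [source] ["message"] [key=value] ...`); `count_put_failures` and the log file name in the usage note are hypothetical, not part of Milvus.

```python
import re
from collections import Counter

# Leading fields of a Milvus log line, e.g.
# [2022/08/22 13:34:10.548 +00:00] [WARN] [storage/minio_chunk_manager.go:152] ["failed to put object"] ...
LINE_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] \[(?P<source>[^\]]+)\] \["(?P<msg>[^"]*)"\]'
)
# The quoted value of an [error="..."] field anywhere in the line.
ERROR_RE = re.compile(r'\[error="(?P<error>[^"]*)"\]')


def count_put_failures(lines):
    """Count "failed to put object" log lines, grouped by their error text."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("msg") != "failed to put object":
            continue  # not a log line, or a different message
        e = ERROR_RE.search(line)
        counts[e.group("error") if e else "<no error field>"] += 1
    return counts
```

Usage, with a placeholder path: `with open("datanode.log") as f: print(count_put_failures(f))`. A large, steadily growing count for a single error string is consistent with the flush retrying against MinIO rather than making progress.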

@zhuwenxing
Contributor Author

@LoveEachDay

Please help take a look; it seems to be a MinIO issue.

@LoveEachDay
Contributor

@zhuwenxing From the above log:

Error: Write failed. Insufficient number of disks online 

MinIO does not have the quorum it needs to accept any writes. By default, we did not enable a health probe for the MinIO pod, so the pod could show as running even though it could not accept writes. We have added a health probe for MinIO in milvus-helm chart version 3.18.
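For test setups that do not yet use the chart's probe, a pre-test readiness gate along these lines could catch the "running but not writable" state before `test_e2e.py` starts. This is a minimal sketch, assuming MinIO's documented cluster health endpoint `/minio/health/cluster`, which returns 200 only when the cluster has write quorum (verify the path against your MinIO version). `minio_has_write_quorum` and `wait_until` are hypothetical helpers, not part of milvus-helm or pymilvus.

```python
import time
import urllib.error
import urllib.request


def minio_has_write_quorum(endpoint, timeout=2.0):
    """Return True if MinIO answers 200 on its cluster (write-quorum) health endpoint.

    Assumes the /minio/health/cluster path; MinIO answers 503 when write quorum is lost.
    """
    url = endpoint.rstrip("/") + "/minio/health/cluster"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # HTTPError (e.g. 503) is a subclass of URLError; refused or timed-out
        # connections also end up here, and all of them mean "not ready".
        return False


def wait_until(check, timeout_s=300.0, interval_s=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or timeout_s has elapsed.

    clock and sleep are injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Usage, with a placeholder service address: `assert wait_until(lambda: minio_has_write_quorum("http://my-release-minio:9000"), timeout_s=300)`. Checking write quorum rather than plain liveness matters here, because the failure in this issue was a MinIO process that was up but could not accept writes.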

@zhuwenxing zhuwenxing added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 6, 2022
@zhuwenxing zhuwenxing assigned LoveEachDay and unassigned wayblink Sep 6, 2022
@zhuwenxing zhuwenxing changed the title [Bug]: Flush hangs when running test_e2e.py for a fresh Milvus [Bug]: Minio is not ready even though it is running after Milvus installation Sep 21, 2022
@zhuwenxing
Contributor Author

It has not reproduced since, so I am closing it.
