[Bug]: Load collection timeout after Milvus recovered from querycoord pod kill chaos test #35087
Open
Description
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version:2.4-20240729-e2fc8853-amd64
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
[2024-07-29T11:30:09.033Z] <name>: Checker__39o1EG8v
[2024-07-29T11:30:09.033Z] <description>:
[2024-07-29T11:30:09.033Z] <schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}...... (api_request.py:37)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:25:24 - DEBUG - ci_test]: (api_request) : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:25:25 - DEBUG - ci_test]: (api_response) : None (api_request.py:37)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:25:25 - DEBUG - ci_test]: (api_request) : [Collection.compact] args: [180], kwargs: {} (api_request.py:62)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:25:25 - DEBUG - ci_test]: (api_response) : None (api_request.py:37)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:26:02 - DEBUG - ci_test]: (api_request) : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:26:03 - DEBUG - ci_test]: (api_response) : None (api_request.py:37)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:26:03 - INFO - ci_test]: assert create collection: 0.015234708786010742, init_entities: 162000 (test_all_collections_after_chaos.py:49)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:26:07 - DEBUG - ci_test]: (api_request) : [Collection.insert] args: [[[-3000, -2999, -2998, -2997, -2996, -2995, -2994, -2993, -2992, -2991, -2990, -2989, -2988, -2987, -2986, -2985, -2984, -2983, -2982, -2981, -2980, -2979, -2978, -2977, -2976, -2975, -2974, -2973, -2972, -2971, -2970, -2969, -2968, -2967, -2966, -2965, -2964, -2963, -2962, -2961, -2960, -2959, -29......, kwargs: {'timeout': 180} (api_request.py:62)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:26:08 - DEBUG - ci_test]: (api_response) : (insert count: 2000, delete count: 0, upsert count: 0, timestamp: 451478124849266690, success count: 2000, err count: 0, cost: 0) (api_request.py:37)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:26:08 - INFO - ci_test]: assert insert: 2.2203476428985596 (test_all_collections_after_chaos.py:57)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:26:08 - DEBUG - ci_test]: (api_request) : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:26:11 - DEBUG - ci_test]: (api_response) : None (api_request.py:37)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:26:11 - DEBUG - ci_test]: (api_request) : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:26:12 - DEBUG - ci_test]: (api_response) : None (api_request.py:37)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:26:12 - INFO - ci_test]: assert flush: 3.0466232299804688, entities: 164000 (test_all_collections_after_chaos.py:67)
[2024-07-29T11:30:09.033Z] [2024-07-29 11:26:12 - INFO - ci_test]: index info: [{'collection': 'Checker__39o1EG8v', 'field': 'image_emb', 'index_name': 'image_emb', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}, {'collection': 'Checker__39o1EG8v', 'field': 'text_emb', 'index_name': 'text_emb', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}, {'collection': 'Checker__39o1EG8v', 'field': 'voice_emb', 'index_name': 'voice_emb', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}, {'collection': 'Checker__39o1EG8v', 'field': 'int64', 'index_name': 'int64', 'index_param': {'index_type': 'INVERTED'}}, {'collection': 'Checker__39o1EG8v', 'field': 'float', 'index_name': 'float', 'index_param': {'index_type': 'INVERTED'}}, {'collection': 'Checker__39o1EG8v', 'field': 'varchar', 'index_name': 'varchar', 'index_param': {'index_type': 'INVERTED'}}, {'collection': 'Checker__39o1EG8v', 'field': 'float_vector', 'index_name': 'float_vector', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}] (test_all_collections_after_chaos.py:71)
[2024-07-29T11:30:09.034Z] [2024-07-29 11:26:12 - INFO - ci_test]: index info: [{'collection': 'Checker__39o1EG8v', 'field': 'voice_emb', 'index_name': 'voice_emb', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}, {'collection': 'Checker__39o1EG8v', 'field': 'int64', 'index_name': 'int64', 'index_param': {'index_type': 'INVERTED'}}, {'collection': 'Checker__39o1EG8v', 'field': 'float', 'index_name': 'float', 'index_param': {'index_type': 'INVERTED'}}, {'collection': 'Checker__39o1EG8v', 'field': 'varchar', 'index_name': 'varchar', 'index_param': {'index_type': 'INVERTED'}}, {'collection': 'Checker__39o1EG8v', 'field': 'float_vector', 'index_name': 'float_vector', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}, {'collection': 'Checker__39o1EG8v', 'field': 'image_emb', 'index_name': 'image_emb', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}, {'collection': 'Checker__39o1EG8v', 'field': 'text_emb', 'index_name': 'text_emb', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}] (test_all_collections_after_chaos.py:86)
[2024-07-29T11:30:09.034Z] [2024-07-29 11:26:12 - DEBUG - ci_test]: (api_request) : [Collection.load] args: [None, 1, 180], kwargs: {} (api_request.py:62)
[2024-07-29T11:30:09.034Z] [2024-07-29 11:29:12 - ERROR - pymilvus.decorators]: RPC error: [wait_for_loading_collection], <MilvusException: (code=1, message=wait for loading collection timeout, collection: Checker__39o1EG8v)>, <Time:{'RPC start': '2024-07-29 11:26:12.292044', 'RPC error': '2024-07-29 11:29:12.295765'}> (decorators.py:146)
[2024-07-29T11:30:09.034Z] [2024-07-29 11:29:12 - WARNING - pymilvus.decorators]: Retry timeout: 180s (decorators.py:106)
[2024-07-29T11:30:09.034Z] [2024-07-29 11:29:12 - ERROR - pymilvus.decorators]: RPC error: [load_collection], <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for loading collection timeout, collection: Checker__39o1EG8v)>, <Time:{'RPC start': '2024-07-29 11:26:12.262751', 'RPC error': '2024-07-29 11:29:12.296696'}> (decorators.py:146)
[2024-07-29T11:30:09.034Z] [2024-07-29 11:29:12 - ERROR - ci_test]: Traceback (most recent call last):
[2024-07-29T11:30:09.034Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 32, in inner_wrapper
[2024-07-29T11:30:09.034Z] res = func(*args, **_kwargs)
[2024-07-29T11:30:09.034Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 63, in api_request
[2024-07-29T11:30:09.034Z] return func(*arg, **kwargs)
[2024-07-29T11:30:09.034Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 424, in load
[2024-07-29T11:30:09.034Z] conn.load_collection(
[2024-07-29T11:30:09.034Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 147, in handler
[2024-07-29T11:30:09.034Z] raise e from e
[2024-07-29T11:30:09.034Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 143, in handler
[2024-07-29T11:30:09.034Z] return func(*args, **kwargs)
[2024-07-29T11:30:09.034Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 182, in handler
[2024-07-29T11:30:09.034Z] return func(self, *args, **kwargs)
[2024-07-29T11:30:09.034Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 107, in handler
[2024-07-29T11:30:09.034Z] raise MilvusException(
[2024-07-29T11:30:09.034Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for loading collection timeout, collection: Checker__39o1EG8v)>
[2024-07-29T11:30:09.034Z] (api_request.py:45)
[2024-07-29T11:30:09.034Z] [2024-07-29 11:29:12 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for loading collection timeout, collection: Checker__39o1EG8v)> (api_request.py:46)
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
log:
artifacts-querycoord-pod-kill-14944-server-logs.tar.gz
cluster:4am
ns:chaos-testing
pod info:
+ kubectl get pods -o wide
+ grep querycoord-pod-kill-14944
querycoord-pod-kill-14944-etcd-0 1/1 Running 0 36m 10.104.24.27 4am-node29 <none> <none>
querycoord-pod-kill-14944-etcd-1 1/1 Running 0 36m 10.104.26.125 4am-node32 <none> <none>
querycoord-pod-kill-14944-etcd-2 1/1 Running 0 36m 10.104.18.17 4am-node25 <none> <none>
querycoord-pod-kill-14944-milvus-datacoord-7bfd858964-sj79n 1/1 Running 4 36m 10.104.6.195 4am-node13 <none> <none>
querycoord-pod-kill-14944-milvus-datanode-77679bcc66-fn6cf 1/1 Running 4 (35m ago) 36m 10.104.32.231 4am-node39 <none> <none>
querycoord-pod-kill-14944-milvus-datanode-77679bcc66-hpfkg 1/1 Running 4 (34m ago) 36m 10.104.30.108 4am-node38 <none> <none>
querycoord-pod-kill-14944-milvus-indexcoord-67bf65c7f9-2cxtp 1/1 Running 0 36m 10.104.6.197 4am-node13 <none> <none>
querycoord-pod-kill-14944-milvus-indexnode-5ff5467655-66l9f 1/1 Running 3 (35m ago) 36m 10.104.33.98 4am-node36 <none> <none>
querycoord-pod-kill-14944-milvus-indexnode-5ff5467655-7l5mn 1/1 Running 3 (35m ago) 36m 10.104.23.179 4am-node27 <none> <none>
querycoord-pod-kill-14944-milvus-indexnode-5ff5467655-czgxl 1/1 Running 4 (35m ago) 36m 10.104.6.196 4am-node13 <none> <none>
querycoord-pod-kill-14944-milvus-proxy-7c4fdd6cf5-psmjc 1/1 Running 3 (35m ago) 36m 10.104.4.32 4am-node11 <none> <none>
querycoord-pod-kill-14944-milvus-querycoord-6bbd7d6bd8-q6gdq 1/1 Running 0 14m 10.104.17.108 4am-node23 <none> <none>
querycoord-pod-kill-14944-milvus-querynode-686cfb56d9-8bh9g 1/1 Running 3 (35m ago) 36m 10.104.13.9 4am-node16 <none> <none>
querycoord-pod-kill-14944-milvus-querynode-686cfb56d9-d2q2f 1/1 Running 4 (35m ago) 36m 10.104.16.164 4am-node21 <none> <none>
querycoord-pod-kill-14944-milvus-querynode-686cfb56d9-pfdlp 1/1 Running 4 (35m ago) 36m 10.104.34.170 4am-node37 <none> <none>
querycoord-pod-kill-14944-milvus-rootcoord-8d8b55f7b-hbhmm 1/1 Running 3 (35m ago) 36m 10.104.6.193 4am-node13 <none> <none>
querycoord-pod-kill-14944-minio-0 1/1 Running 0 36m 10.104.26.121 4am-node32 <none> <none>
querycoord-pod-kill-14944-minio-1 1/1 Running 0 36m 10.104.20.13 4am-node22 <none> <none>
querycoord-pod-kill-14944-minio-2 1/1 Running 0 36m 10.104.24.28 4am-node29 <none> <none>
querycoord-pod-kill-14944-minio-3 1/1 Running 0 36m 10.104.18.16 4am-node25 <none> <none>
querycoord-pod-kill-14944-pulsar-bookie-0 1/1 Running 0 36m 10.104.20.12 4am-node22 <none> <none>
querycoord-pod-kill-14944-pulsar-bookie-1 1/1 Running 0 36m 10.104.26.126 4am-node32 <none> <none>
querycoord-pod-kill-14944-pulsar-bookie-init-wl4wf 0/1 Completed 0 36m 10.104.32.229 4am-node39 <none> <none>
querycoord-pod-kill-14944-pulsar-broker-0 1/1 Running 0 36m 10.104.24.22 4am-node29 <none> <none>
querycoord-pod-kill-14944-pulsar-proxy-0 1/1 Running 0 36m 10.104.26.115 4am-node32 <none> <none>
querycoord-pod-kill-14944-pulsar-pulsar-init-gmjgr 0/1 Completed 0 36m 10.104.32.232 4am-node39 <none> <none>
querycoord-pod-kill-14944-pulsar-recovery-0 1/1 Running 0 36m 10.104.32.230 4am-node39 <none> <none>
querycoord-pod-kill-14944-pulsar-zookeeper-0 1/1 Running 0 36m 10.104.26.122 4am-node32 <none> <none>
Anything else?
No response