[Bug]: [benchmark][cluster] flush raises "failed to call flush to data coordinator: channel not found" in concurrent DDL & DQL scene #39588
Open
Description
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: 2.5-20250124-758ac5a4-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0rc124
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
argo task: fouramf-67q8g
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
envoy-spring-festival-2-65b8bb48f6-9prjg 1/1 Running 0 2d12h 10.104.13.194 4am-node16 <none> <none>
spring-festival-2-etcd-0 1/1 Running 0 11h 10.104.19.40 4am-node28 <none> <none>
spring-festival-2-etcd-1 1/1 Running 0 11h 10.104.15.162 4am-node20 <none> <none>
spring-festival-2-etcd-2 1/1 Running 0 11h 10.104.27.63 4am-node31 <none> <none>
spring-festival-2-milvus-datanode-55cb855995-5b9pj 1/1 Running 2 (11h ago) 11h 10.104.13.143 4am-node16 <none> <none>
spring-festival-2-milvus-datanode-55cb855995-d25q4 1/1 Running 2 (11h ago) 11h 10.104.23.78 4am-node27 <none> <none>
spring-festival-2-milvus-indexnode-7c94447dcf-5qv74 1/1 Running 2 (11h ago) 11h 10.104.14.151 4am-node18 <none> <none>
spring-festival-2-milvus-indexnode-7c94447dcf-68mqm 1/1 Running 2 (11h ago) 11h 10.104.9.156 4am-node14 <none> <none>
spring-festival-2-milvus-indexnode-7c94447dcf-7d5cz 1/1 Running 2 (11h ago) 11h 10.104.34.131 4am-node37 <none> <none>
spring-festival-2-milvus-indexnode-7c94447dcf-lwtxw 1/1 Running 2 (11h ago) 11h 10.104.21.165 4am-node24 <none> <none>
spring-festival-2-milvus-mixcoord-8486958b8f-967zd 1/1 Running 2 (11h ago) 11h 10.104.34.130 4am-node37 <none> <none>
spring-festival-2-milvus-proxy-996ddb597-dvxgh 1/1 Running 2 (11h ago) 11h 10.104.34.132 4am-node37 <none> <none>
spring-festival-2-milvus-proxy-996ddb597-lh59d 1/1 Running 2 (11h ago) 11h 10.104.23.79 4am-node27 <none> <none>
spring-festival-2-milvus-proxy-996ddb597-n8scb 1/1 Running 2 (11h ago) 11h 10.104.21.167 4am-node24 <none> <none>
spring-festival-2-milvus-querynode-7d4c895bf-gztl8 1/1 Running 2 (11h ago) 11h 10.104.23.80 4am-node27 <none> <none>
spring-festival-2-milvus-querynode-7d4c895bf-s7cst 1/1 Running 2 (11h ago) 11h 10.104.30.227 4am-node38 <none> <none>
spring-festival-2-milvus-querynode-7d4c895bf-tkh5f 1/1 Running 4 (11h ago) 11h 10.104.32.25 4am-node39 <none> <none>
spring-festival-2-minio-0 1/1 Running 0 11h 10.104.15.160 4am-node20 <none> <none>
spring-festival-2-minio-1 1/1 Running 0 11h 10.104.27.56 4am-node31 <none> <none>
spring-festival-2-minio-2 1/1 Running 0 11h 10.104.19.41 4am-node28 <none> <none>
spring-festival-2-minio-3 1/1 Running 0 11h 10.104.26.127 4am-node32 <none> <none>
spring-festival-2-pulsarv3-bookie-0 1/1 Running 0 11h 10.104.15.161 4am-node20 <none> <none>
spring-festival-2-pulsarv3-bookie-1 1/1 Running 0 11h 10.104.19.42 4am-node28 <none> <none>
spring-festival-2-pulsarv3-bookie-2 1/1 Running 0 11h 10.104.27.62 4am-node31 <none> <none>
spring-festival-2-pulsarv3-bookie-init-dbsdw 0/1 Completed 0 11h 10.104.21.164 4am-node24 <none> <none>
spring-festival-2-pulsarv3-broker-0 1/1 Running 0 11h 10.104.13.147 4am-node16 <none> <none>
spring-festival-2-pulsarv3-broker-1 1/1 Running 0 11h 10.104.15.153 4am-node20 <none> <none>
spring-festival-2-pulsarv3-proxy-0 1/1 Running 0 11h 10.104.15.152 4am-node20 <none> <none>
spring-festival-2-pulsarv3-proxy-1 1/1 Running 0 11h 10.104.19.35 4am-node28 <none> <none>
spring-festival-2-pulsarv3-pulsar-init-2lf8j 0/1 Completed 0 11h 10.104.13.144 4am-node16 <none> <none>
spring-festival-2-pulsarv3-recovery-0 1/1 Running 0 11h 10.104.21.166 4am-node24 <none> <none>
spring-festival-2-pulsarv3-zookeeper-0 1/1 Running 0 11h 10.104.15.159 4am-node20 <none> <none>
spring-festival-2-pulsarv3-zookeeper-1 1/1 Running 0 11h 10.104.27.61 4am-node31 <none> <none>
spring-festival-2-pulsarv3-zookeeper-2 1/1 Running 0 11h 10.104.19.44 4am-node28 <none> <none>
trace_id_642b50fecf9d4fa46bb165697c88de5c.log
client logs:
[2025-01-24 14:18:47,743 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670483048983v3])>, <Time:{'RPC start': '2025-01-24 14:17:08.499424', 'RPC error': '2025-01-24 14:18:47.743672'}> (decorators.py:140)
[2025-01-24 14:24:05,773 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670484931669v1])>, <Time:{'RPC start': '2025-01-24 14:22:26.383758', 'RPC error': '2025-01-24 14:24:05.773857'}> (decorators.py:140)
[2025-01-24 14:29:22,958 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670486572603v1])>, <Time:{'RPC start': '2025-01-24 14:27:43.438321', 'RPC error': '2025-01-24 14:29:22.958888'}> (decorators.py:140)
[2025-01-24 14:31:14,127 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670487326226v5])>, <Time:{'RPC start': '2025-01-24 14:29:34.649127', 'RPC error': '2025-01-24 14:31:14.127933'}> (decorators.py:140)
[2025-01-24 14:34:45,751 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670488247265v5])>, <Time:{'RPC start': '2025-01-24 14:33:06.247088', 'RPC error': '2025-01-24 14:34:45.751032'}> (decorators.py:140)
[2025-01-24 14:34:57,370 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670488257485v5])>, <Time:{'RPC start': '2025-01-24 14:33:17.258168', 'RPC error': '2025-01-24 14:34:57.370773'}> (decorators.py:140)
[2025-01-24 14:37:12,149 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670489054735v3])>, <Time:{'RPC start': '2025-01-24 14:35:31.736496', 'RPC error': '2025-01-24 14:37:12.149857'}> (decorators.py:140)
[2025-01-24 14:38:30,240 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670489798782v5])>, <Time:{'RPC start': '2025-01-24 14:36:50.763751', 'RPC error': '2025-01-24 14:38:30.240313'}> (decorators.py:140)
[2025-01-24 14:39:16,148 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670489848640v1])>, <Time:{'RPC start': '2025-01-24 14:37:36.145595', 'RPC error': '2025-01-24 14:39:16.148673'}> (decorators.py:140)
[2025-01-24 14:54:05,211 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670495227692v5])>, <Time:{'RPC start': '2025-01-24 14:52:25.037557', 'RPC error': '2025-01-24 14:54:05.211231'}> (decorators.py:140)
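To compare the failures, the client log lines above can be reduced to the channel name and the time each flush RPC spent before failing. The following is a minimal sketch, not part of the fouram framework: the `FlushError` record and `parse_flush_errors` helper are hypothetical names, and the split of the channel name into physical channel, collection ID and shard index is an assumption about the vchannel naming, not something confirmed in this report.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Two of the client log lines from this report, copied verbatim.
SAMPLE_LOG = """\
[2025-01-24 14:18:47,743 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670483048983v3])>, <Time:{'RPC start': '2025-01-24 14:17:08.499424', 'RPC error': '2025-01-24 14:18:47.743672'}> (decorators.py:140)
[2025-01-24 14:34:57,370 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670488257485v5])>, <Time:{'RPC start': '2025-01-24 14:33:17.258168', 'RPC error': '2025-01-24 14:34:57.370773'}> (decorators.py:140)
"""

# Pulls the channel name and both RPC timestamps out of one error line.
ERROR_RE = re.compile(
    r"channel not found\[channel=(?P<channel>[^\]]+)\]"
    r".*?'RPC start': '(?P<start>[^']+)', 'RPC error': '(?P<end>[^']+)'"
)
# Assumed vchannel layout: <physical channel>_<collection id>v<shard index>.
CHANNEL_RE = re.compile(r"^(?P<pchannel>.+)_(?P<collection_id>\d+)v(?P<shard>\d+)$")
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


@dataclass
class FlushError:
    channel: str
    collection_id: str
    shard: int
    seconds: float  # time between 'RPC start' and 'RPC error'


def parse_flush_errors(text):
    """Return one FlushError per 'channel not found' line in the log text."""
    records = []
    for line in text.splitlines():
        m = ERROR_RE.search(line)
        if not m:
            continue
        c = CHANNEL_RE.match(m["channel"])
        if not c:
            continue  # channel name does not follow the assumed layout
        elapsed = datetime.strptime(m["end"], TS_FORMAT) - datetime.strptime(m["start"], TS_FORMAT)
        records.append(
            FlushError(m["channel"], c["collection_id"], int(c["shard"]), elapsed.total_seconds())
        )
    return records


for rec in parse_flush_errors(SAMPLE_LOG):
    print(f"{rec.channel}: failed after {rec.seconds:.1f}s")
```

Going by the timestamps in the ten lines above, every flush call fails roughly 99 to 100 seconds after the RPC starts, and each error names a different channel. Whether that interval corresponds to a retry or timeout budget is a guess and would need the server logs to confirm.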
Expected Behavior
No response
Steps To Reproduce
1. create a collection with fields: 'id'(primary key), 'float_vector'(128dim), 'float_vector_1'(200dim), 'float16_vector'(768dim), 'sparse_float_vector', 'int64_1', 'varchar_1', 'array_varchar_1', 'bool_1', dynamic fields
2. build indexes
- IVF_SQ8: 'float_vector'
- DISKANN: 'float_vector_1'
- HNSW: 'float16_vector'
- SPARSE_INVERTED_INDEX: 'sparse_float_vector'
- STL_SORT: 'int64_1'
- Trie: 'varchar_1'
- INVERTED: 'array_varchar_1'
- BITMAP: 'bool_1'
3. insert 8m data
4. flush collection
5. rebuild indexes
6. load collection with replica=2
7. concurrent requests:
- hybrid_search
- search
- query
- scene_hybrid_search_test: 4 vector fields, 3 scalar fields, dynamic fields
(collection: create->insert->flush->index->load(replica=2)->hybrid_search->drop) <- flush failed
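The failing part of the scene is the short-lived collection loop. Below is a reduced sketch of that loop written against pymilvus's `MilvusClient`, for anyone who wants to try it outside the benchmark framework. It is an assumption-laden simplification, not the fouram test itself: it uses the quick-setup form of `create_collection` (a single `vector` field with an integer `id`), skips the explicit index, load(replica=2) and hybrid_search steps, and has not been confirmed to reproduce the error. `run_scene_once` and `run_scene_concurrently` are hypothetical helper names.

```python
import random
import uuid
from concurrent.futures import ThreadPoolExecutor


def run_scene_once(client, dim=128, rows=3000):
    """create -> insert -> flush -> drop on one throwaway collection.

    `client` is expected to behave like pymilvus.MilvusClient.
    Returns None on success, or a short error string if any step fails.
    """
    name = f"scene_{uuid.uuid4().hex[:8]}"
    # Quick-setup create: pymilvus builds an `id` + `vector` schema for us.
    client.create_collection(collection_name=name, dimension=dim)
    try:
        data = [
            {"id": i, "vector": [random.random() for _ in range(dim)]}
            for i in range(rows)
        ]
        client.insert(collection_name=name, data=data)
        client.flush(collection_name=name)  # the call that fails in this issue
        return None
    except Exception as exc:
        return f"{name}: {exc}"
    finally:
        # Always drop, so repeated runs keep creating and removing channels.
        client.drop_collection(collection_name=name)


def run_scene_concurrently(client, workers=4, iterations=5):
    """Run the loop from several threads and collect the failures."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_scene_once, client) for _ in range(workers * iterations)]
        results = [f.result() for f in futures]
    return [r for r in results if r is not None]


# Example wiring (needs a reachable Milvus; the URI is a placeholder):
#   from pymilvus import MilvusClient
#   failures = run_scene_concurrently(MilvusClient(uri="http://localhost:19530"))
#   print(failures)
```

In the original run this loop executes alongside search, query and hybrid_search traffic on the 8m collection, so a faithful reproduction would need that load running at the same time.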
Milvus Log
No response
Anything else?
server config:
proxy:
  replicas: 3
queryNode:
  resources:
    limits:
      cpu: '8'
      memory: 64Gi
    requests:
      cpu: '8'
      memory: 64Gi
  replicas: 3
  nodeSelector:
    node-role/nvme: 'true'
indexNode:
  resources:
    limits:
      cpu: '8.0'
      memory: 16Gi
    requests:
      cpu: '4.0'
      memory: 4Gi
  replicas: 4
dataNode:
  replicas: 2
  resources:
    limits:
      cpu: '2.0'
      memory: 8Gi
    requests:
      cpu: '2.0'
      memory: 5Gi
broker:
  configData:
    defaultRetentionTimeInMinutes: "60"
client config:
{
  "dataset_params": {
    "metric_type": "L2",
    "dim": 128,
    "dataset_name": "sift",
    "dataset_size": "8m",
    "ni_per": 2000,
    "scalars_index": {
      "int64_1": {
        "index_type": "STL_SORT"
      },
      "varchar_1": {
        "index_type": "Trie"
      },
      "array_varchar_1": {
        "index_type": "INVERTED"
      },
      "bool_1": {
        "index_type": "BITMAP"
      }
    },
    "vectors_index": {
      "float_vector_1": {
        "index_type": "DISKANN",
        "index_param": {},
        "metric_type": "IP"
      },
      "float16_vector": {
        "index_type": "HNSW",
        "index_param": {
          "M": 8,
          "efConstruction": 300
        },
        "metric_type": "L2"
      },
      "sparse_float_vector": {
        "index_type": "SPARSE_INVERTED_INDEX",
        "index_param": {
          "drop_ratio_build": 0.2
        },
        "metric_type": "IP"
      }
    },
    "scalars_params": {
      "float_vector_1": {
        "params": {
          "dim": 200
        },
        "other_params": {
          "dataset": "text2img"
        }
      },
      "float16_vector": {
        "params": {
          "dim": 768
        },
        "other_params": {
          "dataset": "laion1b_float16"
        }
      },
      "sparse_float_vector": {
        "other_params": {
          "dataset": "sparse_full",
          "dim": 30000,
          "sparse_range": [
            100,
            150
          ]
        }
      },
      "array_varchar_1": {
        "params": {
          "max_length": 128,
          "max_capacity": 10
        },
        "other_params": {
          "dataset": "random_algorithm",
          "algorithm_params": {
            "algorithm_name": "specify_scope_array",
            "specify_range": [
              0,
              10
            ],
            "capacity_range": [
              0,
              10
            ]
          }
        }
      }
    }
  },
  "common_params": {
    "data_organization": "row_insert"
  },
  "collection_params": {
    "other_fields": [
      "float_vector_1",
      "float16_vector",
      "sparse_float_vector",
      "int64_1",
      "varchar_1",
      "array_varchar_1",
      "bool_1"
    ],
    "dynamic_fields": [
      "json_dynamic",
      "int8_dynamic",
      "int32_dynamic",
      "array_int64_dynamic"
    ],
    "enable_dynamic_field": true,
    "varchar_id": true,
    "shards_num": 2,
    "collection_name": "spring_festival_2"
  },
  "index_params": {
    "index_type": "IVF_SQ8",
    "index_param": {
      "nlist": 1024
    }
  },
  "load_params": {
    "replica_number": 2
  },
  "concurrent_params": {
    "concurrent_number": 20,
    "during_time": "10d",
    "interval": 20
  },
  "concurrent_tasks": [
    {
      "type": "hybrid_search",
      "weight": 1,
      "params": {
        "nq": 2,
        "top_k": 10,
        "reqs": [
          {
            "search_param": {
              "nprobe": 16
            },
            "anns_field": "float_vector",
            "top_k": 100,
            "expr": "int64_1 % 10 == 6"
          },
          {
            "search_param": {
              "search_list": 30
            },
            "anns_field": "float_vector_1",
            "top_k": 9,
            "expr": "bool_1 == false"
          },
          {
            "search_param": {
              "ef": 32
            },
            "anns_field": "float16_vector",
            "top_k": 10,
            "expr": "varchar_1 like '%9'"
          },
          {
            "search_param": {
              "drop_ratio_search": 0.2
            },
            "anns_field": "sparse_float_vector",
            "top_k": 11,
            "expr": "int8_dynamic >= 64"
          }
        ],
        "rerank": {
          "WeightedRanker": [
            0.85,
            0.95,
            0.5,
            0.6
          ]
        },
        "output_fields": [
          "*"
        ],
        "timeout": 1200,
        "random_data": true,
        "check_task": "check_search_output"
      }
    },
    {
      "type": "search",
      "weight": 1,
      "params": {
        "nq": 10,
        "top_k": 10,
        "search_param": {
          "nprobe": 16
        },
        "expr": "bool_1 == true || array_contains_any(array_varchar_1, ['0', '5', '7'])",
        "output_fields": [
          "*"
        ],
        "timeout": 1200,
        "random_data": true,
        "check_task": "check_search_output"
      }
    },
    {
      "type": "query",
      "weight": 1,
      "params": {
        "expr": "",
        "output_fields": [
          "*"
        ],
        "limit": 10,
        "timeout": 1200,
        "custom_expr": "id like '%{0}'",
        "custom_range": [
          0,
          10
        ]
      }
    },
    {
      "type": "scene_hybrid_search_test",
      "weight": 1,
      "params": {
        "nq": 1,
        "top_k": 1,
        "reqs": [
          {
            "search_param": {
              "nprobe": 128
            },
            "anns_field": "float_vector",
            "top_k": 100
          },
          {
            "search_param": {
              "nprobe": 32
            },
            "anns_field": "float_vector_1",
            "top_k": 10
          },
          {
            "search_param": {
              "ef": 32
            },
            "anns_field": "float_vector_2",
            "top_k": 5
          },
          {
            "search_param": {
              "search_list": 20
            },
            "anns_field": "float_vector_3",
            "top_k": 10
          }
        ],
        "rerank": {
          "RRFRanker": []
        },
        "timeout": 600,
        "random_data": true,
        "dataset": "local",
        "dim": 128,
        "shards_num": 6,
        "data_size": 30000,
        "nb": 3000,
        "index_type": "IVF_SQ8",
        "index_param": {
          "nlist": 2048
        },
        "metric_type": "L2",
        "output_fields": [
          "*"
        ],
        "other_fields": [
          "float_vector_1",
          "float_vector_2",
          "float_vector_3",
          "int64_1",
          "bool_1",
          "varchar_1"
        ],
        "dynamic_fields": [
          "json_dynamic",
          "float_dynamic",
          "array_bool_dynamic"
        ],
        "enable_dynamic_field": true,
        "data_organization": "row_insert",
        "replica_number": 2,
        "scalars_params": {
          "float_vector_1": {
            "params": {
              "dim": 128
            },
            "other_params": {
              "dataset": "sift"
            }
          },
          "float_vector_2": {
            "params": {
              "dim": 128
            },
            "other_params": {
              "dataset": "sift"
            }
          },
          "float_vector_3": {
            "params": {
              "dim": 128
            },
            "other_params": {
              "dataset": "sift"
            }
          }
        },
        "scalars_index": {
          "int64_1": {},
          "bool_1": {
            "index_type": "INVERTED"
          },
          "varchar_1": {
            "index_type": "BITMAP"
          }
        },
        "vectors_index": {
          "float_vector_1": {
            "index_type": "IVF_FLAT",
            "index_param": {
              "nlist": 1024
            },
            "metric_type": "L2"
          },
          "float_vector_2": {
            "index_type": "HNSW",
            "index_param": {
              "M": 8,
              "efConstruction": 200
            },
            "metric_type": "L2"
          },
          "float_vector_3": {
            "index_type": "DISKANN",
            "index_param": {},
            "metric_type": "IP"
          }
        },
        "hybrid_search_counts": 10
      }
    }
  ]
}