
[Bug]: [benchmark][cluster] flush raises error "failed to call flush to data coordinator: channel not found" in concurrent DDL & DQL scenario #39588

Open
@wangting0128

Description

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.5-20250124-758ac5a4-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0rc124
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: fouramf-67q8g

server:

NAME                                                  READY   STATUS             RESTARTS           AGE     IP              NODE         NOMINATED NODE   READINESS GATES
envoy-spring-festival-2-65b8bb48f6-9prjg              1/1     Running            0                  2d12h   10.104.13.194   4am-node16   <none>           <none>
spring-festival-2-etcd-0                              1/1     Running            0                  11h     10.104.19.40    4am-node28   <none>           <none>
spring-festival-2-etcd-1                              1/1     Running            0                  11h     10.104.15.162   4am-node20   <none>           <none>
spring-festival-2-etcd-2                              1/1     Running            0                  11h     10.104.27.63    4am-node31   <none>           <none>
spring-festival-2-milvus-datanode-55cb855995-5b9pj    1/1     Running            2 (11h ago)        11h     10.104.13.143   4am-node16   <none>           <none>
spring-festival-2-milvus-datanode-55cb855995-d25q4    1/1     Running            2 (11h ago)        11h     10.104.23.78    4am-node27   <none>           <none>
spring-festival-2-milvus-indexnode-7c94447dcf-5qv74   1/1     Running            2 (11h ago)        11h     10.104.14.151   4am-node18   <none>           <none>
spring-festival-2-milvus-indexnode-7c94447dcf-68mqm   1/1     Running            2 (11h ago)        11h     10.104.9.156    4am-node14   <none>           <none>
spring-festival-2-milvus-indexnode-7c94447dcf-7d5cz   1/1     Running            2 (11h ago)        11h     10.104.34.131   4am-node37   <none>           <none>
spring-festival-2-milvus-indexnode-7c94447dcf-lwtxw   1/1     Running            2 (11h ago)        11h     10.104.21.165   4am-node24   <none>           <none>
spring-festival-2-milvus-mixcoord-8486958b8f-967zd    1/1     Running            2 (11h ago)        11h     10.104.34.130   4am-node37   <none>           <none>
spring-festival-2-milvus-proxy-996ddb597-dvxgh        1/1     Running            2 (11h ago)        11h     10.104.34.132   4am-node37   <none>           <none>
spring-festival-2-milvus-proxy-996ddb597-lh59d        1/1     Running            2 (11h ago)        11h     10.104.23.79    4am-node27   <none>           <none>
spring-festival-2-milvus-proxy-996ddb597-n8scb        1/1     Running            2 (11h ago)        11h     10.104.21.167   4am-node24   <none>           <none>
spring-festival-2-milvus-querynode-7d4c895bf-gztl8    1/1     Running            2 (11h ago)        11h     10.104.23.80    4am-node27   <none>           <none>
spring-festival-2-milvus-querynode-7d4c895bf-s7cst    1/1     Running            2 (11h ago)        11h     10.104.30.227   4am-node38   <none>           <none>
spring-festival-2-milvus-querynode-7d4c895bf-tkh5f    1/1     Running            4 (11h ago)        11h     10.104.32.25    4am-node39   <none>           <none>
spring-festival-2-minio-0                             1/1     Running            0                  11h     10.104.15.160   4am-node20   <none>           <none>
spring-festival-2-minio-1                             1/1     Running            0                  11h     10.104.27.56    4am-node31   <none>           <none>
spring-festival-2-minio-2                             1/1     Running            0                  11h     10.104.19.41    4am-node28   <none>           <none>
spring-festival-2-minio-3                             1/1     Running            0                  11h     10.104.26.127   4am-node32   <none>           <none>
spring-festival-2-pulsarv3-bookie-0                   1/1     Running            0                  11h     10.104.15.161   4am-node20   <none>           <none>
spring-festival-2-pulsarv3-bookie-1                   1/1     Running            0                  11h     10.104.19.42    4am-node28   <none>           <none>
spring-festival-2-pulsarv3-bookie-2                   1/1     Running            0                  11h     10.104.27.62    4am-node31   <none>           <none>
spring-festival-2-pulsarv3-bookie-init-dbsdw          0/1     Completed          0                  11h     10.104.21.164   4am-node24   <none>           <none>
spring-festival-2-pulsarv3-broker-0                   1/1     Running            0                  11h     10.104.13.147   4am-node16   <none>           <none>
spring-festival-2-pulsarv3-broker-1                   1/1     Running            0                  11h     10.104.15.153   4am-node20   <none>           <none>
spring-festival-2-pulsarv3-proxy-0                    1/1     Running            0                  11h     10.104.15.152   4am-node20   <none>           <none>
spring-festival-2-pulsarv3-proxy-1                    1/1     Running            0                  11h     10.104.19.35    4am-node28   <none>           <none>
spring-festival-2-pulsarv3-pulsar-init-2lf8j          0/1     Completed          0                  11h     10.104.13.144   4am-node16   <none>           <none>
spring-festival-2-pulsarv3-recovery-0                 1/1     Running            0                  11h     10.104.21.166   4am-node24   <none>           <none>
spring-festival-2-pulsarv3-zookeeper-0                1/1     Running            0                  11h     10.104.15.159   4am-node20   <none>           <none>
spring-festival-2-pulsarv3-zookeeper-1                1/1     Running            0                  11h     10.104.27.61    4am-node31   <none>           <none>
spring-festival-2-pulsarv3-zookeeper-2                1/1     Running            0                  11h     10.104.19.44    4am-node28   <none>           <none>

trace_id_642b50fecf9d4fa46bb165697c88de5c.log


client logs:

[2025-01-24 14:18:47,743 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670483048983v3])>, <Time:{'RPC start': '2025-01-24 14:17:08.499424', 'RPC error': '2025-01-24 14:18:47.743672'}> (decorators.py:140)
[2025-01-24 14:24:05,773 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670484931669v1])>, <Time:{'RPC start': '2025-01-24 14:22:26.383758', 'RPC error': '2025-01-24 14:24:05.773857'}> (decorators.py:140)
[2025-01-24 14:29:22,958 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670486572603v1])>, <Time:{'RPC start': '2025-01-24 14:27:43.438321', 'RPC error': '2025-01-24 14:29:22.958888'}> (decorators.py:140)
[2025-01-24 14:31:14,127 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670487326226v5])>, <Time:{'RPC start': '2025-01-24 14:29:34.649127', 'RPC error': '2025-01-24 14:31:14.127933'}> (decorators.py:140)
[2025-01-24 14:34:45,751 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670488247265v5])>, <Time:{'RPC start': '2025-01-24 14:33:06.247088', 'RPC error': '2025-01-24 14:34:45.751032'}> (decorators.py:140)
[2025-01-24 14:34:57,370 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670488257485v5])>, <Time:{'RPC start': '2025-01-24 14:33:17.258168', 'RPC error': '2025-01-24 14:34:57.370773'}> (decorators.py:140)
[2025-01-24 14:37:12,149 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670489054735v3])>, <Time:{'RPC start': '2025-01-24 14:35:31.736496', 'RPC error': '2025-01-24 14:37:12.149857'}> (decorators.py:140)
[2025-01-24 14:38:30,240 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670489798782v5])>, <Time:{'RPC start': '2025-01-24 14:36:50.763751', 'RPC error': '2025-01-24 14:38:30.240313'}> (decorators.py:140)
[2025-01-24 14:39:16,148 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670489848640v1])>, <Time:{'RPC start': '2025-01-24 14:37:36.145595', 'RPC error': '2025-01-24 14:39:16.148673'}> (decorators.py:140)
[2025-01-24 14:54:05,211 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670495227692v5])>, <Time:{'RPC start': '2025-01-24 14:52:25.037557', 'RPC error': '2025-01-24 14:54:05.211231'}> (decorators.py:140)
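
The client log lines above can be summarized with a small script. The sketch below parses two of them (copied verbatim) and extracts the channel name and the time between 'RPC start' and 'RPC error'. Splitting the channel into a physical channel, a collection ID, and a shard index assumes the virtual channel name follows the pattern `<pchannel>_<collectionID>v<shardIndex>`; that reading is an assumption, not something stated in this report. The function and field names are illustrative only.

```python
import re
from datetime import datetime

# Two of the client log lines from this issue, copied verbatim.
LOG_LINES = [
    "[2025-01-24 14:18:47,743 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670483048983v3])>, <Time:{'RPC start': '2025-01-24 14:17:08.499424', 'RPC error': '2025-01-24 14:18:47.743672'}> (decorators.py:140)",
    "[2025-01-24 14:54:05,211 - ERROR - fouram]: RPC error: [flush], <MilvusException: (code=65535, message=failed to call flush to data coordinator: channel not found[channel=by-dev-rootcoord-dml_1_455526670495227692v5])>, <Time:{'RPC start': '2025-01-24 14:52:25.037557', 'RPC error': '2025-01-24 14:54:05.211231'}> (decorators.py:140)",
]

# Assumed layout of the channel name: <pchannel>_<collectionID>v<shardIndex>
CHANNEL_RE = re.compile(
    r"channel=(?P<pchannel>.+)_(?P<collection_id>\d+)v(?P<shard>\d+)\]"
)
TIME_RE = re.compile(
    r"'RPC start': '(?P<start>[^']+)', 'RPC error': '(?P<error>[^']+)'"
)
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_flush_error(line):
    """Return channel parts and RPC duration for one log line, or None."""
    channel = CHANNEL_RE.search(line)
    times = TIME_RE.search(line)
    if channel is None or times is None:
        return None
    start = datetime.strptime(times.group("start"), TIME_FORMAT)
    error = datetime.strptime(times.group("error"), TIME_FORMAT)
    return {
        "pchannel": channel.group("pchannel"),
        "collection_id": int(channel.group("collection_id")),
        "shard": int(channel.group("shard")),
        "seconds": (error - start).total_seconds(),
    }


for log_line in LOG_LINES:
    print(parse_flush_error(log_line))
```

Read this way, all ten errors above name `by-dev-rootcoord-dml_1` with a suffix of v1, v3, or v5, and each flush call returns its error roughly 99 to 100 seconds after the RPC started.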

Expected Behavior

No response

Steps To Reproduce

1. create a collection with fields: 'id' (primary key), 'float_vector' (128 dim), 'float_vector_1' (200 dim), 'float16_vector' (768 dim), 'sparse_float_vector', 'int64_1', 'varchar_1', 'array_varchar_1', 'bool_1', and dynamic fields
2. build indexes
   - IVF_SQ8: 'float_vector'
   - DISKANN: 'float_vector_1'
   - HNSW: 'float16_vector'
   - SPARSE_INVERTED_INDEX: 'sparse_float_vector'
   - STL_SORT: 'int64_1'
   - Trie: 'varchar_1'
   - INVERTED: 'array_varchar_1'
   - BITMAP: 'bool_1'
3. insert 8M rows (dataset_size: 8m)
4. flush collection
5. rebuild indexes
6. load collection with replica=2
7. concurrent requests:
   - hybrid_search
   - search
   - query
   - scene_hybrid_search_test: 4 vector fields, 3 scalar fields, dynamic fields
     (collection: create->insert->flush->index->load(replica=2)->hybrid_search->drop)  <- flush failed
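
The lifecycle of the last concurrent task (create, insert, flush, index, load, hybrid_search, drop) can be sketched as a small driver that runs the steps in order and reports the first one that fails. This is only an illustration of the sequence: `run_scene` and the step callables are hypothetical placeholders, not the benchmark's actual code, and in a real run each callable would wrap the corresponding SDK call.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Step order taken from the scene description in this issue.
SCENE_STEPS = ["create", "insert", "flush", "index", "load", "hybrid_search", "drop"]


def run_scene(
    actions: Dict[str, Callable[[], None]],
) -> Tuple[Optional[str], Optional[Exception], List[Tuple[str, float]]]:
    """Run each step in order; stop at the first step that raises.

    Returns (failed_step, exception, timings), where timings holds
    (step, seconds) for every step that was attempted.
    """
    timings: List[Tuple[str, float]] = []
    for step in SCENE_STEPS:
        started = time.monotonic()
        try:
            actions[step]()
        except Exception as exc:  # record the failing step and stop
            timings.append((step, time.monotonic() - started))
            return step, exc, timings
        timings.append((step, time.monotonic() - started))
    return None, None, timings


def _failing_flush() -> None:
    # Stand-in for the flush call that fails in this report.
    raise RuntimeError("failed to call flush to data coordinator: channel not found")


# Demo with stub callables: every step succeeds except flush.
demo_actions: Dict[str, Callable[[], None]] = {step: (lambda: None) for step in SCENE_STEPS}
demo_actions["flush"] = _failing_flush
failed_step, error, step_timings = run_scene(demo_actions)
print(failed_step, error)
```

Stopping at the first failure keeps the failing step unambiguous; whether the real benchmark still attempts the drop step after a failed flush is not stated in this report.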

Milvus Log

No response

Anything else?

server config:

proxy:
  replicas: 3
queryNode:
  resources:
    limits:
      cpu: '8'
      memory: 64Gi
    requests:
      cpu: '8'
      memory: 64Gi
  replicas: 3
  nodeSelector:
    node-role/nvme: 'true'
indexNode:
  resources:
    limits:
      cpu: '8.0'
      memory: 16Gi
    requests:
      cpu: '4.0'
      memory: 4Gi
  replicas: 4
dataNode:
  replicas: 2
  resources:
    limits:
      cpu: '2.0'
      memory: 8Gi
    requests:
      cpu: '2.0'
      memory: 5Gi
broker:
  configData:
    defaultRetentionTimeInMinutes: "60"

client config:

{
     "dataset_params": {
          "metric_type": "L2",
          "dim": 128,
          "dataset_name": "sift",
          "dataset_size": "8m",
          "ni_per": 2000,
          "scalars_index": {
               "int64_1": {
                    "index_type": "STL_SORT"
               },
               "varchar_1": {
                    "index_type": "Trie"
               },
               "array_varchar_1": {
                    "index_type": "INVERTED"
               },
               "bool_1": {
                    "index_type": "BITMAP"
               }
          },
          "vectors_index": {
               "float_vector_1": {
                    "index_type": "DISKANN",
                    "index_param": {},
                    "metric_type": "IP"
               },
               "float16_vector": {
                    "index_type": "HNSW",
                    "index_param": {
                         "M": 8,
                         "efConstruction": 300
                    },
                    "metric_type": "L2"
               },
               "sparse_float_vector": {
                    "index_type": "SPARSE_INVERTED_INDEX",
                    "index_param": {
                         "drop_ratio_build": 0.2
                    },
                    "metric_type": "IP"
               }
          },
          "scalars_params": {
               "float_vector_1": {
                    "params": {
                         "dim": 200
                    },
                    "other_params": {
                         "dataset": "text2img"
                    }
               },
               "float16_vector": {
                    "params": {
                         "dim": 768
                    },
                    "other_params": {
                         "dataset": "laion1b_float16"
                    }
               },
               "sparse_float_vector": {
                    "other_params": {
                         "dataset": "sparse_full",
                         "dim": 30000,
                         "sparse_range": [
                              100,
                              150
                         ]
                    }
               },
               "array_varchar_1": {
                    "params": {
                         "max_length": 128,
                         "max_capacity": 10
                    },
                    "other_params": {
                         "dataset": "random_algorithm",
                         "algorithm_params": {
                              "algorithm_name": "specify_scope_array",
                              "specify_range": [
                                   0,
                                   10
                              ],
                              "capacity_range": [
                                   0,
                                   10
                              ]
                         }
                    }
               }
          }
     },
     "common_params": {
          "data_organization": "row_insert"
     },
     "collection_params": {
          "other_fields": [
               "float_vector_1",
               "float16_vector",
               "sparse_float_vector",
               "int64_1",
               "varchar_1",
               "array_varchar_1",
               "bool_1"
          ],
          "dynamic_fields": [
               "json_dynamic",
               "int8_dynamic",
               "int32_dynamic",
               "array_int64_dynamic"
          ],
          "enable_dynamic_field": true,
          "varchar_id": true,
          "shards_num": 2,
          "collection_name": "spring_festival_2"
     },
     "index_params": {
          "index_type": "IVF_SQ8",
          "index_param": {
               "nlist": 1024
          }
     },
     "load_params": {
          "replica_number": 2
     },
     "concurrent_params": {
          "concurrent_number": 20,
          "during_time": "10d",
          "interval": 20
     },
     "concurrent_tasks": [
          {
               "type": "hybrid_search",
               "weight": 1,
               "params": {
                    "nq": 2,
                    "top_k": 10,
                    "reqs": [
                         {
                              "search_param": {
                                   "nprobe": 16
                              },
                              "anns_field": "float_vector",
                              "top_k": 100,
                              "expr": "int64_1 % 10 == 6"
                         },
                         {
                              "search_param": {
                                   "search_list": 30
                              },
                              "anns_field": "float_vector_1",
                              "top_k": 9,
                              "expr": "bool_1 == false"
                         },
                         {
                              "search_param": {
                                   "ef": 32
                              },
                              "anns_field": "float16_vector",
                              "top_k": 10,
                              "expr": "varchar_1 like '%9'"
                         },
                         {
                              "search_param": {
                                   "drop_ratio_search": 0.2
                              },
                              "anns_field": "sparse_float_vector",
                              "top_k": 11,
                              "expr": "int8_dynamic >= 64"
                         }
                    ],
                    "rerank": {
                         "WeightedRanker": [
                              0.85,
                              0.95,
                              0.5,
                              0.6
                         ]
                    },
                    "output_fields": [
                         "*"
                    ],
                    "timeout": 1200,
                    "random_data": true,
                    "check_task": "check_search_output"
               }
          },
          {
               "type": "search",
               "weight": 1,
               "params": {
                    "nq": 10,
                    "top_k": 10,
                    "search_param": {
                         "nprobe": 16
                    },
                    "expr": "bool_1 == true || array_contains_any(array_varchar_1, ['0', '5', '7'])",
                    "output_fields": [
                         "*"
                    ],
                    "timeout": 1200,
                    "random_data": true,
                    "check_task": "check_search_output"
               }
          },
          {
               "type": "query",
               "weight": 1,
               "params": {
                    "expr": "",
                    "output_fields": [
                         "*"
                    ],
                    "limit": 10,
                    "timeout": 1200,
                    "custom_expr": "id like '%{0}'",
                    "custom_range": [
                         0,
                         10
                    ]
               }
          },
          {
               "type": "scene_hybrid_search_test",
               "weight": 1,
               "params": {
                    "nq": 1,
                    "top_k": 1,
                    "reqs": [
                         {
                              "search_param": {
                                   "nprobe": 128
                              },
                              "anns_field": "float_vector",
                              "top_k": 100
                         },
                         {
                              "search_param": {
                                   "nprobe": 32
                              },
                              "anns_field": "float_vector_1",
                              "top_k": 10
                         },
                         {
                              "search_param": {
                                   "ef": 32
                              },
                              "anns_field": "float_vector_2",
                              "top_k": 5
                         },
                         {
                              "search_param": {
                                   "search_list": 20
                              },
                              "anns_field": "float_vector_3",
                              "top_k": 10
                         }
                    ],
                    "rerank": {
                         "RRFRanker": []
                    },
                    "timeout": 600,
                    "random_data": true,
                    "dataset": "local",
                    "dim": 128,
                    "shards_num": 6,
                    "data_size": 30000,
                    "nb": 3000,
                    "index_type": "IVF_SQ8",
                    "index_param": {
                         "nlist": 2048
                    },
                    "metric_type": "L2",
                    "output_fields": [
                         "*"
                    ],
                    "other_fields": [
                         "float_vector_1",
                         "float_vector_2",
                         "float_vector_3",
                         "int64_1",
                         "bool_1",
                         "varchar_1"
                    ],
                    "dynamic_fields": [
                         "json_dynamic",
                         "float_dynamic",
                         "array_bool_dynamic"
                    ],
                    "enable_dynamic_field": true,
                    "data_organization": "row_insert",
                    "replica_number": 2,
                    "scalars_params": {
                         "float_vector_1": {
                              "params": {
                                   "dim": 128
                              },
                              "other_params": {
                                   "dataset": "sift"
                              }
                         },
                         "float_vector_2": {
                              "params": {
                                   "dim": 128
                              },
                              "other_params": {
                                   "dataset": "sift"
                              }
                         },
                         "float_vector_3": {
                              "params": {
                                   "dim": 128
                              },
                              "other_params": {
                                   "dataset": "sift"
                              }
                         }
                    },
                    "scalars_index": {
                         "int64_1": {},
                         "bool_1": {
                              "index_type": "INVERTED"
                         },
                         "varchar_1": {
                              "index_type": "BITMAP"
                         }
                    },
                    "vectors_index": {
                         "float_vector_1": {
                              "index_type": "IVF_FLAT",
                              "index_param": {
                                   "nlist": 1024
                              },
                              "metric_type": "L2"
                         },
                         "float_vector_2": {
                              "index_type": "HNSW",
                              "index_param": {
                                   "M": 8,
                                   "efConstruction": 200
                              },
                              "metric_type": "L2"
                         },
                         "float_vector_3": {
                              "index_type": "DISKANN",
                              "index_param": {},
                              "metric_type": "IP"
                         }
                    },
                    "hybrid_search_counts": 10
               }
          }
     ]
}

Metadata

Labels

  • kind/bug: Issues or changes related a bug
  • test/benchmark: benchmark test
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.
