
Create index fails because the server crashed #2642



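Describe the bug

Building an IVFLAT index (nlist 4096) on collection sift_1m_128_128_l2 fails: the client's create_index call terminates with StatusCode.UNAVAILABLE ("Socket closed"), and the server log shows the server receiving critical signal 11. The client log comes first, followed by the server log: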

21:58:32  2020-06-20:13:58:32,104 INFO     [] Building index start, collection_name: sift_1m_128_128_l2, index_type: IVFLAT
21:58:32  2020-06-20:13:58:32,105 INFO     [] {'nlist': 4096}
22:01:09  create_index
22:01:09  <_MultiThreadedRendezvous of RPC that terminated with:
22:01:09  	status = StatusCode.UNAVAILABLE
22:01:09  	details = "Socket closed"
22:01:09  	debug_error_string = "{"created":"@1592661660.495892340","description":"Error received from peer ipv4:","file":"src/core/lib/surface/","file_line":1056,"grpc_message":"Socket closed","grpc_status":14}"
22:01:09  >
22:01:09  2020-06-20:14:01:00,497 ERROR    [] create_index
22:01:09  <_MultiThreadedRendezvous of RPC that terminated with:
22:01:09  	status = StatusCode.UNAVAILABLE
22:01:09  	details = "Socket closed"
22:01:09  	debug_error_string = "{"created":"@1592661660.495892340","description":"Error received from peer ipv4:","file":"src/core/lib/surface/","file_line":1056,"grpc_message":"Socket closed","grpc_status":14}"
22:01:09  >
22:01:09  2020-06-20:14:01:00,497 ERROR    [] 'tuple' object has no attribute 'OK'
22:01:09  2020-06-20:14:01:00,499 ERROR    [] Traceback (most recent call last):
22:01:09    File "", line 67, in queue_worker
22:01:09, collection)
22:01:09    File "/home/jenkins/agent/workspace/milvus-benchmark-0.8.1/milvus_benchmark/", line 155, in run
22:01:09      milvus_instance.create_index(index_type, index_param)
22:01:09    File "/home/jenkins/agent/workspace/milvus-benchmark-0.8.1/milvus_benchmark/", line 32, in wrapper
22:01:09      result = func(*args, **kwargs)
22:01:09    File "/home/jenkins/agent/workspace/milvus-benchmark-0.8.1/milvus_benchmark/", line 127, in create_index
22:01:09      self.check_status(status)
22:01:09    File "/home/jenkins/agent/workspace/milvus-benchmark-0.8.1/milvus_benchmark/", line 71, in check_status
22:01:09      if not status.OK():
22:01:09  AttributeError: 'tuple' object has no attribute 'OK'
22:01:09  2020-06-20:14:01:00,500 DEBUG    [] benchmark-test-fxghtjsw
22:01:09  Error: uninstall: Release not loaded: benchmark-test-gzelwvgk: release: not found
22:01:09  2020-06-20:14:01:00,575 DEBUG    [] helm uninstall -n milvus benchmark-test-fxghtjsw
22:01:09  release "benchmark-test-fxghtjsw" uninstalled
22:01:09  2020-06-20:14:01:00,797 DEBUG    [] All task finished in queue: poseidon
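
The AttributeError at the end of the client traceback is a secondary failure: check_status calls status.OK() on a value that turned out to be a tuple, which hides the original gRPC error. Below is a minimal sketch of a more defensive check. It assumes (the log does not confirm this) that the client call may return either a status object or a tuple whose first element is the status. FakeStatus is a hypothetical stand-in for the real status type, not part of the benchmark code.

```python
class FakeStatus:
    """Hypothetical stand-in for the client's status object."""

    def __init__(self, ok, message=""):
        self._ok = ok
        self.message = message

    def OK(self):
        return self._ok


def check_status(result):
    """Return True on success; raise with a readable message otherwise.

    Assumption: `result` is either a status object or a non-empty tuple
    whose first element is the status object.
    """
    # Unwrap the tuple form so both return shapes are handled the same way.
    status = result[0] if isinstance(result, tuple) and result else result
    ok = getattr(status, "OK", None)
    if not callable(ok):
        # Surface the unexpected value instead of failing with AttributeError.
        raise TypeError(f"unexpected status value: {result!r}")
    if not ok():
        raise RuntimeError(f"operation failed: {getattr(status, 'message', status)!r}")
    return True
```

With a check like this, a failed create_index would be reported with its own message rather than being masked by the AttributeError.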
2020-06-20 22:01:01,256 | INFO | default | [SERVER] Milvus Release version: v0.8.1, built at 2020-06-20 12:45.41
2020-06-20 22:01:01,256 | INFO | default | [SERVER] CPU edition
2020-06-20 22:01:01,266 | INFO | default | [ENGINE] Using SQLite
2020-06-20 22:01:01,305 | INFO | default | [WAL] record type 5 record lsn 140734830687464 error code  0
2020-06-20 22:01:01,305 | INFO | default | [WAL] record type 5 collection  lsn 0
2020-06-20 22:01:02,306 | INFO | default | [WAL] record type 5 collection  lsn 0
2020-06-20 22:01:02,311 | INFO | default | [SERVER] Server received critical signal: 11
2020-06-20 22:01:02,311 | INFO | default | [SERVER] Call stack:
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0x5e4584]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0x5e4ca8]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] /lib64/ [0x7fd70ece2400]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0x7c8274]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0x63bfce]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0x6411ca]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0x4acb8d]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0x4a79ca]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0xd12bff]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] /lib64/ [0x7fd70fab6ea5]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] /lib64/ [0x7fd70edaa8dd]
2020-06-20 22:01:02,364 | INFO | default | [WAL] record type 5 collection  lsn 0
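
On Linux, signal 11 is SIGSEGV, so the server log above records a segmentation fault. The following sketch pulls the signal number out of lines like the one above and maps it to a name using Python's standard signal module. The function name find_crash_signals is made up for illustration, and the regular expression only assumes the "Server received critical signal: N" wording seen in this log.

```python
import re
import signal

# Matches the crash line format seen in the server log above.
SIGNAL_RE = re.compile(r"Server received critical signal: (\d+)")


def find_crash_signals(log_text):
    """Return the names of all critical signals reported in the log text."""
    names = []
    for match in SIGNAL_RE.finditer(log_text):
        number = int(match.group(1))
        try:
            names.append(signal.Signals(number).name)
        except ValueError:
            # Number is not a known signal on this platform.
            names.append(f"unknown({number})")
    return names


sample = (
    "2020-06-20 22:01:02,311 | INFO | default | "
    "[SERVER] Server received critical signal: 11"
)
print(find_crash_signals(sample))
```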

error logs:

2020-06-20 22:01:01,304 | ERROR | default | [WAL] bad wal file 2
2020-06-20 22:01:02,307 | ERROR | default | [ENGINE] Collection file doesn't exist: /test/milvus/db_data_080/sift_1m_128_128_l2/db/tables/sift_1m_128_128_l2/1592661205816766000/1592661205816766000 in path: /test/milvus/db_data_080/sift_1m_128_128_l2/db for collection: sift_1m_128_128_l2
2020-06-20 22:01:02,310 | ERROR | default | [ENGINE] Failed to open file: /test/milvus/db_data_080/sift_1m_128_128_l2/db/tables/sift_1m_128_128_l2/1592661204384909000/deleted_docs, error: No such file or directory
2020-06-20 22:01:02,310 | ERROR | default | [ENGINE] Failed to load segment from /test/milvus/db_data_080/sift_1m_128_128_l2/db/tables/sift_1m_128_128_l2/1592661204384909000/1592661204384909000
2020-06-20 22:01:02,311 | ERROR | default | [ENGINE] Failed to load segment from
2020-06-20 22:01:02,366 | ERROR | default | [ENGINE] Collection file doesn't exist: /test/milvus/db_data_080/sift_1m_128_128_l2/db/tables/sift_1m_128_128_l2/1592661205816766000/1592661205816766000 in path: /test/milvus/db_data_080/sift_1m_128_128_l2/db for collection: sift_1m_128_128_l2
2020-06-20 22:01:02,367 | ERROR | default | [ENGINE] Failed to open file: /test/milvus/db_data_080/sift_1m_128_128_l2/db/tables/sift_1m_128_128_l2/1592661204384909000/deleted_docs, error: No such file or directory
2020-06-20 22:01:02,367 | ERROR | default | [ENGINE] Failed to load segment from /test/milvus/db_data_080/sift_1m_128_128_l2/db/tables/sift_1m_128_128_l2/1592661204384909000/1592661204384909000
2020-06-20 22:01:02,367 | ERROR | default | [ENGINE] Failed to load segment from
2020-06-20 22:01:02,419 | ERROR | default | [ENGINE] Failed to build index 1592661662311016000, reason: Resource deadlock avoided
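
The error log reports files under the data path that could not be opened. To confirm which of them are actually absent on the NAS before starting a new container, the paths can be extracted from the log and checked on disk. This is a sketch only: missing_files is a hypothetical helper, and the pattern assumes the "Failed to open file: <path>, error: <reason>" wording shown above.

```python
import os
import re

# Matches "[ENGINE] Failed to open file: <path>, error: <reason>" lines.
OPEN_FAIL_RE = re.compile(r"Failed to open file: (\S+), error: (.+)$")


def missing_files(log_lines):
    """Return sorted paths that the log failed to open and that are absent on disk."""
    paths = set()
    for line in log_lines:
        match = OPEN_FAIL_RE.search(line)
        if match and not os.path.exists(match.group(1)):
            paths.add(match.group(1))
    return sorted(paths)
```

Running it over the server's error log on the host that mounts the data path lists each missing file once, even though the log repeats the same error.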

debug logs:

2020-06-20 22:01:02,367 | DEBUG | default | [ENGINE] Index params: {"dim":128,"gpu_id":0,"metric_type":"L2"}
2020-06-20 22:01:02,367 | DEBUG | default | [SERVER] BuildIndexJob 2 finish index file: 36
2020-06-20 22:01:02,367 | DEBUG | default | [SERVER] cpu load BuildIndexTask
2020-06-20 22:01:02,367 | DEBUG | default | [SERVER] BuildIndexJob 2 all done
2020-06-20 22:01:02,367 | DEBUG | default | [ENGINE] Building index job 2 succeed.
2020-06-20 22:01:02,367 | DEBUG | default | [ENGINE] Unmark ongoing file:1592661204384909000 refcount:0
2020-06-20 22:01:02,367 | DEBUG | default | [ENGINE] Finish build index file 1592661204384909000
2020-06-20 22:01:02,367 | DEBUG | default | [ENGINE] Index params: {"dim":128,"gpu_id":0,"metric_type":"L2"}
2020-06-20 22:01:02,368 | DEBUG | default | [SERVER] BuildIndexJob 3 finish index file: 37
2020-06-20 22:01:02,368 | DEBUG | default | [SERVER] BuildIndexJob 3 all done
2020-06-20 22:01:02,368 | DEBUG | default | [ENGINE] Building index job 3 succeed.
2020-06-20 22:01:02,368 | DEBUG | default | [ENGINE] Unmark ongoing file:1592661205816766000 refcount:0
2020-06-20 22:01:02,368 | DEBUG | default | [ENGINE] Finish build index file 1592661205816766000
2020-06-20 22:01:02,368 | DEBUG | default | [ENGINE] Background build index thread finished
2020-06-20 22:01:02,368 | DEBUG | default | [ENGINE] DB background thread exit
2020-06-20 22:01:02,369 | DEBUG | default | [ENGINE] Remove collection file type as NEW
2020-06-20 22:01:02,370 | DEBUG | default | [ENGINE] Clean 1 files
2020-06-20 22:01:02,418 | DEBUG | default | [ENGINE] DB background metric thread exit
2020-06-20 22:01:02,419 | DEBUG | default | [ENGINE] Update single collection file, file id = 1592661662311016000
2020-06-20 22:01:02,419 | DEBUG | default | [SERVER] BuildIndexJob 0 finish index file: 0
2020-06-20 22:01:02,419 | DEBUG | default | [SERVER] XBuildIndexTask::Execute 0: totally cost (0.108663 second [108.663442 ms])
2020-06-20 22:01:02,419 | DEBUG | default | [SERVER] XBuildIndexTask::Execute 0: totally cost (0.000003 second [0.003230 ms])
2020-06-20 22:01:02,419 | DEBUG | default | [SERVER] XBuildIndexTask::Execute 0: totally cost (0.000003 second [0.002880 ms])
2020-06-20 22:01:02,419 | DEBUG | default | [SERVER] XBuildIndexTask::Execute 0: totally cost (0.000003 second [0.002614 ms])

Steps/Code to reproduce behavior

  1. Create a collection, insert sift-1m into it, and flush.
  2. Clean up the container.
  3. Start the test with a new container and build an index with ivf_flat.

Result: create index fails because the server crashes.
Data/log path on NAS: /test/milvus/db_data_080/sift_1m_128_128_l2

Expected behavior

Environment details
commit id: 0.8.1-cpu-centos7-release 51e4c23dbd63





Labels: kind/bug, severity/critical

