
Create index failed because the server crashed. #2642

Closed
@del-zhenwu

Description

Describe the bug

21:58:32  2020-06-20:13:58:32,104 INFO     [client.py:123] Building index start, collection_name: sift_1m_128_128_l2, index_type: IVFLAT
21:58:32  2020-06-20:13:58:32,105 INFO     [client.py:125] {'nlist': 4096}
22:01:09  create_index
22:01:09  <_MultiThreadedRendezvous of RPC that terminated with:
22:01:09  	status = StatusCode.UNAVAILABLE
22:01:09  	details = "Socket closed"
22:01:09  	debug_error_string = "{"created":"@1592661660.495892340","description":"Error received from peer ipv4:10.44.0.1:19530","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Socket closed","grpc_status":14}"
22:01:09  >
22:01:09  2020-06-20:14:01:00,497 ERROR    [grpc_handler.py:41] create_index
22:01:09  <_MultiThreadedRendezvous of RPC that terminated with:
22:01:09  	status = StatusCode.UNAVAILABLE
22:01:09  	details = "Socket closed"
22:01:09  	debug_error_string = "{"created":"@1592661660.495892340","description":"Error received from peer ipv4:10.44.0.1:19530","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Socket closed","grpc_status":14}"
22:01:09  >
22:01:09  2020-06-20:14:01:00,497 ERROR    [main.py:69] 'tuple' object has no attribute 'OK'
22:01:09  2020-06-20:14:01:00,499 ERROR    [main.py:70] Traceback (most recent call last):
22:01:09    File "main.py", line 67, in queue_worker
22:01:09      runner.run(run_type, collection)
22:01:09    File "/home/jenkins/agent/workspace/milvus-benchmark-0.8.1/milvus_benchmark/k8s_runner.py", line 155, in run
22:01:09      milvus_instance.create_index(index_type, index_param)
22:01:09    File "/home/jenkins/agent/workspace/milvus-benchmark-0.8.1/milvus_benchmark/client.py", line 32, in wrapper
22:01:09      result = func(*args, **kwargs)
22:01:09    File "/home/jenkins/agent/workspace/milvus-benchmark-0.8.1/milvus_benchmark/client.py", line 127, in create_index
22:01:09      self.check_status(status)
22:01:09    File "/home/jenkins/agent/workspace/milvus-benchmark-0.8.1/milvus_benchmark/client.py", line 71, in check_status
22:01:09      if not status.OK():
22:01:09  AttributeError: 'tuple' object has no attribute 'OK'
22:01:09  
22:01:09  2020-06-20:14:01:00,500 DEBUG    [k8s_runner.py:66] benchmark-test-fxghtjsw
22:01:09  Error: uninstall: Release not loaded: benchmark-test-gzelwvgk: release: not found
22:01:09  2020-06-20:14:01:00,575 DEBUG    [utils.py:259] helm uninstall -n milvus benchmark-test-fxghtjsw
22:01:09  release "benchmark-test-fxghtjsw" uninstalled
22:01:09  2020-06-20:14:01:00,797 DEBUG    [main.py:75] All task finished in queue: poseidon
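The AttributeError above is a secondary bug in the benchmark's client wrapper: once the gRPC channel drops (StatusCode.UNAVAILABLE, "Socket closed"), the object handed to check_status is no longer a Status, so status.OK() raises and masks the real connection failure. A minimal defensive sketch of check_status, assuming only what the traceback shows (the wrapper's other internals are not known here):

# Hedged sketch of a more defensive check_status for milvus_benchmark/client.py.
# It assumes only what the traceback shows: check_status(status) calls status.OK()
# and crashes when the gRPC call fails and `status` is not a Status object.
def check_status(self, status):
    # When the connection drops ("Socket closed"), the wrapped call can hand back
    # a tuple or exception instead of a Status; guard before calling .OK() so the
    # real gRPC error is surfaced instead of an AttributeError.
    if not hasattr(status, "OK"):
        raise RuntimeError("create_index failed, server unreachable: %r" % (status,))
    if not status.OK():
        raise RuntimeError("Milvus returned non-OK status: %s" % status)

Server-side logs around the crash: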
2020-06-20 22:01:01,256 | INFO | default | [SERVER] Milvus Release version: v0.8.1, built at 2020-06-20 12:45.41
2020-06-20 22:01:01,256 | INFO | default | [SERVER] CPU edition
2020-06-20 22:01:01,266 | INFO | default | [ENGINE] Using SQLite
2020-06-20 22:01:01,305 | INFO | default | [WAL] record type 5 record lsn 140734830687464 error code  0
2020-06-20 22:01:01,305 | INFO | default | [WAL] record type 5 collection  lsn 0
2020-06-20 22:01:02,306 | INFO | default | [WAL] record type 5 collection  lsn 0
2020-06-20 22:01:02,311 | INFO | default | [SERVER] Server received critical signal: 11
2020-06-20 22:01:02,311 | INFO | default | [SERVER] Call stack:
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0x5e4584]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0x5e4ca8]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] /lib64/libc.so.6(+0x36400) [0x7fd70ece2400]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0x7c8274]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0x63bfce]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0x6411ca]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0x4acb8d]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0x4a79ca]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] ../bin/milvus_server() [0xd12bff]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] /lib64/libpthread.so.0(+0x7ea5) [0x7fd70fab6ea5]
2020-06-20 22:01:02,312 | INFO | default | [SERVER] /lib64/libc.so.6(clone+0x6d) [0x7fd70edaa8dd]
2020-06-20 22:01:02,364 | INFO | default | [WAL] record type 5 collection  lsn 0
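For triage, the raw frame addresses in the [SERVER] call stack can be symbolized with addr2line against the same milvus_server binary. A sketch, assuming the binary from the 0.8.1-cpu image is reachable at a hypothetical path and still carries symbols:

# Sketch: symbolize the crash call-stack addresses above with addr2line.
# The binary path is a placeholder for wherever milvus_server lives in the image.
import subprocess

ADDRESSES = ["0x5e4584", "0x5e4ca8", "0x7c8274", "0x63bfce",
             "0x6411ca", "0x4acb8d", "0x4a79ca", "0xd12bff"]
BINARY = "/path/to/bin/milvus_server"  # hypothetical location inside the container

result = subprocess.run(
    ["addr2line", "-f", "-C", "-e", BINARY] + ADDRESSES,
    capture_output=True, text=True, check=False,
)
print(result.stdout or result.stderr)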

error logs:

2020-06-20 22:01:01,304 | ERROR | default | [WAL] bad wal file 2
2020-06-20 22:01:02,307 | ERROR | default | [ENGINE] Collection file doesn't exist: /test/milvus/db_data_080/sift_1m_128_128_l2/db/tables/sift_1m_128_128_l2/1592661205816766000/1592661205816766000 in path: /test/milvus/db_data_080/sift_1m_128_128_l2/db for collection: sift_1m_128_128_l2
2020-06-20 22:01:02,310 | ERROR | default | [ENGINE] Failed to open file: /test/milvus/db_data_080/sift_1m_128_128_l2/db/tables/sift_1m_128_128_l2/1592661204384909000/deleted_docs, error: No such file or directory
2020-06-20 22:01:02,310 | ERROR | default | [ENGINE] Failed to load segment from /test/milvus/db_data_080/sift_1m_128_128_l2/db/tables/sift_1m_128_128_l2/1592661204384909000/1592661204384909000
2020-06-20 22:01:02,311 | ERROR | default | [ENGINE] Failed to load segment from
2020-06-20 22:01:02,366 | ERROR | default | [ENGINE] Collection file doesn't exist: /test/milvus/db_data_080/sift_1m_128_128_l2/db/tables/sift_1m_128_128_l2/1592661205816766000/1592661205816766000 in path: /test/milvus/db_data_080/sift_1m_128_128_l2/db for collection: sift_1m_128_128_l2
2020-06-20 22:01:02,367 | ERROR | default | [ENGINE] Failed to open file: /test/milvus/db_data_080/sift_1m_128_128_l2/db/tables/sift_1m_128_128_l2/1592661204384909000/deleted_docs, error: No such file or directory
2020-06-20 22:01:02,367 | ERROR | default | [ENGINE] Failed to load segment from /test/milvus/db_data_080/sift_1m_128_128_l2/db/tables/sift_1m_128_128_l2/1592661204384909000/1592661204384909000
2020-06-20 22:01:02,367 | ERROR | default | [ENGINE] Failed to load segment from
2020-06-20 22:01:02,419 | ERROR | default | [ENGINE] Failed to build index 1592661662311016000, reason: Resource deadlock avoided
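The ENGINE errors indicate that the metadata still references segment files that are gone from the NAS path. A small check, using only paths copied from the log above, to confirm which referenced files are actually missing on disk:

# Sketch: verify on the NAS mount whether the segment files referenced by the
# ENGINE errors exist. Paths are copied verbatim from the error log above.
import os

referenced_files = [
    "/test/milvus/db_data_080/sift_1m_128_128_l2/db/tables/sift_1m_128_128_l2/1592661205816766000/1592661205816766000",
    "/test/milvus/db_data_080/sift_1m_128_128_l2/db/tables/sift_1m_128_128_l2/1592661204384909000/deleted_docs",
    "/test/milvus/db_data_080/sift_1m_128_128_l2/db/tables/sift_1m_128_128_l2/1592661204384909000/1592661204384909000",
]

for path in referenced_files:
    status = "present" if os.path.exists(path) else "MISSING"
    print(status, path)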

debug logs:

2020-06-20 22:01:02,367 | DEBUG | default | [ENGINE] Index params: {"dim":128,"gpu_id":0,"metric_type":"L2"}
2020-06-20 22:01:02,367 | DEBUG | default | [SERVER] BuildIndexJob 2 finish index file: 36
2020-06-20 22:01:02,367 | DEBUG | default | [SERVER] cpu load BuildIndexTask
2020-06-20 22:01:02,367 | DEBUG | default | [SERVER] BuildIndexJob 2 all done
2020-06-20 22:01:02,367 | DEBUG | default | [ENGINE] Building index job 2 succeed.
2020-06-20 22:01:02,367 | DEBUG | default | [ENGINE] Unmark ongoing file:1592661204384909000 refcount:0
2020-06-20 22:01:02,367 | DEBUG | default | [ENGINE] Finish build index file 1592661204384909000
2020-06-20 22:01:02,367 | DEBUG | default | [ENGINE] Index params: {"dim":128,"gpu_id":0,"metric_type":"L2"}
2020-06-20 22:01:02,368 | DEBUG | default | [SERVER] BuildIndexJob 3 finish index file: 37
2020-06-20 22:01:02,368 | DEBUG | default | [SERVER] BuildIndexJob 3 all done
2020-06-20 22:01:02,368 | DEBUG | default | [ENGINE] Building index job 3 succeed.
2020-06-20 22:01:02,368 | DEBUG | default | [ENGINE] Unmark ongoing file:1592661205816766000 refcount:0
2020-06-20 22:01:02,368 | DEBUG | default | [ENGINE] Finish build index file 1592661205816766000
2020-06-20 22:01:02,368 | DEBUG | default | [ENGINE] Background build index thread finished
2020-06-20 22:01:02,368 | DEBUG | default | [ENGINE] DB background thread exit
2020-06-20 22:01:02,369 | DEBUG | default | [ENGINE] Remove collection file type as NEW
2020-06-20 22:01:02,370 | DEBUG | default | [ENGINE] Clean 1 files
2020-06-20 22:01:02,418 | DEBUG | default | [ENGINE] DB background metric thread exit
2020-06-20 22:01:02,419 | DEBUG | default | [ENGINE] Update single collection file, file id = 1592661662311016000
2020-06-20 22:01:02,419 | DEBUG | default | [SERVER] BuildIndexJob 0 finish index file: 0
2020-06-20 22:01:02,419 | DEBUG | default | [SERVER] XBuildIndexTask::Execute 0: totally cost (0.108663 second [108.663442 ms])
2020-06-20 22:01:02,419 | DEBUG | default | [SERVER] XBuildIndexTask::Execute 0: totally cost (0.000003 second [0.003230 ms])
2020-06-20 22:01:02,419 | DEBUG | default | [SERVER] XBuildIndexTask::Execute 0: totally cost (0.000003 second [0.002880 ms])
2020-06-20 22:01:02,419 | DEBUG | default | [SERVER] XBuildIndexTask::Execute 0: totally cost (0.000003 second [0.002614 ms])

Steps/Code to reproduce behavior

  1. Create a collection, insert the sift-1m dataset into it, then flush.
  2. Clean up the container.
  3. Start the test with a new container and build an IVF_FLAT index (a hedged reproduction sketch follows this list).
    Observed result: create index fails because the server crashes.
    Data/log path on NAS: /test/milvus/db_data_080/sift_1m_128_128_l2
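A minimal reproduction sketch of steps 1-3, assuming a pymilvus 0.2.x-style client for Milvus 0.8.x (the exact method names and the load_sift_1m helper are assumptions, not taken from the benchmark code):

# Hedged reproduction sketch; API names assume a pymilvus 0.2.x-style client.
from milvus import Milvus, IndexType, MetricType

client = Milvus(host="127.0.0.1", port="19530")
name = "sift_1m_128_128_l2"

# Step 1: create the collection, insert sift-1m, then flush to disk.
client.create_collection({
    "collection_name": name,
    "dimension": 128,
    "index_file_size": 128,
    "metric_type": MetricType.L2,
})
vectors = load_sift_1m()          # hypothetical loader for the 1M x 128 float vectors
client.insert(name, vectors)
client.flush([name])

# Step 2: stop and remove the server container (data stays on the NAS volume).
# Step 3: start a new container against the same volume, then build the index.
status = client.create_index(name, IndexType.IVFLAT, {"nlist": 4096})
print(status)                     # the server crashes with signal 11 instead of returning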
Expected behavior
The index is created successfully and the server does not crash.

Environment details
Milvus version: 0.8.1-cpu
Commit id:

registry.zilliz.com/milvus/engine                    0.8.1-cpu-centos7-release         51e4c23dbd63

Labels
kind/bug (issues or changes related to a bug)
severity/critical (critical: leads to crash, data missing, wrong result, or the function does not work at all)
