Description
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version:master-20250213-dccba87f-amd64
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
[2025/02/14 01:27:06.023 +00:00] [INFO] [flusherimpl/wal_flusher.go:128] ["data coord client ready"] [module=streamingnode] [component=flusher] [pchannel=by-dev-rootcoord-dml_0]
[2025/02/14 01:27:06.023 +00:00] [INFO] [syncmgr/sync_manager.go:70] ["sync manager initialized"] [initPoolSize=256]
[2025/02/14 01:27:06.023 +00:00] [INFO] [flusherimpl/wal_flusher.go:121] ["fetch recovery info done"] [module=streamingnode] [component=flusher] [pchannel=by-dev-rootcoord-dml_7] [recoveryInfoNum=3]
[2025/02/14 01:27:06.023 +00:00] [INFO] [flusherimpl/wal_flusher.go:128] ["data coord client ready"] [module=streamingnode] [component=flusher] [pchannel=by-dev-rootcoord-dml_7]
[2025/02/14 01:27:06.023 +00:00] [INFO] [syncmgr/sync_manager.go:70] ["sync manager initialized"] [initPoolSize=256]
[2025/02/14 01:27:06.023 +00:00] [INFO] [flusherimpl/wal_flusher.go:121] ["fetch recovery info done"] [module=streamingnode] [component=flusher] [pchannel=by-dev-rootcoord-dml_10] [recoveryInfoNum=3]
[2025/02/14 01:27:06.023 +00:00] [INFO] [flusherimpl/wal_flusher.go:128] ["data coord client ready"] [module=streamingnode] [component=flusher] [pchannel=by-dev-rootcoord-dml_10]
[2025/02/14 01:27:06.024 +00:00] [INFO] [syncmgr/sync_manager.go:70] ["sync manager initialized"] [initPoolSize=256]
SIGNAL CATCH BY NON-GO SIGNAL HANDLER
SIGNO: 11; SIGNAME: Segmentation fault; SI_CODE: 1; SI_ADDR: 0x28
BACKTRACE:
I20250214 01:27:08.028440 83 MinioChunkManager.cpp:225] [SERVER][PreCheck][milvus][]start to precheck chunk manager with configuration: [address=mixcoord-pod-kill-20295-minio:9000, bucket_name=milvus-bucket, root_path=file, storage_type=remote, cloud_provider=aws, iam_endpoint=, log_level=fatal, region=, useSSL=false, sslCACert=19, useIAM=false, useVirtualHost=false, requestTimeoutMs=10000, gcp_native_without_auth=false]
I20250214 01:27:08.036345 83 ChunkManager.cpp:112] [SERVER][AwsChunkManager][milvus][]init AwsChunkManager with parameter[endpoint=mixcoord-pod-kill-20295-minio:9000][bucket_name=milvus-bucket][root_path=file][use_secure=false]
github.com/milvus-io/milvus/internal/streamingnode/server/flusher/flusherimpl.recoverPChannelCheckpointManager
/workspace/source/internal/streamingnode/server/flusher/flusherimpl/pchannel_checkpoint.go:34 pc=0x60d4592
[2025/02/14 01:27:10.861 +00:00] [WARN] [etcd/etcd_kv.go:663] ["Slow etcd operation load"] ["time spent"=4.839121788s] [key=by-dev/meta/streamingnode-meta/wal/by-dev-rootcoord-dml_2/consume-checkpoint]
SIGNAL CATCH BY NON-GO SIGNAL HANDLER
SIGNO: 11; SIGNAME: Segmentation fault; SI_CODE: 1; SI_ADDR: 0x28
BACKTRACE:
github.com/milvus-io/milvus/internal/streamingnode/server/flusher/flusherimpl.recoverPChannelCheckpointManager
/workspace/source/internal/streamingnode/server/flusher/flusherimpl/pchannel_checkpoint.go:34 pc=0x60d4592
[2025/02/14 01:27:10.862 +00:00] [INFO] [flusherimpl/wal_flusher.go:52] ["wal flusher stop"] [module=streamingnode] [component=flusher] [pchannel=by-dev-rootcoord-dml_12]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x60d4592]
[2025-02-14T01:14:47.881Z] + kubectl get pods -o wide
[2025-02-14T01:14:47.883Z] + grep mixcoord-pod-kill-20295
[2025-02-14T01:14:47.883Z] mixcoord-pod-kill-20295-etcd-0 1/1 Running 0 37m 10.104.26.132 4am-node32 <none> <none>
[2025-02-14T01:14:47.883Z] mixcoord-pod-kill-20295-etcd-1 1/1 Running 0 37m 10.104.19.162 4am-node28 <none> <none>
[2025-02-14T01:14:47.883Z] mixcoord-pod-kill-20295-etcd-2 1/1 Running 0 37m 10.104.24.46 4am-node29 <none> <none>
[2025-02-14T01:14:47.883Z] mixcoord-pod-kill-20295-milvus-datanode-694587bffd-5cj57 1/1 Running 2 (37m ago) 37m 10.104.23.175 4am-node27 <none> <none>
[2025-02-14T01:14:47.883Z] mixcoord-pod-kill-20295-milvus-datanode-694587bffd-sgcd8 1/1 Running 2 (37m ago) 37m 10.104.16.195 4am-node21 <none> <none>
[2025-02-14T01:14:47.883Z] mixcoord-pod-kill-20295-milvus-indexnode-7c4fc4f765-5vt2j 1/1 Running 2 (37m ago) 37m 10.104.17.48 4am-node23 <none> <none>
[2025-02-14T01:14:47.883Z] mixcoord-pod-kill-20295-milvus-indexnode-7c4fc4f765-62994 1/1 Running 2 (37m ago) 37m 10.104.23.176 4am-node27 <none> <none>
[2025-02-14T01:14:47.883Z] mixcoord-pod-kill-20295-milvus-indexnode-7c4fc4f765-n7wtg 1/1 Running 2 (37m ago) 37m 10.104.30.98 4am-node38 <none> <none>
[2025-02-14T01:14:47.883Z] mixcoord-pod-kill-20295-milvus-mixcoord-7f9979c575-xwqbz 1/1 Running 2 (37m ago) 37m 10.104.30.93 4am-node38 <none> <none>
[2025-02-14T01:14:47.883Z] mixcoord-pod-kill-20295-milvus-proxy-7784c5c47-kmvrs 1/1 Running 2 (37m ago) 37m 10.104.21.172 4am-node24 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-milvus-querynode-bd884bdf5-52kx2 1/1 Running 2 (37m ago) 37m 10.104.17.47 4am-node23 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-milvus-querynode-bd884bdf5-djmnh 1/1 Running 2 (37m ago) 37m 10.104.25.167 4am-node30 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-milvus-querynode-bd884bdf5-wj2m8 1/1 Running 2 (37m ago) 37m 10.104.21.171 4am-node24 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-milvus-streamingnode-7c756d75b5-s8whr 0/1 CrashLoopBackOff 11 (3m18s ago) 37m 10.104.23.174 4am-node27 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-minio-0 1/1 Running 0 37m 10.104.26.133 4am-node32 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-minio-1 1/1 Running 0 37m 10.104.19.163 4am-node28 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-minio-2 1/1 Running 0 37m 10.104.33.91 4am-node36 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-minio-3 1/1 Running 0 37m 10.104.15.101 4am-node20 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-pulsarv3-bookie-0 1/1 Running 0 37m 10.104.26.131 4am-node32 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-pulsarv3-bookie-1 1/1 Running 0 37m 10.104.24.43 4am-node29 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-pulsarv3-bookie-2 1/1 Running 0 37m 10.104.15.100 4am-node20 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-pulsarv3-bookie-init-h4sgm 0/1 Completed 0 37m 10.104.14.236 4am-node18 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-pulsarv3-broker-0 1/1 Running 0 37m 10.104.14.238 4am-node18 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-pulsarv3-broker-1 1/1 Running 0 37m 10.104.26.124 4am-node32 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-pulsarv3-proxy-0 1/1 Running 0 37m 10.104.26.123 4am-node32 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-pulsarv3-proxy-1 1/1 Running 0 37m 10.104.14.237 4am-node18 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-pulsarv3-pulsar-init-ztslb 0/1 Completed 0 37m 10.104.14.232 4am-node18 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-pulsarv3-recovery-0 1/1 Running 0 37m 10.104.14.230 4am-node18 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-pulsarv3-zookeeper-0 1/1 Running 0 37m 10.104.19.160 4am-node28 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-pulsarv3-zookeeper-1 1/1 Running 0 37m 10.104.26.130 4am-node32 <none> <none>
[2025-02-14T01:14:47.884Z] mixcoord-pod-kill-20295-pulsarv3-zookeeper-2 1/1 Running 0 37m 10.104.24.44 4am-node29 <none> <none>
Expected Behavior
No response
Steps To Reproduce
Milvus Log
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/20295/pipeline
log:
artifacts-mixcoord-pod-kill-20295-server-logs.tar.gz
cluster: 4am
ns: chaos-testing
Anything else?
No response