Please check the FAQ documentation before raising an issue
Describe the bug (required)
Your Environments (required)
- OS: uname -a
- Compiler: g++ --version or clang++ --version
- CPU: lscpu
- Commit id (e.g. a3ffc7d8)
How To Reproduce (required)
Steps to reproduce the behavior:
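- Version 3.2.1; the space uses 3 replicas. Each machine has 104 cores and 300+ GB of memory (no CPU or memory bottleneck was observed) and four 800 GB SSDs. The data volume is about 20 GB.
- Data is written through flink-connector with batch size = 100. After writing for a while, the graph log shows RPC timeouts: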
StorageClientBase-inl.h.ext: Request to ip:9779 time out : TTransportException: Timed out
There some RPC errors: RPC failure in storageClient with without :: TTransportException: time out
InsertVerticesExecutor failed, error E_PRC_FAILURE, part 1
InsertVerticesExecutor failed, error E_PRC_FAILURE, part 2
InsertVerticesExecutor failed, error E_PRC_FAILURE, part 3
- The corresponding storage log shows:
RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10001168
RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10000230
RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10001245
RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10001037
RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10001223
.........
- The storage log above kept printing for 7 hours without returning to normal, and the node has remained offline ever since.
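For reference, the latencies in the storage log above can be converted to seconds with a small script. This is only a sketch: it assumes the `Us` suffix in `replicateLogLatencyUs` means microseconds, and the helper name `latencies_seconds` is made up for illustration. Under that assumption, every sample is just over 10 seconds.

```python
import re

# Storage log lines copied verbatim from the report above.
LOG_LINES = [
    "RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10001168",
    "RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10000230",
    "RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10001245",
    "RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10001037",
    "RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10001223",
]

# Matches the numeric value that follows the replicateLogLatencyUs field.
PATTERN = re.compile(r"replicateLogLatencyUs\s+(\d+)")


def latencies_seconds(lines):
    """Return each replicateLogLatencyUs value in seconds.

    Assumption: the value is in microseconds, as the 'Us' suffix suggests.
    Lines without the field are skipped.
    """
    result = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            result.append(int(match.group(1)) / 1_000_000)
    return result


if __name__ == "__main__":
    for seconds in latencies_seconds(LOG_LINES):
        print(f"{seconds:.6f} s")
```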
Expected behavior
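1. What could cause the situation described above?
2. How should the node be recovered?
3. With only a single node offline and the other 2 replicas healthy, why do newly submitted jobs still fail to write?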
Additional context