Closed
Description
We had an instance where a shard is stuck during recovery (in initializing state) due to peer node connectivity issues during recovery. When this occurs with concurrent indexing and searching to the cluster, all the client nodes started to run out of memory (adding a new client node does not help for that new client node will also end up running out of memory). We collected the heap dump on OOM and it looks like a lot of heap is used in the netty buffer which doesn't seem right.