Description
Elasticsearch version: 7.10
JVM version (`java -version`): JDK 11
Description of the problem including expected versus actual behavior:
In #77991 we addressed the memory consumption of async shard fetch responses. But we found that async shard fetch requests also consume a lot of heap memory. Here is the production environment where we hit this:
- Data nodes: 75
- Dedicated master nodes: 3
- Master node resources: 2 CPU cores, 8 GB physical memory, 4 GB heap
- Total shards: 15,000
When the new master was elected after a full cluster restart, the elected master's heap was exhausted within several seconds. We took a heap dump and found that Netty's in-flight outbound requests used a lot of heap:
Each `WriteOperation` should be a single shard request to a specific node (about 16 KB of buffer per request).
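As a rough back-of-envelope estimate (our assumption, based on each unassigned shard being fetched from every data node): 15,000 shards × 75 data nodes ≈ 1.1 million node-level requests, and at ~16 KB of outbound buffer per request that is on the order of 17 GB of pending write buffers. Even if only a fraction of those requests are in flight at the same time, a 4 GB master heap is easily exhausted.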
From the `Netty4MessageChannelHandler` class we can see a `queuedWrites` queue; messages are flushed asynchronously.
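To make the queuing behaviour concrete, here is a minimal sketch of that pattern (simplified and assumed, not the actual Elasticsearch or Netty source; the class name and everything except `queuedWrites`/`WriteOperation` is made up for illustration): every outbound message is wrapped in a `WriteOperation` and parked in a queue, so its buffer stays on the heap until the channel is actually flushed.

```java
// Simplified illustration of the queued-write pattern (not the real
// Netty4MessageChannelHandler source): writes are not sent immediately,
// they are wrapped and parked in a queue until the channel is flushed.
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelPromise;

import java.util.ArrayDeque;
import java.util.Queue;

public class QueuedWriteHandler extends ChannelDuplexHandler {

    // One entry per pending outbound message; each keeps its ~16 KB buffer
    // alive on the heap until it is actually written to the socket.
    private static final class WriteOperation {
        final ByteBuf buf;
        final ChannelPromise promise;

        WriteOperation(ByteBuf buf, ChannelPromise promise) {
            this.buf = buf;
            this.promise = promise;
        }
    }

    private final Queue<WriteOperation> queuedWrites = new ArrayDeque<>();

    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) {
        // Instead of writing through immediately, park the message.
        queuedWrites.add(new WriteOperation((ByteBuf) msg, promise));
    }

    @Override
    public void flush(ChannelHandlerContext ctx) {
        // Drain the queue only while the channel is writable; anything left
        // over stays queued (and on the heap) until the next flush.
        WriteOperation op;
        while (ctx.channel().isWritable() && (op = queuedWrites.poll()) != null) {
            ctx.write(op.buf, op.promise);
        }
        ctx.flush();
    }
}
```

The point is that when the master fans out hundreds of thousands of shard-level fetch requests at once, all of their unflushed buffers sit in queues like this at the same time.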
So besides trimming the fetch shard responses, we also need to handle the massive number of outbound shard fetch requests.