Massive async shard fetch requests consume lots of heap memories on master node. #80694

Open
@howardhuanghua

Description

Elasticsearch version : 7.10

JVM version (java -version): JDK 11

Description of the problem including expected versus actual behavior:
In #77991 we solved the memory consumption issue caused by async shard fetch responses.
However, we found that async shard fetch requests also consume a lot of heap memory. Here is the production environment in which we hit this problem:
Data nodes: 75
Dedicated master nodes: 3
Master node resources: 2 CPU cores, 8 GB physical memory, 4 GB heap
Total shards: 15000

After a full cluster restart, once the new master has been elected, its heap is used up for several seconds. We took a heap dump and found that Netty's in-flight outbound requests were using a large amount of heap:
[image: heap dump screenshot]

Each WriteOperation should be a single-shard request to a specific node (16k buffer size each):
[image: WeCom screenshot_16367325062007]
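To put the 16k-per-request figure in context, here is a rough upper-bound estimate for this cluster. It assumes that every shard fetch fans out to every data node and that all of those writes are queued at the same moment; both are worst-case assumptions, not something measured in the heap dump. The class and method names are made up for illustration.

```java
public final class FetchRequestHeapEstimate {

    /**
     * Worst-case bytes held by queued writes, ASSUMING one request per
     * (shard, data node) pair and all requests buffered at once.
     */
    public static long worstCaseBytes(long shards, long dataNodes, long bytesPerRequest) {
        // multiplyExact throws on overflow instead of silently wrapping.
        return Math.multiplyExact(Math.multiplyExact(shards, dataNodes), bytesPerRequest);
    }

    public static void main(String[] args) {
        long shards = 15_000;            // total shards, from the issue
        long dataNodes = 75;             // data nodes, from the issue
        long bytesPerRequest = 16 * 1024; // 16k buffer per WriteOperation

        long bytes = worstCaseBytes(shards, dataNodes, bytesPerRequest);
        double gib = bytes / (1024.0 * 1024 * 1024);
        System.out.println("requests = " + (shards * dataNodes));
        System.out.println("bytes    = " + bytes);
        System.out.println("GiB      = " + gib);
    }
}
```

Under these assumptions that is 1,125,000 requests and about 17 GiB of buffers, several times the 4 GB heap, so even a fraction of the requests being in flight together would be enough to exhaust the master.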

In the Netty4MessageChannelHandler class we can see a queuedWrites queue; messages are flushed asynchronously:

private final Queue<WriteOperation> queuedWrites = new ArrayDeque<>();

So besides cutting down the shard fetch responses, we also need to handle the massive number of outgoing shard fetch requests.
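One possible direction, shown only as a minimal sketch, is to group the per-shard requests by target node so that each node receives one batched request instead of one request per shard. This is not how Elasticsearch implements shard fetching; `PerNodeFetchBatcher`, `ShardFetch` and `groupByNode` are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class PerNodeFetchBatcher {

    /** Hypothetical pair: a shard whose state must be fetched from a node. */
    public static final class ShardFetch {
        final String nodeId;
        final int shardId;

        public ShardFetch(String nodeId, int shardId) {
            this.nodeId = nodeId;
            this.shardId = shardId;
        }
    }

    /** Groups single-shard fetches into one list of shard ids per target node. */
    public static Map<String, List<Integer>> groupByNode(List<ShardFetch> fetches) {
        // LinkedHashMap keeps nodes in first-seen order, which makes output stable.
        Map<String, List<Integer>> batches = new LinkedHashMap<>();
        for (ShardFetch fetch : fetches) {
            batches.computeIfAbsent(fetch.nodeId, k -> new ArrayList<>()).add(fetch.shardId);
        }
        return batches;
    }

    public static void main(String[] args) {
        String[] nodes = {"node-a", "node-b"};
        List<ShardFetch> fetches = new ArrayList<>();
        for (int shard = 0; shard < 4; shard++) {
            for (String node : nodes) {
                fetches.add(new ShardFetch(node, shard));
            }
        }
        Map<String, List<Integer>> batches = groupByNode(fetches);
        System.out.println(fetches.size() + " single-shard requests -> "
                + batches.size() + " batched requests");
    }
}
```

With batching, the number of queued writes scales with the number of data nodes rather than with shards times nodes, although each batched request is larger than a single-shard one.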

Metadata

Assignees

No one assigned

    Labels

    :Distributed Coordination/Allocation: All issues relating to the decision making around placing a shard (both master logic & on the nodes)
    >bug
    Team:Distributed (Obsolete): Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
