OOM due to large number of requests in TransportService.clientHandlers #50241

Closed

Description

Describe the feature:

Elasticsearch version (bin/elasticsearch --version): 6.3.2

Plugins installed: [repository-hdfs]

JVM version (java -version): 10.0.2

OS version (uname -a if on a Unix-like system): CentOS 7.2

Description of the problem including expected versus actual behavior:
Our cluster has 24 data nodes, each with a 31 GB heap and 4 × 1.7 TB SSD disks, plus 3 master nodes with an 8 GB heap each, and handles about 8000 TPS of writes. We did a rolling upgrade (6.3.2 to 6.3.2) of the cluster with the following steps:
① Set allocation to none.
② Restart the data node.
③ Set allocation to all.
Wait for the cluster health to go from yellow to green.
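
As a rough sketch of the steps above, the allocation toggles and the health wait map onto the cluster settings and cluster health REST APIs. This is an illustration only, not the exact commands that were run: the node address `http://localhost:9200`, the use of a `transient` (rather than `persistent`) setting, the `60s` timeout, and all helper names are assumptions. The sketch only builds the requests; `send` is provided for actually issuing them.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: address of any node in the cluster


def allocation_request(mode, base_url=ES_URL):
    """Build a PUT /_cluster/settings request that sets shard allocation to `mode`."""
    # Assumption: a transient setting; the issue does not say which kind was used.
    body = {"transient": {"cluster.routing.allocation.enable": mode}}
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def wait_for_green_request(timeout="60s", base_url=ES_URL):
    """Build a GET /_cluster/health request that waits for green status or the timeout."""
    return urllib.request.Request(
        url=f"{base_url}/_cluster/health?wait_for_status=green&timeout={timeout}",
        method="GET",
    )


def send(request):
    """Send a prepared request and decode the JSON response (needs a reachable cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def restart_steps():
    """Return the API calls in order; the node restart itself happens between the first two."""
    return [
        allocation_request("none"),   # step 1: disable allocation
        allocation_request("all"),    # step 3: re-enable allocation after the restart
        wait_for_green_request(),     # then wait for yellow -> green
    ]
```

Against a live cluster, each request would be passed to `send(...)` in turn, with the data node restarted after the first call.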
After I had upgraded some of the data nodes and waited a while, I found that the master node was running old-generation GCs and later hit an OOM.
So I loaded a heap dump from the master node into Eclipse Memory Analyzer and found that 87.57% of memory is used by the TransportService.clientHandlers hash map.
Most of the RequestHolder entries were for actions such as indices:monitor/stats[n], indices:monitor/recovery[n], or cluster:monitor/stats[n]. Below are screenshots of the heap dump:
[Screenshots: client, 1, 2]

I also used the OQL query
SELECT toString(action) FROM org.elasticsearch.transport.TransportService$RequestHolder
to count the requests per action. The result is as follows:

[Screenshot: action]
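
The OQL query returns one row per RequestHolder, so the per-action counts still have to be tallied. A minimal sketch of that tally, assuming the query result has been exported as plain text with one action per line and a `toString(action)` header row (both assumptions; the function name is hypothetical). The sample rows are made up for illustration and are not the real heap-dump contents.

```python
from collections import Counter


def tally_actions(lines, header="toString(action)"):
    """Count occurrences of each action name, skipping blank lines and the header row."""
    stripped = (line.strip() for line in lines)
    return Counter(a for a in stripped if a and a != header)


# Hypothetical sample rows, NOT the real heap-dump contents.
sample = [
    "toString(action)",
    "indices:monitor/stats[n]",
    "indices:monitor/recovery[n]",
    "indices:monitor/stats[n]",
    "cluster:monitor/stats[n]",
    "",
]

# Print the actions from most to least frequent.
for action, count in tally_actions(sample).most_common():
    print(f"{count:6d}  {action}")
```

For a real export, the lines would come from the file instead, for example `tally_actions(open(path))` with `path` pointing at the exported result.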

So, is there a bug here, or is the master overloaded?
