Opened on Dec 16, 2019
Describe the feature:
Elasticsearch version (bin/elasticsearch --version): 6.3.2
Plugins installed: [repository-hdfs]
JVM version (java -version): 10.0.2
OS version (uname -a if on a Unix-like system): CentOS 7.2
Description of the problem including expected versus actual behavior:
Our cluster has 24 data nodes, each with a 31 GB heap and four 1.7 TB SSDs, plus 3 master nodes with 8 GB heaps, and handles about 8000 write TPS. We did a rolling upgrade (6.3.2 to 6.3.2) of the cluster with the following steps:
① Set allocation to none.
② Restart the data node.
③ Set allocation to all.
Then wait for the cluster health to go from yellow to green.
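The steps above can be sketched against the cluster update settings API (`PUT _cluster/settings`). This is only a minimal sketch: it assumes "allocation" means the `cluster.routing.allocation.enable` setting and that it is applied as a persistent setting, neither of which is stated above. The callables `put_settings`, `restart_node` and `wait_for_green` are hypothetical placeholders for an HTTP client and a restart mechanism.

```python
from typing import Callable, Dict, Iterable


def allocation_settings(mode: str) -> Dict[str, Dict[str, str]]:
    """Build a body for PUT _cluster/settings that sets shard allocation to `mode`."""
    if mode not in {"all", "primaries", "new_primaries", "none"}:
        raise ValueError(f"unsupported allocation mode: {mode}")
    # Assumption: applied as a persistent setting; the issue does not say which scope was used.
    return {"persistent": {"cluster.routing.allocation.enable": mode}}


def rolling_restart(
    nodes: Iterable[str],
    put_settings: Callable[[dict], None],   # hypothetical: sends the body to the cluster
    restart_node: Callable[[str], None],    # hypothetical: restarts one data node
    wait_for_green: Callable[[], None],     # hypothetical: blocks until health is green
) -> None:
    """Restart the nodes one at a time, following the steps described above."""
    for node in nodes:
        put_settings(allocation_settings("none"))  # step 1: disable allocation
        restart_node(node)                         # step 2: restart the data node
        put_settings(allocation_settings("all"))   # step 3: re-enable allocation
        wait_for_green()                           # wait for yellow -> green
```

Passing the side effects in as callables keeps the ordering of the steps visible and lets the sequence be checked without a live cluster.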
After upgrading some of the data nodes and waiting a while, I found that the master node was running old-generation GCs and later ran out of memory (OOM).
So I loaded a heap dump from the master node into Eclipse Memory Analyzer and found that 87.57% of memory was used by the TransportService.clientHandlers hash map.
Most of the RequestHolder entries were for actions such as indices:monitor/stats[n], indices:monitor/recovery[n] or cluster:monitor/stats[n]. Below is a screenshot of the heap dump:
I then used the OQL query
SELECT toString(action) FROM org.elasticsearch.transport.TransportService$RequestHolder
to count the requests per action. The result is as follows:
So, is this a bug, or is the master simply overloaded?
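For anyone who wants to repeat the per-action count from the OQL query above, here is a minimal Python sketch. It assumes the query result has been exported to plain text with one action string per line; that export format, and the function name `tally_actions`, are assumptions for illustration and not part of the analysis above.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tally_actions(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count how often each action name occurs, most frequent first."""
    # Blank lines are skipped; surrounding whitespace is ignored.
    counts = Counter(line.strip() for line in lines if line.strip())
    return counts.most_common()
```

With a hypothetical export file `actions.txt`, calling `tally_actions(open("actions.txt"))` returns `(action, count)` pairs, which makes it easy to see whether the monitoring actions dominate the pending requests.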