Description
The GET _stats API broadcasts requests to all nodes to collect shard-level stats from across the cluster. If a single node is problematic (degraded hardware, a kernel unable to schedule tasks in some scenarios, CPU lock-ups, etc.), heap can build up on the node handling the REST request, because it cannot free the memory allocated for the responses already received from the remaining nodes while it waits for the problematic node to respond. If clients poll this API periodically for monitoring, this can increase GC pressure on the nodes.
Histogram dump from one of the nodes:
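The retention pattern described above can be illustrated with a small, self-contained simulation. This is only a sketch and not Elasticsearch code: `PendingBroadcast`, `retainedWhileWaiting`, and the payload sizes are hypothetical names and values chosen for illustration. It assumes the coordinating node keeps every per-node response until the last node has answered, and that each periodic monitoring call adds another in-flight request.

```java
import java.util.ArrayList;
import java.util.List;

class BroadcastRetentionSketch {

    /** Hypothetical stand-in for one in-flight broadcast on the coordinating node. */
    static final class PendingBroadcast {
        private final byte[][] responses; // one slot per node, held until all nodes answer
        private int remaining;

        PendingBroadcast(int nodeCount) {
            this.responses = new byte[nodeCount][];
            this.remaining = nodeCount;
        }

        void onNodeResponse(int nodeIndex, byte[] payload) {
            responses[nodeIndex] = payload;
            remaining--;
        }

        boolean isComplete() {
            return remaining == 0;
        }

        long retainedBytes() {
            long total = 0;
            for (byte[] r : responses) {
                if (r != null) {
                    total += r.length;
                }
            }
            return total;
        }
    }

    /** Starts the requests and lets every node except the last (the slow one) respond. */
    static List<PendingBroadcast> startRequests(int nodeCount, int payloadBytes, int inFlightRequests) {
        List<PendingBroadcast> inFlight = new ArrayList<>();
        for (int r = 0; r < inFlightRequests; r++) {
            PendingBroadcast broadcast = new PendingBroadcast(nodeCount);
            for (int node = 0; node < nodeCount - 1; node++) {
                broadcast.onNodeResponse(node, new byte[payloadBytes]);
            }
            inFlight.add(broadcast);
        }
        return inFlight;
    }

    static long sumRetained(List<PendingBroadcast> inFlight) {
        long total = 0;
        for (PendingBroadcast b : inFlight) {
            total += b.retainedBytes();
        }
        return total;
    }

    /** Bytes held on the coordinator while the slow node has not answered yet. */
    static long retainedWhileWaiting(int nodeCount, int payloadBytes, int inFlightRequests) {
        return sumRetained(startRequests(nodeCount, payloadBytes, inFlightRequests));
    }

    /** Bytes still held once the slow node answers and completed requests are dropped. */
    static long retainedAfterSlowNodeResponds(int nodeCount, int payloadBytes, int inFlightRequests) {
        List<PendingBroadcast> inFlight = startRequests(nodeCount, payloadBytes, inFlightRequests);
        for (PendingBroadcast b : inFlight) {
            b.onNodeResponse(nodeCount - 1, new byte[payloadBytes]);
        }
        inFlight.removeIf(PendingBroadcast::isComplete); // responses become garbage only now
        return sumRetained(inFlight);
    }

    public static void main(String[] args) {
        System.out.println("retained while waiting: " + retainedWhileWaiting(5, 1024, 3));
        System.out.println("retained after slow node responds: " + retainedAfterSlowNodeResponds(5, 1024, 3));
    }
}
```

In this model the retained size grows with both the number of healthy nodes and the number of overlapping requests, which matches the effect of periodic monitoring piling up behind one unresponsive node.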
num #instances #bytes class name
----------------------------------------------
1: 230117683 14813806384 [C
2: 230114240 5522741760 java.lang.String
3: 77239917 2471677344 java.util.HashMap$Node
4: 11036035 889480064 [Ljava.util.HashMap$Node;
5: 10886249 870899920 org.elasticsearch.action.admin.indices.stats.CommonStats
6: 10944341 700437824 org.elasticsearch.cluster.routing.ShardRouting
7: 21838600 698835200 java.util.Collections$UnmodifiableMap
8: 10888927 609779912 java.util.LinkedHashMap
9: 11072686 531488928 java.util.HashMap
10: 10885985 522527280 org.elasticsearch.action.admin.indices.stats.ShardStats
11: 10885985 435439400 org.elasticsearch.index.seqno.SeqNoStats
12: 30990 392283384 [B
13: 10885986 348351552 org.elasticsearch.index.seqno.RetentionLeases
14: 10885985 348351520 org.elasticsearch.index.engine.CommitStats
15: 11002724 264065376 java.util.Collections$SingletonList
16: 11002472 264059328 org.elasticsearch.index.Index
17: 10972905 263349720 org.elasticsearch.index.shard.ShardId
18: 10944339 262664136 org.elasticsearch.cluster.routing.AllocationId
19: 5474469 218978760 org.elasticsearch.index.shard.DocsStats
20: 10885985 174175760 org.elasticsearch.index.seqno.RetentionLeaseStats
21: 5469798 131275152 org.elasticsearch.index.store.StoreStats
This can be easily reproduced by adding a sleep in TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler#messageReceived and invoking the REST _stats API periodically.
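For the client side of the reproduction, a minimal polling sketch could look like the following. It covers only the periodic `_stats` calls, not the sleep added to the transport handler. The class name `StatsPoller`, the poll count, the interval, and the timeout are assumptions made for illustration; the base URL (for example `http://localhost:9200`) must be supplied as an argument and is not taken from the report.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class StatsPoller {

    /** Builds a GET request for the _stats endpoint under the given base URL. */
    static HttpRequest buildStatsRequest(String baseUrl) {
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return HttpRequest.newBuilder(URI.create(base + "/_stats"))
                .timeout(Duration.ofSeconds(30)) // assumed value, tune for the test cluster
                .GET()
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length < 1) {
            System.out.println("usage: java StatsPoller.java <base-url>");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = buildStatsRequest(args[0]);

        int polls = 20;             // assumed number of monitoring calls
        long intervalMillis = 1000; // assumed polling interval

        for (int i = 0; i < polls; i++) {
            // sendAsync lets a new request start even if earlier ones are still waiting
            // on a slow node, so requests overlap on the coordinating node.
            client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                    .whenComplete((response, error) -> System.out.println(
                            error == null ? "status " + response.statusCode() : "failed: " + error));
            Thread.sleep(intervalMillis);
        }
    }
}
```

The requests are sent asynchronously on purpose: a blocking client would wait for the slow node before issuing the next call and would not produce overlapping requests. Requests still pending when the loop ends are abandoned when the JVM exits.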