Support for timeout in stats API #52616

@Bukhtawar

The GET _stats API broadcasts a request to every node in order to collect shard-level stats from across the cluster. If a single node is problematic (degraded hardware, a kernel unable to schedule tasks in some scenarios, CPU lock-ups, etc.), heap can build up on the node handling the REST request: it cannot free the memory allocated for the responses already received from the other nodes while it waits for the problematic node to respond. If clients call the API periodically for monitoring, this can increase GC pressure on the nodes.
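To make the requested behaviour concrete, here is a minimal sketch of a deadline-bounded fan-out. It is not Elasticsearch code: `StatsFanOutSketch`, `Result`, and `collect` are hypothetical names, the per-node responses are modelled as plain `CompletableFuture<String>` values, and the loop blocks for simplicity. The idea is only that the coordinating side stops waiting once a single shared timeout expires and reports which nodes did not answer, instead of holding the other responses indefinitely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StatsFanOutSketch {

    /** Hypothetical holder: responses that arrived in time, plus nodes that failed or missed the deadline. */
    public static final class Result {
        public final List<String> responses = new ArrayList<>();
        public final List<String> failedNodes = new ArrayList<>();
    }

    /** Waits for every per-node future, but never past one shared deadline. */
    public static Result collect(Map<String, CompletableFuture<String>> perNode, long timeoutMillis) {
        Result result = new Result();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        for (Map.Entry<String, CompletableFuture<String>> entry : perNode.entrySet()) {
            // Time left in the shared budget; 0 still returns already-completed futures.
            long remainingNanos = Math.max(0L, deadline - System.nanoTime());
            try {
                result.responses.add(entry.getValue().get(remainingNanos, TimeUnit.NANOSECONDS));
            } catch (TimeoutException | ExecutionException e) {
                entry.getValue().cancel(true); // stop waiting on this node
                result.failedNodes.add(entry.getKey());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                entry.getValue().cancel(true);
                result.failedNodes.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, CompletableFuture<String>> perNode = new LinkedHashMap<>();
        perNode.put("node-1", CompletableFuture.completedFuture("stats-1"));
        perNode.put("node-2", new CompletableFuture<>()); // never completes: the problematic node
        perNode.put("node-3", CompletableFuture.completedFuture("stats-3"));
        Result r = collect(perNode, 200);
        System.out.println("responses=" + r.responses + " failed=" + r.failedNodes);
    }
}
```

Running `main` prints `responses=[stats-1, stats-3] failed=[node-2]` after roughly 200 ms. A real implementation would presumably be asynchronous and would also need to decide how partial results are reported to the caller.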

Histogram dump from one of the nodes:

 num     #instances         #bytes  class name
----------------------------------------------
   1:     230117683    14813806384  [C
   2:     230114240     5522741760  java.lang.String
   3:      77239917     2471677344  java.util.HashMap$Node
   4:      11036035      889480064  [Ljava.util.HashMap$Node;
   5:      10886249      870899920  org.elasticsearch.action.admin.indices.stats.CommonStats
   6:      10944341      700437824  org.elasticsearch.cluster.routing.ShardRouting
   7:      21838600      698835200  java.util.Collections$UnmodifiableMap
   8:      10888927      609779912  java.util.LinkedHashMap
   9:      11072686      531488928  java.util.HashMap
  10:      10885985      522527280  org.elasticsearch.action.admin.indices.stats.ShardStats
  11:      10885985      435439400  org.elasticsearch.index.seqno.SeqNoStats
  12:         30990      392283384  [B
  13:      10885986      348351552  org.elasticsearch.index.seqno.RetentionLeases
  14:      10885985      348351520  org.elasticsearch.index.engine.CommitStats
  15:      11002724      264065376  java.util.Collections$SingletonList
  16:      11002472      264059328  org.elasticsearch.index.Index
  17:      10972905      263349720  org.elasticsearch.index.shard.ShardId
  18:      10944339      262664136  org.elasticsearch.cluster.routing.AllocationId
  19:       5474469      218978760  org.elasticsearch.index.shard.DocsStats
  20:      10885985      174175760  org.elasticsearch.index.seqno.RetentionLeaseStats
  21:       5469798      131275152  org.elasticsearch.index.store.StoreStats

This can be easily reproduced by adding a sleep in TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler#messageReceived and invoking the REST _stats API periodically.
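For the "invoking periodically" half of the reproduction, a small poller along these lines could be used. This is a sketch under assumptions: the base URL `http://localhost:9200`, the 5-second interval, and the 10-second client-side timeout are illustrative values, and `StatsPollerSketch` and `buildStatsRequest` are hypothetical names. The timeout here only bounds how long the client waits for a response.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class StatsPollerSketch {

    /** Builds a GET request for the _stats endpoint with a client-side timeout. */
    public static HttpRequest buildStatsRequest(String baseUrl, Duration timeout) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_stats"))
                .timeout(timeout)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Assumed default address; pass a different base URL as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9200";
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = buildStatsRequest(baseUrl, Duration.ofSeconds(10));

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Poll every 5 seconds until the process is stopped.
        scheduler.scheduleAtFixedRate(() -> {
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                System.out.println("status=" + response.statusCode()
                        + " chars=" + response.body().length());
            } catch (Exception e) {
                // Includes HttpTimeoutException when the client-side timeout elapses.
                System.out.println("request failed: " + e);
            }
        }, 0, 5, TimeUnit.SECONDS);
    }
}
```

Because the scheduler is single-threaded and `send` blocks, this poller never has more than one request in flight; several monitoring clients running in parallel would be needed to overlap requests.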
