Description
Today DataTiersUsageTransportAction executes an internal nodes stats action with all the trimmings, i.e. it requests every statistic that nodes stats can return.
In a large cluster, this implementation may need hundreds of MiB of heap on the coordinating node to hold every statistic about every shard on every node (several KiB per shard), even though we use almost none of them. Worse, the coordinating node is always the elected master, because that is how XPackUsageFeatureTransportAction subclasses work. It also burns a lot of CPU and network bandwidth just transporting these stats around the cluster.
AFAICT we could instead push this computation out to the individual nodes with a dedicated TransportNodesAction, which computes the tiny TierSpecificStats on each node in a form that the coordinating node can combine.
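A minimal sketch of what a combinable per-node result could look like, assuming each node reports a few counters per tier. `LocalShard`, `NodeTierStats` and `TierStatsReducer` are hypothetical names, not the real TierSpecificStats or any Elasticsearch API, and the TransportNodesAction plumbing is omitted entirely.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical view of one shard copy held by the local node.
record LocalShard(boolean primary, long sizeInBytes) {}

// Hypothetical per-tier summary a single node sends back: a handful of longs
// instead of full nodes stats. Not the real TierSpecificStats.
record NodeTierStats(long nodeCount, long primaryShardCount, long totalShardCount, long totalBytes) {

    // Node side: summarise the shards this node holds for one tier.
    static NodeTierStats ofLocalShards(List<LocalShard> shards) {
        long primaries = 0;
        long bytes = 0;
        for (LocalShard shard : shards) {
            if (shard.primary()) {
                primaries++;
            }
            bytes += shard.sizeInBytes();
        }
        return new NodeTierStats(1, primaries, shards.size(), bytes);
    }

    // Coordinating side: counts and sums just add up, so merge order does not matter.
    NodeTierStats merge(NodeTierStats other) {
        return new NodeTierStats(
            nodeCount + other.nodeCount,
            primaryShardCount + other.primaryShardCount,
            totalShardCount + other.totalShardCount,
            totalBytes + other.totalBytes);
    }
}

final class TierStatsReducer {
    private TierStatsReducer() {}

    // Fold each node's tier -> stats response into one cluster-wide tier -> stats map.
    static Map<String, NodeTierStats> combine(List<Map<String, NodeTierStats>> nodeResponses) {
        Map<String, NodeTierStats> combined = new HashMap<>();
        for (Map<String, NodeTierStats> response : nodeResponses) {
            response.forEach((tier, stats) -> combined.merge(tier, stats, NodeTierStats::merge));
        }
        return combined;
    }
}
```

Because every field is a count or a sum, merging is associative and commutative, so the coordinating node can fold node responses in whatever order they arrive and only ever holds a few numbers per tier. Order-dependent aggregates such as medians do not merge this simply and would need each node to send more detail.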
It also does not propagate cancellation down to the nodes stats task (addressed in #100253).
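To make "propagate cancellation down" concrete, here is a minimal sketch of a parent task that cancels its children when it is itself cancelled. `SketchTask` is a hypothetical stand-in, not Elasticsearch's task-management API, and this is not necessarily how #100253 addresses the problem.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical minimal task model; not Elasticsearch's CancellableTask/TaskManager API.
class SketchTask {
    private final AtomicBoolean cancelled = new AtomicBoolean();
    private final List<SketchTask> children = new CopyOnWriteArrayList<>();

    // Register a child (e.g. the nodes stats task). If the parent was already
    // cancelled, cancel the child immediately so a late child cannot escape.
    SketchTask startChild() {
        SketchTask child = new SketchTask();
        children.add(child);
        if (cancelled.get()) {
            child.cancel();
        }
        return child;
    }

    // Cancelling a task cascades to every registered child, and from there to grandchildren.
    void cancel() {
        if (cancelled.compareAndSet(false, true)) {
            for (SketchTask child : children) {
                child.cancel();
            }
        }
    }

    boolean isCancelled() {
        return cancelled.get();
    }
}
```

Without the cascade in `cancel()`, cancelling the usage request would leave the child doing all of its work (and holding all of its results) anyway.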
It also captures the cluster state when it's initiated and retains it until completion, which can represent another 100MiB+ of heap usage.
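The retention comes from a pending callback holding a reference to the captured state. A minimal sketch of the general pattern, assuming a hypothetical `BigSnapshot` record as a stand-in for the cluster state (an illustration only, not the actual DataTiersUsageTransportAction code): a callback that captures the whole snapshot keeps it reachable until it runs, whereas one that captures only a value derived up front does not.

```java
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a large immutable snapshot such as a cluster state.
record BigSnapshot(Map<String, String> nodeIdToTier, byte[] otherMetadata) {}

final class CaptureExamples {
    private CaptureExamples() {}

    // Captures `snapshot`, so the whole object stays reachable while this callback is pending.
    static Consumer<Long> retainingCallback(BigSnapshot snapshot, Consumer<String> out) {
        return totalBytes -> out.accept(snapshot.nodeIdToTier().size() + " nodes, " + totalBytes + " bytes");
    }

    // Derives the one value it needs up front and captures only that int,
    // so the snapshot can be collected once nothing else references it.
    static Consumer<Long> leanCallback(BigSnapshot snapshot, Consumer<String> out) {
        final int nodeCount = snapshot.nodeIdToTier().size();
        return totalBytes -> out.accept(nodeCount + " nodes, " + totalBytes + " bytes");
    }
}
```

Both callbacks produce the same output; only the set of objects kept alive while they wait differs.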
Relates #77466.