[Data Tiers] Add telemetry enhancements for data tiers utilization #71204

Open
@jethr0null

Description

Telemetry for data tiers was added in this PR.

Currently collected data:

node_count :: number of nodes with this tier/role
index_count :: number of indices on this tier
total_shard_count :: total number of shards for all nodes in this tier
primary_shard_count :: number of primary shards for all nodes in this tier
doc_count :: number of documents for all nodes in this tier
total_size_bytes :: total number of bytes for all shards for all nodes in this tier
primary_size_bytes :: number of bytes for all primary shards on all nodes in this tier
primary_shard_size_avg_bytes :: average size of the primary shards in this tier
primary_shard_size_median_bytes :: median size of the primary shards in this tier
primary_shard_size_mad_bytes :: median absolute deviation of the size of the primary shards in this tier
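
To make the last three fields concrete, here is a minimal Python sketch of how an average, median, and median absolute deviation (MAD) could be derived from a list of primary shard sizes. The function name and the exact computation are assumptions for illustration; the actual collection code may compute these values differently (for example, with approximations).

```python
from statistics import mean, median


def primary_shard_size_stats(sizes_bytes):
    """Return avg, median, and MAD for a list of primary shard sizes (bytes).

    Hypothetical helper: not taken from the telemetry implementation.
    """
    if not sizes_bytes:
        return {"avg": 0, "median": 0, "mad": 0}
    med = median(sizes_bytes)
    # MAD: the median of each value's absolute distance from the median.
    mad = median([abs(size - med) for size in sizes_bytes])
    return {"avg": mean(sizes_bytes), "median": med, "mad": mad}


stats = primary_shard_size_stats([10, 20, 30, 40, 1000])
print(stats)
```

With one oversized shard in the input, the average is pulled up to 220 while the median stays at 30 and the MAD at 10, which is why the median-based fields are useful alongside the average.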

Challenges with the current data:

The existing telemetry does not let us distinguish actual utilization: if a node is tagged with multiple data tier roles, stats such as index_count are reported under every one of those tiers. To report accurately on the actual utilization of each tier, we need telemetry that associates these fields with the role the data is currently associated with.
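
The double counting can be illustrated with a small Python sketch. Everything here is hypothetical: the cluster layout, the index names, and in particular the attribution rule (credit an index to the first tier in its preference list that the hosting node also carries) are assumptions made for illustration, not the behavior of any existing code.

```python
from collections import defaultdict

# Hypothetical cluster: node-1 is tagged with two data tier roles.
NODES = {
    "node-1": {"data_hot", "data_warm"},
    "node-2": {"data_cold"},
}

# Hypothetical indices: hosting node and tier preference, most preferred first.
INDICES = {
    "logs-new": {"node": "node-1", "tier_preference": ["data_hot"]},
    "logs-old": {"node": "node-1", "tier_preference": ["data_warm", "data_hot"]},
    "logs-archive": {"node": "node-2", "tier_preference": ["data_cold", "data_warm"]},
}


def index_count_by_node_role(nodes, indices):
    """Count each index once per role of its hosting node (the problem described above)."""
    counts = defaultdict(int)
    for index in indices.values():
        for role in nodes[index["node"]]:
            counts[role] += 1
    return dict(counts)


def index_count_by_attributed_tier(nodes, indices):
    """Count each index once, under a single attributed tier (assumed rule)."""
    counts = defaultdict(int)
    for index in indices.values():
        roles = nodes[index["node"]]
        # Assumed rule: first preferred tier that the hosting node carries.
        tier = next((t for t in index["tier_preference"] if t in roles), None)
        if tier is not None:
            counts[tier] += 1
    return dict(counts)


print(index_count_by_node_role(NODES, INDICES))
print(index_count_by_attributed_tier(NODES, INDICES))
```

In this toy cluster the per-role count reports two indices each for data_hot and data_warm, although only three indices exist in total; the attributed count reports one index per tier.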

For example, I would expect a query like the following against our telemetry data to return only data that is “actively associated” with the warm tier: stack_stats.xpack.data_tiers.data_warm.index_count > 1

A concrete use for this data is reporting on and visualizing the number of unique clusters that have data residing on a given tier. The ability to drill down into more detailed stats, such as the doc_count or index_count for the data residing on each tier, would also be useful.
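
As a rough sketch of that report, the Python below counts unique clusters with data on a tier from a list of telemetry documents. The document shape follows the stack_stats.xpack.data_tiers path used in the query above; the cluster_uuid field name, the sample documents, and the min_index_count threshold parameter are assumptions for illustration.

```python
def clusters_with_data_on_tier(docs, tier, min_index_count=0):
    """Return the set of cluster IDs whose index_count for `tier` exceeds the threshold.

    Hypothetical helper: `cluster_uuid` is an assumed field name.
    """
    clusters = set()
    for doc in docs:
        tiers = doc.get("stack_stats", {}).get("xpack", {}).get("data_tiers", {})
        if tiers.get(tier, {}).get("index_count", 0) > min_index_count:
            clusters.add(doc["cluster_uuid"])
    return clusters


# Hypothetical telemetry documents; cluster "a" reports twice.
DOCS = [
    {"cluster_uuid": "a", "stack_stats": {"xpack": {"data_tiers": {"data_warm": {"index_count": 4}}}}},
    {"cluster_uuid": "a", "stack_stats": {"xpack": {"data_tiers": {"data_warm": {"index_count": 5}}}}},
    {"cluster_uuid": "b", "stack_stats": {"xpack": {"data_tiers": {"data_warm": {"index_count": 0}}}}},
    {"cluster_uuid": "c", "stack_stats": {"xpack": {"data_tiers": {"data_cold": {"index_count": 2}}}}},
]

print(len(clusters_with_data_on_tier(DOCS, "data_warm")))
```

This count is only meaningful once index_count reflects data actually residing on the tier; with the current per-role reporting, a cluster with a multi-role node would show up under every tier that node is tagged with.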

It would also be useful to distinguish whether the tier an index is located on matches its first preference (index.routing.allocation.include._tier_preference). For example, an index might specify cold as its first preference, but if no cold nodes are available it could reside on its second-preference tier (say, warm). We could use this distinction to suggest actions to the user, such as scaling or enabling autoscaling.
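
One possible way to express that distinction is the rank of the current tier within the index's preference list, sketched below in Python. It assumes the setting value is a comma-separated list of tier names, most preferred first; the function names and the idea of reporting a rank are hypothetical, not an existing telemetry field.

```python
def tier_preference_rank(tier_preference, current_tier):
    """Return the 0-based position of current_tier in the preference list.

    Returns None if the tier is not in the list. Hypothetical helper.
    """
    tiers = [t.strip() for t in tier_preference.split(",") if t.strip()]
    try:
        return tiers.index(current_tier)
    except ValueError:
        return None


def on_first_preference(tier_preference, current_tier):
    """True only when the index resides on its most preferred tier."""
    return tier_preference_rank(tier_preference, current_tier) == 0


# The index prefers cold but currently resides on warm.
print(tier_preference_rank("data_cold,data_warm,data_hot", "data_warm"))
print(on_first_preference("data_cold,data_warm,data_hot", "data_warm"))
```

A rank greater than zero would flag indices that fell back to a lower-preference tier, which is the case where a scaling or autoscaling suggestion might apply.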

cc @dakrone @sajjadwahmed
