Skip to content

Move the calculation of data tier usage stats to individual nodes (#100230) #101599

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

gmarouli
Copy link
Contributor

@gmarouli gmarouli commented Oct 31, 2023

Current situation
DataTiersUsageTransportAction executes an internal nodes stats action with all the trimmings:

 client.admin() 
     .cluster() 
     .prepareNodesStats() 
     .all() 
     .setIndices(CommonStatsFlags.ALL) 

This puts a lot of memory pressure to the coordinating node (which in this case is always the elected master) that can cause further instabilities.

Proposed solution

We could trim down the data we need since we only care about docs and store per shard, that would reduce what needs to be kept in memory; however, with the optimisations of the many shards project, we would like to make it even more light weight.

We chose to do this and to push some parts of the calculation to the nodes themselves. Namely, each node sends to the elected master, grouped per preferred tier:

  • A set of all indices residing in this node, so it can be used to count them.
  • A list of all the primary shard sizes residing in this nodes, so we can calculate statistics on them.
  • The count of total shards on the node.
  • The count of total docs on the node.
  • The total store size.

The elected master then collects the data and aggregates them to one response.

Fixes: #100230

@gmarouli
Copy link
Contributor Author

gmarouli commented Nov 1, 2023

@elasticmachine update branch

@gmarouli gmarouli added the >bug label Nov 1, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @gmarouli, I've created a changelog YAML for you.

@gmarouli gmarouli marked this pull request as ready for review November 1, 2023 11:42
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Nov 1, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @gmarouli, I've updated the changelog YAML for you.

@gmarouli gmarouli changed the title Move the calculation of data tier usage stats to individual nodes Move the calculation of data tier usage stats to individual nodes (#100230) Nov 1, 2023
@gmarouli gmarouli requested a review from andreidan November 1, 2023 11:45
@gmarouli gmarouli added the buildkite-opt-in Opts your PR into Buildkite instead of Jenkins label Nov 1, 2023
@gmarouli
Copy link
Contributor Author

gmarouli commented Nov 1, 2023

@elasticmachine update branch

@elasticsearchmachine
Copy link
Collaborator

Hi @gmarouli, I've updated the changelog YAML for you.

@gmarouli
Copy link
Contributor Author

gmarouli commented Nov 3, 2023

@elasticmachine update branch

@gmarouli
Copy link
Contributor Author

gmarouli commented Nov 3, 2023

@elasticmachine run elasticsearch-ci

/**
* Sources locally data tier usage stats mainly indices and shard sizes grouped by preferred data tier.
*/
public class NodesDataTiersUsageTransportAction extends TransportNodesAction<
Copy link
Contributor

@andreidan andreidan Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this Mary.

I'm probably missing something, but what's the difference between this new transport action and potentially continuing to use the node stats action with an empty set of metrics but with the indices flags configured?

i.e.

  NodesStatsRequest statsRequest = new NodesStatsRequest(nodes);
                statsRequest.clear()
                    .indices(new CommonStatsFlags().clear().set(CommonStatsFlags.Flag.Docs, true).set(CommonStatsFlags.Flag.Store, true));

I'm a bit concerned about the fact that it seems to me we increased the payload size of the messages we send around in the cluster by needing to include the indices names, whilst introducing a new transport action we'll have to maintain. Should we benchmark this proposed solution against the node stats solution where we reduce the amount of data we collect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to make the transport action as lean as possible. I thought this was achieving that but a benchmark would definitely help figure out if it's worth it or not.

Copy link
Contributor Author

@gmarouli gmarouli Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to list here samples of alternative approaches:

  • Using a trimmed version of node stats, we need only store & docs but we do need them on shard level otherwise they are aggregated per node:
GET _nodes/stats/indices/store,docs?level=shards
{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "elasticsearch",
  "nodes": {
    "Ypq3Z2f_S96Rqj_UMPe2Kg": {
      "timestamp": 1699356609136,
      "name": "Marias-MBP-2.home",
      "transport_address": "127.0.0.1:9300",
      "host": "127.0.0.1",
      "ip": "127.0.0.1:9300",
      "roles": [...],
      "attributes": {...},
      "indices": {
        "docs": {
          "count": 110,
          "deleted": 38
        },
        "store": {
          "size_in_bytes": 2826327,
          "total_data_set_size_in_bytes": 2826327,
          "reserved_in_bytes": 0
        },
        "shards": {
          ".internal.alerts-security.alerts-default-000001": [
            {
              "0": {
                "routing": {
                  "state": "STARTED",
                  "primary": true,
                  "node": "Ypq3Z2f_S96Rqj_UMPe2Kg",
                  "relocating_node": null
                },
                "docs": {
                  "count": 0,
                  "deleted": 0
                },
                "store": {
                  "size_in_bytes": 249,
                  "total_data_set_size_in_bytes": 249,
                  "reserved_in_bytes": 0
                },
                "commit": {
                  "id": "C32+lLS09h+Ak5NNXF+VMQ==",
                  "generation": 3,
                  "user_data": {
                    "local_checkpoint": "-1",
                    "es_version": "8500003",
                    "min_retained_seq_no": "0",
                    "max_seq_no": "-1",
                    "history_uuid": "mwJ-o4Q3QMu5qW0CbQ4HBg",
                    "max_unsafe_auto_id_timestamp": "-1",
                    "translog_uuid": "mUH7cQcaQQmhHOGbhJVOtg"
                  },
                  "num_docs": 0
                },
                "seq_no": {
                  "max_seq_no": -1,
                  "local_checkpoint": -1,
                  "global_checkpoint": -1
                },
                "retention_leases": {
                  "primary_term": 1,
                  "version": 1,
                  "leases": [
                    {
                      "id": "peer_recovery/Ypq3Z2f_S96Rqj_UMPe2Kg",
                      "retaining_seq_no": 0,
                      "timestamp": 1699356264934,
                      "source": "peer recovery"
                    }
                  ]
                },
                "shard_path": {
                  "state_path": "/Users/gmarouli/projects/elasticsearch/build/distribution/local/elasticsearch-8.12.0-SNAPSHOT/data",
                  "data_path": "/Users/gmarouli/projects/elasticsearch/build/distribution/local/elasticsearch-8.12.0-SNAPSHOT/data",
                  "is_custom_data_path": false
                },
                "search_idle": true,
                "search_idle_time": 346455
              }
            }
          ],....

This response contains the information we want but for every shard the footprint is quite big. Benchmark will show exactly how much.

  • Using the indices stats is a bit more compact response but not too much and I was concerned that we might encounter complications because the privileges differ between the usage action and this one:
GET _stats/docs,store?expand_wildcards=all
{
  "_shards": {
    "total": 18,
    "successful": 18,
    "failed": 0
  },
  "_all": {
    "primaries": {
      "docs": {
        "count": 110,
        "deleted": 5
      },
      "store": {
        "size_in_bytes": 2743520,
        "total_data_set_size_in_bytes": 2743520,
        "reserved_in_bytes": 0
      }
    },
    "total": {
      "docs": {
        "count": 110,
        "deleted": 5
      },
      "store": {
        "size_in_bytes": 2743520,
        "total_data_set_size_in_bytes": 2743520,
        "reserved_in_bytes": 0
      }
    }
  },
  "indices": {
    ".kibana_8.9.1_001": {
      "uuid": "aGZzdSIGRD-rG7d8e9guVw",
      "health": "green",
      "status": "open",
      "primaries": {
        "docs": {
          "count": 51,
          "deleted": 0
        },
        "store": {
          "size_in_bytes": 93243,
          "total_data_set_size_in_bytes": 93243,
          "reserved_in_bytes": 0
        }
      },
      "total": {
        "docs": {
          "count": 51,
          "deleted": 0
        },
        "store": {
          "size_in_bytes": 93243,
          "total_data_set_size_in_bytes": 93243,
          "reserved_in_bytes": 0
        }
      },
      "shards": {
        "0": [
          {
            "routing": {
              "state": "STARTED",
              "primary": true,
              "node": "Ypq3Z2f_S96Rqj_UMPe2Kg",
              "relocating_node": null
            },
            "docs": {
              "count": 51,
              "deleted": 0
            },
            "store": {
              "size_in_bytes": 93243,
              "total_data_set_size_in_bytes": 93243,
              "reserved_in_bytes": 0
            },
            "commit": {
              "id": "C32+lLS09h+Ak5NNXF+cfQ==",
              "generation": 13,
              "user_data": {
                "local_checkpoint": "121",
                "es_version": "8500003",
                "min_retained_seq_no": "117",
                "max_seq_no": "121",
                "history_uuid": "fY2gWHtJTGi2mUcY_gnHYg",
                "max_unsafe_auto_id_timestamp": "-1",
                "translog_uuid": "eVTs1OgeTpSKsXt5cf6HSw"
              },
              "num_docs": 51
            },
            "seq_no": {
              "max_seq_no": 125,
              "local_checkpoint": 125,
              "global_checkpoint": 125
            },
            "retention_leases": {
              "primary_term": 1,
              "version": 19,
              "leases": [
                {
                  "id": "peer_recovery/Ypq3Z2f_S96Rqj_UMPe2Kg",
                  "retaining_seq_no": 126,
                  "timestamp": 1699357291431,
                  "source": "peer recovery"
                }
              ]
            },
            "shard_path": {
              "state_path": "/Users/gmarouli/projects/elasticsearch/build/distribution/local/elasticsearch-8.12.0-SNAPSHOT/data",
              "data_path": "/Users/gmarouli/projects/elasticsearch/build/distribution/local/elasticsearch-8.12.0-SNAPSHOT/data",
              "is_custom_data_path": false
            },
            "search_idle": false,
            "search_idle_time": 11957
          }
        ]
      }
    },..

Finally, the new action passes objects like this for every node (+ the node related info in the aggregated response), the main gains are that replicas do not add to the size of the response. The size of the response is influenced only. by the primary shards per node because they influence the indices and the primary shard sizes. We cannot merge these two stats because the master node will calculate certain stats over all primary shards:

{
   "data_hot": {
       "indices": ["index-1",....]
       "primaryShardSizes": [93243, ...]
       "totalShardCount": X,
       "totalShardSize": Y
   },
   "data_frozen": {....}
}

Please feel free to share any initial thoughts. I am going to work on benchmarking now, but it might take a bit because I haven't done this before.

Copy link
Contributor

@andreidan andreidan Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks for publishing the outputs here Mary. This is super helpful.

I think we don't need to do any benchmarking as the reduced node stats response does already contain all the indices names on the node (and the extra stuff we don't need).

I think it's fine to proceed with the new transport action you proposed in this PR as it's quite a big reduction in response size and computation from the reduced node stats and indices stats APIs. 🚀

@gmarouli
Copy link
Contributor Author

gmarouli commented Nov 8, 2023

@elasticmachine update branch

@gmarouli
Copy link
Contributor Author

gmarouli commented Nov 9, 2023

@elasticmachine update branch

@gmarouli gmarouli requested a review from andreidan November 9, 2023 10:55
@gmarouli
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating on this Mary, this is going to be soooo much lighter ! 🚀

.collect(Collectors.toSet());
for (String indexName : localIndices) {
IndexMetadata indexMetadata = metadata.index(indexName);
String tier = indexMetadata.getTierPreference().isEmpty() ? null : indexMetadata.getTierPreference().get(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should guard against an empty tier_preference.
indexMetadata.getTierPreference().get(0) could throw index out bounds exception

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I think this is covered by indexMetadata.getTierPreference().isEmpty(), right? Is there an edge case I am missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is covered, but so I go ahead and merge this, but if I am indeed missing something I will fix it asap.

Copy link
Contributor

@andreidan andreidan Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you're totally right. my eyes skipped that (Happy Friday 😬 )

private final Map<String, UsageStats> usageStatsByTier;

public static class UsageStats implements Writeable {
private final List<Long> primaryShardSizes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for removing the indices names array, this is awesome ! 🚀 ❤️

@gmarouli gmarouli merged commit ea2035d into elastic:main Nov 10, 2023
gmarouli added a commit to gmarouli/elasticsearch that referenced this pull request Nov 10, 2023
elasticsearchmachine pushed a commit that referenced this pull request Nov 10, 2023
…ividual nodes (#100230) (#101599)" (#102042)

Reverting because the new action is not properly handled in a mixed
cluster.
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Nov 13, 2023
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Nov 13, 2023
…ividual nodes (elastic#100230) (elastic#101599)" (elastic#102042)

Reverting because the new action is not properly handled in a mixed
cluster.
gmarouli added a commit to gmarouli/elasticsearch that referenced this pull request Nov 14, 2023
@gmarouli gmarouli deleted the collect-data-tiers-usage-efficiently-v2 branch August 20, 2024 07:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug buildkite-opt-in Opts your PR into Buildkite instead of Jenkins :Data Management/ILM+SLM Index and Snapshot lifecycle management Team:Data Management Meta label for data/management team v7.17.15 v8.11.1 v8.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataTiersUsageTransportAction is incredibly inefficient in large clusters
4 participants