Description
Users can currently split their deployments into tiers based on things like node attributes, and manually move data between the tiers using ILM. We'd like to take this one step further and formalize the concept of data tiers within Elasticsearch.
Tasks
- Add tiers as dedicated roles (@dakrone) Add data tiers (hot, warm, cold, frozen) as custom node roles #60994
- Add filtering for allowing indices to be assigned to a particular tier (@dakrone) Add data tiers (hot, warm, cold, frozen) as custom node roles #60994
- Add `frozen` phase to ILM (@andreidan) ILM: add frozen phase #60983
- [UI] Add UI for frozen phase [UI] Support "frozen" phase in ILM UI #61345
- Inject step to move data between tiers within ILM automatically (@andreidan) ILM migrate data between tiers #61377
- Add check to allow moving past allocation check if allocation settings are manually unset (@andreidan) ILM: allow check-migration step to continue if tier setting unset #62636
- [UI] Add opt-out UI for automatic data relocation
- [UI] Ensure UI is not lossy for actions supported by the API but not yet by the UI
- Automatically allocate new indices to "hot" tier nodes (@dakrone) Allocate newly created indices on data_hot tier nodes #61342
- Add `data_content` tier (@dakrone) Add "content" tier as new "data_content" role #62247
- Choose between the hot and content tiers for new indices based on whether the index is part of a data stream (@dakrone) Allocate new indices on "hot" or "content" tier depending on data stream inclusion #62338
- [ ] Add opt-out index-level setting to bypass initial hot allocation and ILM phase migration (@andreidan) Add index setting to bypass auto allocation to hot nodes #62114
- Make tier allocation decider use a prioritized list of possible values for allocation (@dakrone) Add index.routing.allocation.include._tier_preference setting #62589
- Enhance ILM's injected `migrate` step to correctly set the list of possible tiers for each phase (@andreidan) ILM: migrate action configures the _tier_preference setting #62829
- Documentation (@andreidan)
  - Overview documentation (@andreidan) DOCS: general overview of data tiers and roles #63086
  - ILM documentation (@andreidan) DOCS: general overview of data tiers and roles #63086
- Release notes highlights (@dakrone) Add release note highlights for data tiers #63427
- Telemetry (@dakrone) Add telemetry for data tiers #63031
Context
So why formalize tiers into Elasticsearch (and beyond)? There are a number of advantages to doing this.
- By formalizing this inside of Elasticsearch itself we shift from descriptive best practices to prescriptive best practices. Instead of a million ways to configure hot/warm/cold, we prescribe our preferred solution.
- This allows us to be consistent in our documentation for on-prem as well as Cloud: we don’t need to make up attributes that may differ, because we can refer to the actual role names and configuration.
- This solution allows us to tell a story not only in our documentation, but also in our out-of-the-box configuration. The idea that data has a lifecycle becomes concrete, instead of an abstraction built from general-purpose constructs.
- A data stream already encapsulates some of the lifecycle of data, in that we prevent certain actions on the write index and allow them only on non-write indices in the stream. Having tiers available as a first-class feature would only strengthen this.
- A better out-of-the-box experience for users with time-series data
- A user now has less to configure in their ILM policy and templates, as data can shift tiers automatically.
- Since we have a distinction between tiers, we have the freedom to be more aggressive with our default ILM policies. For example, we can start to include policies that automatically freeze indices on a frozen tier, or use searchable snapshots by default, because tiers are now a first class idea.
- Autoscaling can be tier-aware. Rather than having to scale based on a node attribute and not knowing whether data is even respecting that attribute by default (since we don’t respect attribute-based allocation by default), autoscaling can differentiate between the different tiers, scaling only a specific part up or down as needed.
Minimum Viable Product
There is a set of things that we’d like to provide in the MVP for formalizing data tiers. This includes functionality for the tiering itself as well as uses within other parts of ES (like ILM). While the features can be expanded later, this is a good starting place.
Add tiers to Elasticsearch
The first step will be adding tiers to Elasticsearch itself. We can add the following roles to Elasticsearch:
- data_hot
- data_warm
- data_cold
- data_frozen
These roles are not mutually exclusive. When a user doesn’t specify any of these roles, but does specify the “data” role (or uses the default node roles, which include “data”), we will treat the node as if it has all of the data_* roles.
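The role-expansion rule above can be sketched in a few lines of Python. This is a minimal sketch, assuming a node's roles arrive as a set of strings; the helper name `effective_tier_roles` is hypothetical and not Elasticsearch's actual implementation.

```python
from __future__ import annotations

# Tier role names come from the proposal; the helper itself is hypothetical.
DATA_TIER_ROLES = frozenset({"data_hot", "data_warm", "data_cold", "data_frozen"})


def effective_tier_roles(roles: set[str]) -> set[str]:
    """Return the data tier roles a node should be treated as having."""
    explicit = set(roles) & DATA_TIER_ROLES
    if explicit:
        # Tier roles were listed explicitly; they are not mutually exclusive.
        return explicit
    if "data" in roles:
        # A generic "data" node (including the default roles) acts as every tier.
        return set(DATA_TIER_ROLES)
    # Neither "data" nor any data_* role: the node holds no data.
    return set()
```

A node configured with master, data_hot, and ingest resolves to just data_hot, while a node with master, data, and ingest is treated as belonging to all four tiers.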
Not only do we need to make these tiers available as node roles, we also need to make them usable for allocation. We currently have a set of built-in attributes that users can specify in our allocation APIs: _name, _host_ip, _publish_ip, _ip, _host, and _id. I propose that we add another: _tier. This new attribute could be used manually for both cluster-level and index-level allocation, as well as within ILM. This way we avoid introducing a new set of allocation deciders specifically for moving data between tiers, since we already have the infrastructure for include, exclude, and require on a given set of _tier values.
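To make the reuse of include, exclude, and require concrete, here is a small Python sketch of how those filters could evaluate against the proposed _tier attribute. It assumes the semantics of existing allocation filtering (include: at least one listed value; require: all listed values; exclude: none of the listed values) and assumes setting keys of the form index.routing.allocation.include._tier. The function names are hypothetical.

```python
from __future__ import annotations

PREFIX = "index.routing.allocation"


def _parse(value: str | None) -> set[str]:
    """Split a comma-separated setting value into a set of tier names."""
    if not value:
        return set()
    return {part.strip() for part in value.split(",") if part.strip()}


def can_allocate(node_tiers: set[str], settings: dict[str, str]) -> bool:
    """Apply include/require/exclude filters on the proposed _tier attribute."""
    include = _parse(settings.get(f"{PREFIX}.include._tier"))
    require = _parse(settings.get(f"{PREFIX}.require._tier"))
    exclude = _parse(settings.get(f"{PREFIX}.exclude._tier"))
    if include and not node_tiers & include:
        return False  # include: the node must have at least one listed tier
    if require and not require <= node_tiers:
        return False  # require: the node must have every listed tier
    if exclude and node_tiers & exclude:
        return False  # exclude: the node must have none of the listed tiers
    return True
```

Because a node can hold several tier roles at once, a multi-valued attribute like this fits naturally into the existing filter semantics without a dedicated decider.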
An example configuration for this would include the following in elasticsearch.yml:
```yaml
node.roles: ["master", "data_hot", "ingest"]
```
One of the first uses of the new tiers will be ILM. Currently, ILM's lifecycle includes the hot, warm, and cold phases and their actions. Making ILM aware of our tiers is a two-step process: adding the frozen tier as a new phase, and then making ILM migrate data between tiers automatically.
Adding a “frozen” phase to ILM
Adding a frozen phase involves defining the set of actions allowed in it as well as parsing the phase itself. The “frozen” phase will occur after the “cold” phase but before the “delete” phase. The allowed actions for the frozen phase, in execution order, will be:
- set_priority
- unfollow
- allocate
- freeze
- searchable_snapshot
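The phase placement and action ordering above can be sketched in Python. This is a minimal illustration, assuming a policy is represented as plain strings and sets; `order_frozen_actions` and `next_phase` are hypothetical helpers, not ILM's actual parser.

```python
from __future__ import annotations

# Phase order from the proposal: frozen sits between cold and delete.
PHASE_ORDER = ["hot", "warm", "cold", "frozen", "delete"]

# Allowed frozen-phase actions, in the execution order listed above.
FROZEN_ACTIONS = ["set_priority", "unfollow", "allocate", "freeze", "searchable_snapshot"]


def order_frozen_actions(configured: list[str]) -> list[str]:
    """Validate a frozen phase's actions and return them in execution order."""
    unknown = [a for a in configured if a not in FROZEN_ACTIONS]
    if unknown:
        raise ValueError(f"actions not allowed in the frozen phase: {unknown}")
    return [a for a in FROZEN_ACTIONS if a in configured]


def next_phase(current: str, defined: set[str]) -> str | None:
    """Return the next phase after `current` that the policy actually defines."""
    later = PHASE_ORDER[PHASE_ORDER.index(current) + 1:]
    return next((p for p in later if p in defined), None)
```

Execution order is fixed by the phase definition rather than by the order in which a user writes the actions, which is why the sketch reorders the configured list.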
Migrating data between tiers automatically
Currently ILM doesn’t migrate any data between tiers automatically, and this has tripped up users in the past (they expect it to move the data, but it doesn’t). The plan is to make ILM automatically move data to the tier corresponding to the ILM phase, unless the phase already contains an allocate action that sets an allocation filter (not just a replica change).
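The rule for when to migrate can be sketched in Python. This assumes that each phase maps to the tier role of the same name and that the allocate action uses ILM's include, exclude, require, and number_of_replicas keys; the mapping and function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical mapping: each ILM phase targets the tier role of the same name.
TIER_FOR_PHASE = {
    "hot": "data_hot",
    "warm": "data_warm",
    "cold": "data_cold",
    "frozen": "data_frozen",
}


def has_custom_allocation(phase_actions: dict) -> bool:
    """True if the phase's allocate action sets include/exclude/require filters."""
    allocate = phase_actions.get("allocate", {})
    # A replica-only change (number_of_replicas) does not count as an allocation.
    return any(allocate.get(key) for key in ("include", "exclude", "require"))


def migration_target(phase: str, phase_actions: dict) -> str | None:
    """Return the tier to migrate to, or None if ILM should leave allocation alone."""
    if phase not in TIER_FOR_PHASE or has_custom_allocation(phase_actions):
        return None
    return TIER_FOR_PHASE[phase]
```

Treating a replica-only allocate action as "no custom allocation" keeps automatic migration on for the common case where a user only wants to reduce replicas in a later phase.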
This migration should be implemented as an injected step (similar to the way we inject the “unfollow” step in phases) that runs as the first step in a phase. That way the user can monitor it through the existing ILM explain API, and the step is re-run when a user moves back to a phase. This injected step should fail fast if no nodes corresponding to the given phase are available in the cluster, and then be retried the next time the ILM policy is executed.
We should add a way to opt-out of this automatic migration, rather than requiring a user to have a custom allocation as the only way to opt out.
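The injected step, its fail-fast check, and the opt-out can be modeled together in a short Python sketch. It assumes a phase is a list of step objects and that each entry in `node_roles` is one node's effective tier roles; `Step`, `NoTierNodesError`, the `migrate_enabled` opt-out flag, and the `"retry"`/`"done"` statuses are all hypothetical stand-ins for ILM internals.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    name: str


class NoTierNodesError(Exception):
    """Raised when no node in the cluster has the target tier role."""


def build_phase_steps(action_steps: list[Step], target_tier: str | None,
                      migrate_enabled: bool = True) -> list[Step]:
    """Return the phase's steps, prepending an injected migrate step if needed."""
    if target_tier is None or not migrate_enabled:
        # Custom allocation in the phase, or the user opted out (hypothetical flag).
        return list(action_steps)
    # Running first means the step shows up in ILM explain and re-runs
    # whenever the index moves back into this phase.
    return [Step(f"migrate:{target_tier}")] + list(action_steps)


def check_tier_available(target_tier: str, node_roles: list[set[str]]) -> None:
    """Fail fast if no node has the target tier among its effective roles."""
    if not any(target_tier in roles for roles in node_roles):
        raise NoTierNodesError(f"no nodes with role {target_tier!r} in the cluster")


def run_migrate_step(target_tier: str, node_roles: list[set[str]]) -> str:
    """Return "done" on success, or "retry" so ILM tries again on its next run."""
    try:
        check_tier_available(target_tier, node_roles)
    except NoTierNodesError:
        return "retry"
    # A real implementation would update the index's tier allocation setting here.
    return "done"
```

Returning a retry status instead of blocking keeps the index parked on the migrate step until nodes for that tier join the cluster.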
Allocate new indices on hot nodes
In addition to letting users manage tiers, we want new data to be allocated to “hot” nodes by default. This will not affect the out-of-the-box case where every node has the “data” role, because such nodes are treated as having all of the data_* roles, including data_hot.
This should be implemented as a default setting applied to every brand-new index:
```json
{
  "index.routing.allocation.include._tier": "data_hot"
}
```
This has the nice benefit of letting a user easily override the default in their template, or manually when creating the index. These are the same settings that ILM will update when migrating between phases.
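The precedence of this default can be sketched in Python: the built-in default applies first, a template may override it, and an explicit create-index request overrides both. This assumes the setting key follows the _tier attribute proposed earlier; `resolve_index_settings` and `migrate_to_tier` are hypothetical helpers.

```python
from __future__ import annotations

# Assumed setting key, following the _tier attribute proposed earlier.
TIER_SETTING = "index.routing.allocation.include._tier"
DEFAULT_INDEX_SETTINGS = {TIER_SETTING: "data_hot"}


def resolve_index_settings(template: dict[str, str] | None = None,
                           request: dict[str, str] | None = None) -> dict[str, str]:
    """Merge settings for a new index; later sources win."""
    settings = dict(DEFAULT_INDEX_SETTINGS)  # built-in default: hot tier
    settings.update(template or {})          # the template may override it
    settings.update(request or {})           # the create-index request wins
    return settings


def migrate_to_tier(settings: dict[str, str], tier: str) -> dict[str, str]:
    """Return a copy of the settings pointing at a new tier, as ILM would."""
    updated = dict(settings)
    updated[TIER_SETTING] = tier
    return updated
```

Because ILM rewrites the same key that the default sets, a single setting describes where an index lives for its whole lifecycle.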