DOCS: general overview of data tiers and roles #63086
Pinging @elastic/es-core-features (:Core/Features/Features)
Pinging @elastic/es-docs (>docs)
Thanks for opening this Andrei! I left a bunch of comments and hopefully someone from the docs team can weigh in as well
docs/reference/index-modules/allocation/data_tier_allocation.asciidoc
Common data lifecycle management patterns revolve around transitioning the indices
through multiple collections of nodes with different hardware characteristics in order
to fulfil evolving CRUD, search, and aggregation needs as the indices age. The concept
of a tiered hardware architecture is not new in {es}.
(Only my suggestion, not necessarily a requirement)
-Common data lifecycle management patterns revolve around transitioning the indices
-through multiple collections of nodes with different hardware characteristics in order
-to fulfil evolving CRUD, search, and aggregation needs as the indices age. The concept
-of a tiered hardware architecture is not new in {es}.
+Common data lifecycle management patterns revolve around transitioning indices
+through multiple collections of nodes with different hardware characteristics in order
+to fulfil evolving CRUD, search, and aggregation needs as indices age.
The reason I removed the comment about the "not new" section is I think we could/should explicitly add a section about migrating attribute based transitioning to data tier transitioning, perhaps elsewhere or as a blog post?
That's a great point Lee. I believe the ILM section should advise on how to migrate. That said, I think mentioning/referencing the existing ILM tiered options/methods here is a nice bridge for that (with links going back and forth between the ILM guide and this page).
I'm happy to drop it but I find it a nice bridge towards ILM and the tiered options it enables (with and without data tiers)
I've reworded the tiers definition to emphasise that things like replicas should be configured and are not guaranteed by the tier. I also reworded the data retention text a bit so it reads as a guideline.
Let me know if we should reword or remove more.
is retained for months and the indices have zero replicas as they are backed by a searchable
snapshot.
I definitely think this sentence should not be here, as it makes it sound like all of this happens automatically when data is moved to the cold tier
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
@elasticmachine update branch
Left several comments & suggestions. Let me know if you have questions or want to discuss.
Updates the <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
index setting in order to migrate the index to the <<modules-tiers, data tier>> corresponding
to the current phase.
-Updates the <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
-index setting in order to migrate the index to the <<modules-tiers, data tier>> corresponding
-to the current phase.
+Moves the index to the <<modules-tiers, data tier>> that corresponds
+to the current phase by updating the <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
+index setting.
+{ilm-init} automatically injects the migrate action in the warm and cold
+phases if no allocation options are specified with the <<ilm-allocate, allocate>> action.
+If you specify an allocate action that only modifies the number of index
+replicas, {ilm-init} reduces the number of replicas before migrating the index.
+To prevent automatic migration without specifying allocation options,
+you can explicitly include the migrate action and set the enabled option to `false`.
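For illustration, the opt-out described in the suggestion above could be expressed as a policy body along these lines. This is only a minimal sketch: it assumes the migrate action is written as `migrate` with an `enabled` option, as the suggested text states, and that the allocate action takes `number_of_replicas`. The helper name `warm_phase_policy` and the `min_age` value are hypothetical and are not part of this PR.

```python
import json
from typing import Optional


def warm_phase_policy(disable_migrate: bool, replicas: Optional[int] = None) -> dict:
    """Build a hypothetical ILM policy body containing only a warm phase.

    disable_migrate: when True, include an explicit migrate action with
        enabled set to False so that automatic migration is not injected.
    replicas: when given, include an allocate action that only changes the
        number of replicas (no other allocation options).
    """
    actions: dict = {}
    if disable_migrate:
        # Explicit opt-out of the automatically injected migrate action.
        actions["migrate"] = {"enabled": False}
    if replicas is not None:
        # An allocate action that only modifies the replica count.
        actions["allocate"] = {"number_of_replicas": replicas}
    return {
        "policy": {
            "phases": {
                # "30d" is an arbitrary example value.
                "warm": {"min_age": "30d", "actions": actions}
            }
        }
    }


if __name__ == "__main__":
    # Print the request body that would be sent when creating the policy.
    print(json.dumps(warm_phase_policy(disable_migrate=True, replicas=1), indent=2))
```

Calling `warm_phase_policy(False)` yields a warm phase with no actions, which is the case where, per the text above, the migrate action would be injected automatically.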
Content data nodes accommodate user-created content. They enable operations like CRUD,
search and aggregations.
I think we need a better definition of content node. Defining it in terms of "user-created content" could be interpreted as actual user-generated content, not content like a product catalog. I was trying to define it in terms of "collections of things" vs a stream of data. Maybe something like "Content data nodes store indices that contain collections of things such as a catalog of products. The value of the data in a content node remains relatively constant, and the performance requirements aren't tied to the age of the data."
I think introducing more abstract terms could potentially complicate things further here. I believe the product catalog would usually be manually introduced in the system (i.e. user created) as opposed to being machine generated (like logs and metrics).
I wonder if it would be clearer to describe "content" through examples rather than through its origin?
e.g. Content data nodes store the documents that back/support application, website, and enterprise search. The value of the data in a content node remains relatively constant, and the performance requirements aren't tied to the age of the data.
docs/reference/modules/node.asciidoc
Warm data nodes hold indices after they are no longer being written to, but still being
queried, usually at a lower frequency than it was in the hot tier. Lower performant
hardware can usually be used in this tier.
-Warm data nodes hold indices after they are no longer being written to, but still being
-queried, usually at a lower frequency than it was in the hot tier. Lower performant
-hardware can usually be used in this tier.
+Warm data nodes store indices that are no longer being regularly updated, but are still being
+queried. Query volume is usually lower than it was while the index was in the hot tier. Less performant
+hardware can usually be used for nodes in this tier.
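To make the link between tier-specific node roles and index placement more concrete, here is a simplified model of how a tier preference list might be resolved against the nodes in a cluster. It is a sketch under stated assumptions, not the actual allocation logic: it assumes role names of the form `data_hot`, `data_warm`, `data_cold`, and `data_content`, and that a comma-separated `_tier_preference` value selects the first listed tier that has at least one node. The function name `resolve_tier` and the node names are hypothetical.

```python
from typing import Dict, Optional, Set


def resolve_tier(preference: str, node_roles: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first tier in `preference` that at least one node provides.

    preference: comma-separated tier role names, most preferred first.
    node_roles: mapping of node name to the set of roles that node holds.
    Returns None when no node holds any of the listed tier roles.
    """
    for tier in (part.strip() for part in preference.split(",")):
        if tier and any(tier in roles for roles in node_roles.values()):
            return tier
    return None


if __name__ == "__main__":
    # Hypothetical cluster that has no warm nodes yet.
    cluster = {
        "node-1": {"data_hot", "data_content"},
        "node-2": {"data_hot"},
    }
    print(resolve_tier("data_warm,data_hot", cluster))

    # After adding a warm node, the same preference resolves to the warm tier.
    cluster["node-3"] = {"data_warm"}
    print(resolve_tier("data_warm,data_hot", cluster))
```

Listing a fallback tier after the preferred one keeps an index allocatable in a cluster that has not yet added nodes for the preferred tier.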
docs/reference/setup.asciidoc
@@ -79,6 +79,8 @@ include::settings/monitoring-settings.asciidoc[]

 include::modules/node.asciidoc[]
+
+include::modules/datatiers.asciidoc[]
Per previous comment, I think we want this info at the top level.
Co-authored-by: debadair <debadair@elastic.co>
This looks much better, thanks for working on this! I left a bunch of comments still, but they are really minor. Deb should take another look before merging also.
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Thanks for the review @dakrone
This adds general overview documentation for data tiers, the data tiers specific node roles, and their application in ILM. Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> Co-authored-by: debadair <debadair@elastic.co> (cherry picked from commit d588cab) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This adds general overview documentation for data tiers, the data tiers specific node roles, and their application in ILM. Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> Co-authored-by: debadair <debadair@elastic.co> (cherry picked from commit d588cab) Signed-off-by: Andrei Dan <andrei.dan@elastic.co> Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> Co-authored-by: debadair <debadair@elastic.co>
This adds general overview documentation for data tiers
and the data-tier-specific node roles.
Relates to #60848