DOCS: general overview of data tiers and roles #63086

Merged: 21 commits, Oct 7, 2020

andreidan
Contributor

@andreidan andreidan commented Sep 30, 2020

This adds general overview documentation for data tiers
and the data tier-specific node roles.

Relates to #60848

@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Features)

@elasticmachine
Collaborator

Pinging @elastic/es-docs (>docs)

@elasticmachine elasticmachine added the Team:Data Management and Team:Docs labels Sep 30, 2020
@andreidan andreidan removed the WIP label Sep 30, 2020
Member

@dakrone dakrone left a comment

Thanks for opening this Andrei! I left a bunch of comments and hopefully someone from the docs team can weigh in as well

Comment on lines 5 to 8
Common data lifecycle management patterns revolve around transitioning the indices
through multiple collections of nodes with different hardware characteristics in order
to fulfil evolving CRUD, search, and aggregation needs as the indices age. The concept
of a tiered hardware architecture is not new in {es}.
Member

(Only my suggestion, not necessarily a requirement)

Suggested change
- Common data lifecycle management patterns revolve around transitioning the indices
- through multiple collections of nodes with different hardware characteristics in order
- to fulfil evolving CRUD, search, and aggregation needs as the indices age. The concept
- of a tiered hardware architecture is not new in {es}.
+ Common data lifecycle management patterns revolve around transitioning indices
+ through multiple collections of nodes with different hardware characteristics in order
+ to fulfil evolving CRUD, search, and aggregation needs as indices age.

Member

I removed the "not new" sentence because I think we could/should add an explicit section about migrating from attribute-based transitioning to data tier transitioning, perhaps elsewhere or as a blog post.

Contributor Author

That's a great point Lee. I believe the ILM section should advise on how to migrate. That said, I think mentioning/referencing the existing ILM tiered options/methods here is a nice bridge for that (with links going back and forth between the ILM guide and this page).

I'm happy to drop it but I find it a nice bridge towards ILM and the tiered options it enables (with and without data tiers)

Contributor Author

I've reworded the tiers definition to emphasise that things like replicas should be configured and are not guaranteed. I also reworded the data retention text a bit so it reads as a guideline.

Let me know if we should reword/remove more.

Comment on lines 79 to 80
is retained for months and the indices have zero replicas as they are backed by a searchable
snapshot.
Member

I definitely think this sentence should not be here, as it makes it sound like all of this happens automatically when data is moved to the cold tier

@andreidan
Contributor Author

@elasticmachine update branch

@andreidan andreidan requested a review from dakrone October 2, 2020 13:36
Contributor

@debadair debadair left a comment

Left several comments & suggestions. Let me know if you have questions or want to discuss.

Comment on lines 7 to 9
Updates the <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
index setting in order to migrate the index to the <<modules-tiers, data tier>> corresponding
to the current phase.
Contributor

Suggested change
- Updates the <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
- index setting in order to migrate the index to the <<modules-tiers, data tier>> corresponding
- to the current phase.
+ Moves the index to the <<modules-tiers, data tier>> that corresponds
+ to the current phase by updating the <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
+ index setting.
+ {ilm-init} automatically injects the migrate action in the warm and cold
+ phases if no allocation options are specified with the <<ilm-allocate, allocate>> action. If you specify an allocate action that only modifies the number of index
+ replicas, {ilm-init} reduces the number of replicas before migrating the index.
+ To prevent automatic migration without specifying allocation options,
+ you can explicitly include the migrate action and set the enabled option to `false`.
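
For illustration, the two cases described in this suggestion can be sketched as ILM policy bodies built in Python. This is a minimal sketch and not text from the PR: the helper name `warm_phase` and the `min_age` value are hypothetical, while the migrate action's `enabled` option and an allocate action that only sets `number_of_replicas` are the cases the suggestion describes.

```python
import json


def warm_phase(replicas=None, migrate_enabled=None, min_age="30d"):
    """Build a warm-phase definition for an ILM policy body.

    Hypothetical helper for illustration; min_age is an arbitrary example value.
    """
    actions = {}
    if replicas is not None:
        # An allocate action that only changes the replica count does not
        # stop ILM from injecting the migrate action automatically.
        actions["allocate"] = {"number_of_replicas": replicas}
    if migrate_enabled is not None:
        # Explicitly including migrate with enabled=false prevents the
        # automatic migration.
        actions["migrate"] = {"enabled": migrate_enabled}
    return {"min_age": min_age, "actions": actions}


# Policy that relies on the automatically injected migrate action.
auto_policy = {"policy": {"phases": {"warm": warm_phase(replicas=1)}}}

# Policy that opts out of automatic migration.
manual_policy = {"policy": {"phases": {"warm": warm_phase(migrate_enabled=False)}}}

print(json.dumps(manual_policy, indent=2))
```

Either dictionary could be serialized and sent as the body of a `PUT _ilm/policy/<policy-name>` request.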

Comment on lines +228 to +229
Content data nodes accommodate user-created content. They enable operations like CRUD,
search and aggregations.
Contributor

I think we need a better definition of content node. Defining it in terms of "user-created content" could be interpreted as actual user-generated content, not content like a product catalog. I was trying to define it in terms of "collections of things" vs a stream of data. Maybe something like "Content data nodes store indices that contain collections of things such as a catalog of products. The value of the data in a content node remains relatively constant, and the performance requirements aren't tied to the age of the data."

Contributor Author

I think introducing more abstract terms could potentially complicate things further here. I believe the product catalog would usually be manually introduced in the system (i.e. user created) as opposed to being machine generated (like logs and metrics).

Contributor Author

I wonder if it would be clearer if we talk about "content" by exemplifying it as opposed to using the content origin?

e.g. Content data nodes store the documents that back/support application, website, and enterprise search. The value of the data in a content node remains relatively constant, and the performance requirements aren't tied to the age of the data.

Comment on lines 252 to 254
Warm data nodes hold indices after they are no longer being written to, but still being
queried, usually at a lower frequency than it was in the hot tier. Lower performant
hardware can usually be used in this tier.
Contributor

Suggested change
- Warm data nodes hold indices after they are no longer being written to, but still being
- queried, usually at a lower frequency than it was in the hot tier. Lower performant
- hardware can usually be used in this tier.
+ Warm data nodes store indices that are no longer being regularly updated, but are still being
+ queried. Query volume is usually lower than it was while the index was in the hot tier. Less performant
+ hardware can usually be used for nodes in this tier.
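
As a rough illustration of how tiers map onto node roles, the sketch below picks the nodes eligible to hold an index from a `_tier_preference` value. The node names and the `eligible_nodes` helper are hypothetical, and the selection logic is a simplified model of the setting (the first listed tier that has any node wins), not Elasticsearch's allocation code.

```python
# Hypothetical cluster: node name -> data tier roles (as set via node.roles).
NODES = {
    "node-a": {"data_hot", "data_content"},
    "node-b": {"data_hot"},
    "node-c": {"data_warm"},
}


def eligible_nodes(tier_preference, nodes=NODES):
    """Return the nodes in the first preferred tier that has at least one node."""
    for tier in tier_preference.split(","):
        tier = tier.strip()
        matches = sorted(name for name, roles in nodes.items() if tier in roles)
        if matches:
            return matches
    # No node holds any of the preferred tier roles.
    return []


# Warm-phase preference: warm nodes exist, so they are chosen.
print(eligible_nodes("data_warm,data_hot"))
# Cold-phase preference: no cold nodes here, so the index falls back to warm.
print(eligible_nodes("data_cold,data_warm,data_hot"))
```

The fallback order in the preference string is what lets a cluster without a dedicated warm or cold tier keep the index on the next hotter tier.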

@@ -79,6 +79,8 @@ include::settings/monitoring-settings.asciidoc[]

include::modules/node.asciidoc[]

include::modules/datatiers.asciidoc[]
Contributor

Per previous comment, I think we want this info at the top level.

@andreidan andreidan requested a review from debadair October 5, 2020 12:48
Member

@dakrone dakrone left a comment

This looks much better, thanks for working on this! I left a bunch of comments still, but they are really minor. Deb should take another look before merging also.

Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
@andreidan
Contributor Author

Thanks for the review @dakrone

@andreidan andreidan merged commit d588cab into elastic:master Oct 7, 2020
andreidan added a commit to andreidan/elasticsearch that referenced this pull request Oct 7, 2020
This adds general overview documentation for data tiers,
the data tiers specific node roles, and their application in
ILM.

Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: debadair <debadair@elastic.co>
(cherry picked from commit d588cab)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
andreidan added a commit that referenced this pull request Oct 7, 2020
This adds general overview documentation for data tiers,
the data tiers specific node roles, and their application in
ILM.

Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: debadair <debadair@elastic.co>
(cherry picked from commit d588cab)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>

Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: debadair <debadair@elastic.co>
Labels
backport pending, >docs, Team:Data Management, Team:Docs, v7.10.0, v7.11.0, v8.0.0-alpha1
5 participants