
add 1.21 graceful node shutdown blog post #27335

Merged
merged 3 commits into kubernetes:master
Apr 13, 2021

Conversation

salaxander
Contributor

This is a 1.21 feature blog for graceful node shutdown (https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2000-graceful-node-shutdown).

Target publish date is 04/21/21.

cc @bobbypage @divya-mohan0209 @sftim @mrunalp

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/blog Issues or PRs related to the Kubernetes Blog subproject labels Mar 30, 2021
@k8s-ci-robot k8s-ci-robot added language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. labels Mar 30, 2021
@netlify

netlify bot commented Mar 30, 2021

Deploy preview for kubernetes-io-master-staging ready!

Built with commit 9829da1

https://deploy-preview-27335--kubernetes-io-master-staging.netlify.app

Member

@onlydole onlydole left a comment

LGTM - thank you, @salaxander

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 30, 2021
Contributor

@sftim sftim left a comment

Warning: plentiful feedback.

The text changes I'm proposing are suggestions, not instructions. Please feel free to take the changes that you think improve the article.

We do strongly prefer site relative hyperlinks: /docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical not https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical (and one day we might get everything lint-able to enforce that).

Site-relative links help in a couple of ways including if anyone is borrowing these docs for a different project - we're open source, after all. For the blog it's not essential but I'd prefer to follow the convention.

Thanks @salaxander and I hope you find this review helpful. If you think I've misunderstood anything, have any concerns, or just have questions, let me know!


**Authors:** David Porter (Google), Mrunal Patel (Red Hat), and Tim Bannister (The Scale Factory)

Kuberentes is a distributed system and as such we need to be prepared for inevitable failures — nodes will fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events.
Contributor

@sftim sftim Mar 30, 2021

Suggested change
Kuberentes is a distributed system and as such we need to be prepared for inevitable failures — nodes will fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events.
Kubernetes is a distributed system and as such we need to be prepared for inevitable failure. Nodes can fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events.

or, just fixing the typo:

Suggested change
Kuberentes is a distributed system and as such we need to be prepared for inevitable failures — nodes will fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events.
Kubernetes is a distributed system and as such we need to be prepared for inevitable failures — nodes will fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events.

Member

Done.


Kuberentes is a distributed system and as such we need to be prepared for inevitable failures — nodes will fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events.

One of the common classes of issues are workload failures on node shutdown or restart. The best practice prior to bringing your node down is to [safely drain and cordon your node](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/). This will ensure that all pods running on this node can safely be evicted. An eviction will ensure your pods can follow the expected pod [termination lifecycle](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination) meaning receiving a SIGTERM in your container and/or running preStopHooks, etc.
Contributor

Suggested change
One of the common classes of issues are workload failures on node shutdown or restart. The best practice prior to bringing your node down is to [safely drain and cordon your node](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/). This will ensure that all pods running on this node can safely be evicted. An eviction will ensure your pods can follow the expected pod [termination lifecycle](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination) meaning receiving a SIGTERM in your container and/or running preStopHooks, etc.
One of the common classes of issues are workload failures on node shutdown or restart. The best practice prior to bringing your node down is to [safely drain and cordon your node](/docs/tasks/administer-cluster/safely-drain-node/). This will ensure that all pods running on this node can safely be evicted. An eviction will ensure your pods can follow the expected pod [termination lifecycle](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination) meaning receiving a SIGTERM in your container and/or running `preStopHooks`, etc.

Member

Done.

Comment on lines 14 to 16
Unfortunately prior to kubernetes 1.20, safe node draining is not always possible: it requires users to manually take action and drain the node beforehand. If someone or something shuts down your node without a drain beforehand, most likely your pods will not be safely evicted from your node and shutdown abruptly.

In Kubernetes 1.20 graceful node shutdown was introduced as a new feature in alpha, and later in 1.21 brought to beta. Graceful node shutdown gives you more control over some of those unexpected shutdown situations. With Graceful node shutdown, the kubelet is aware of underlying system shutdown events and can propagate these events to pods, ensuring their containers can shut down as gracefully as possible. This gives the containers a chance to checkpoint their state or release back any resources they are holding.
Contributor

Suggested change
Unfortunately prior to kubernetes 1.20, safe node draining is not always possible: it requires users to manually take action and drain the node beforehand. If someone or something shuts down your node without a drain beforehand, most likely your pods will not be safely evicted from your node and shutdown abruptly.
In Kubernetes 1.20 graceful node shutdown was introduced as a new feature in alpha, and later in 1.21 brought to beta. Graceful node shutdown gives you more control over some of those unexpected shutdown situations. With Graceful node shutdown, the kubelet is aware of underlying system shutdown events and can propagate these events to pods, ensuring their containers can shut down as gracefully as possible. This gives the containers a chance to checkpoint their state or release back any resources they are holding.
Prior to Kubernetes 1.20 (when _graceful node shutdown_ was introduced as an alpha feature), safe node draining was not easy: it required users to manually take action and drain the node beforehand. If someone or something shut down your node without draining it first, most likely your pods would not be safely evicted from your node. Other services talking to those pods might see errors when they exited abruptly.
Kubernetes v1.21 moves safe node draining to beta, which means it's enabled by default. Graceful node shutdown gives you more control over some of those unexpected shutdown situations. With graceful node shutdown, the kubelet is aware of underlying system shutdown events and can propagate these events to pods, ensuring their containers can shut down as gracefully as possible. This gives the containers a chance to checkpoint their state or release back any resources they are holding.

Member

Done.


## How does it work?
On Linux, your system can shut down in many different situations. For example:
* A user or script running `shutdown -h -P now` or `systemctl poweroff`
Contributor

In case the reader thinks this is item only about power-off actions specifically:

Suggested change
* A user or script running `shutdown -h -P now` or `systemctl poweroff`
* A user or script running commands such as `shutdown -h -P now` or `systemctl reboot`

Member

Done

* A user or script running `shutdown -h -P now` or `systemctl poweroff`
* Physically pressing a power button on the machine.
* Stopping a VM instance on a cloud provider, e.g. `gcloud compute instances stop` on GCP.
* A Preemptible VM or Spot Instances that can be terminated by a cloud provider unexpectedly.
Contributor

Suggested change
* A Preemptible VM or Spot Instances that can be terminated by a cloud provider unexpectedly.
* A Preemptible VM or Spot Instance that your cloud provider can terminate unexpectedly, but with a brief warning.

Member

Done

One important consideration we took when designing this feature is that not all pods are created equal. For example, some of the pods running on a node such as a logging related daemonset should stay running as long as possible to capture important logs during the shutdown itself. As a result, pods are split into two categories: “regular” and “critical”. [Critical pods](https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical) are those that have `priorityClassName` set to `system-cluster-critical` or `system-node-critical`; all other pods are considered regular. In our example, the logging DaemonSet would run as a critical pod. During the graceful shutdown, regular pods are terminated first, followed by critical pods. As an example, this would allow a critical pod like those part of a logging daemonset to continue functioning, and collecting logs during the termination of regular pods.
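For readers following along, this is roughly what marking a pod as critical looks like in practice: a minimal sketch of a DaemonSet whose pods would be treated as "critical" during graceful node shutdown because the pod template sets `priorityClassName`. The name and image below are hypothetical and are not part of this PR.

```yaml
# Illustrative sketch only (not from the PR diff).
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-logging-agent          # hypothetical name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-logging-agent
  template:
    metadata:
      labels:
        app: node-logging-agent
    spec:
      priorityClassName: system-node-critical   # marks these pods as critical
      containers:
        - name: agent
          image: registry.example/logging-agent:1.0   # hypothetical image
```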

## How do I use it?
Graceful node shutdown is controlled with the `GracefulNodeShutdown` [feature gate](https://github.com/kubernetes/website/blob/master/docs/reference/command-line-tools-reference/feature-gates) which is enabled by default in 1.21.
Contributor

Suggested change
Graceful node shutdown is controlled with the `GracefulNodeShutdown` [feature gate](https://github.com/kubernetes/website/blob/master/docs/reference/command-line-tools-reference/feature-gates) which is enabled by default in 1.21.
Graceful node shutdown is controlled with the `GracefulNodeShutdown` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) and is enabled by default in Kubernetes 1.21.

## How do I use it?
Graceful node shutdown is controlled with the `GracefulNodeShutdown` [feature gate](https://github.com/kubernetes/website/blob/master/docs/reference/command-line-tools-reference/feature-gates) which is enabled by default in 1.21.

Graceful node shutdown feature is configured with two kubelet configuration options: `ShutdownGracePeriod`, `ShutdownGracePeriodCriticalPods`. These can be adjusted by editing the kubelet config file that is passed to kubelet via the `--config` flag; for more details, refer to [set kubelet parameters via a configuration file](https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/).
Contributor

@sftim sftim Mar 30, 2021

Suggested change
Graceful node shutdown feature is configured with two kubelet configuration options: `ShutdownGracePeriod`, `ShutdownGracePeriodCriticalPods`. These can be adjusted by editing the kubelet config file that is passed to kubelet via the `--config` flag; for more details, refer to [set kubelet parameters via a configuration file](https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/).
You can configure the graceful shutdown behavior using two kubelet configuration options: `ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods`. To change these, you edit the kubelet configuration file that is passed to kubelet via the `--config` flag; for more details, refer to [Set kubelet parameters via a configuration file](/docs/tasks/administer-cluster/kubelet-config-file/).

Member

Done

* `ShutdownGracePeriod`
* Specifies the total duration that the node should delay the shutdown by. This is the total grace period for pod termination for both regular and critical pods.
* `ShutdownGracePeriodCriticalPods`
* Specifies the duration used to terminate critical pods during a node shutdown. This should be less than ShutdownGracePeriod.
Contributor

Suggested change
* Specifies the duration used to terminate critical pods during a node shutdown. This should be less than ShutdownGracePeriod.
* Specifies the duration used to terminate critical pods during a node shutdown. This should be less than `ShutdownGracePeriod`.

For example, if `ShutdownGracePeriod=30s`, and `ShutdownGracePeriodCriticalPods=10s`, kubelet will delay the node shutdown by 30 seconds. During the shutdown, the first 20 (30-10) seconds would be reserved for gracefully terminating normal pods, and the last 10 seconds would be reserved for terminating critical pods.
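For reference, a minimal sketch of how those two settings could appear in a kubelet configuration file passed via `--config`, using the example values above; in the configuration file the field names are lowerCamelCase. This is illustrative and not part of the PR diff.

```yaml
# Illustrative kubelet configuration sketch (not from the PR diff).
# With these values the kubelet delays node shutdown by 30 seconds in total:
# roughly the first 20 seconds for regular pods, then 10 seconds for critical pods.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriod: "30s"
shutdownGracePeriodCriticalPods: "10s"
```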

## How can I learn more?
* Documentation: [https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown](https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown)
Contributor

@sftim sftim Mar 30, 2021

Suggested change
* Documentation: [https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown](https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown)
* Read the [documentation](/docs/concepts/architecture/nodes/#graceful-node-shutdown) for graceful node shutdown
* Read the enhancement proposal, [KEP 2000](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2000-graceful-node-shutdown)
* View the [code](https://github.com/kubernetes/kubernetes/tree/release-1.21/pkg/kubelet/nodeshutdown) on GitHub

Member

Done (agree, the active action reads better here) :)

* View the code: [https://github.com/kubernetes/kubernetes/tree/release-1.20/pkg/kubelet/nodeshutdown](https://github.com/kubernetes/kubernetes/tree/release-1.20/pkg/kubelet/nodeshutdown)

## How do I get involved?
Your feedback is always welcome! SIG-Node meets regularly and can be reached via Slack and the mailing list.
Contributor

Suggested change
Your feedback is always welcome! SIG-Node meets regularly and can be reached via Slack and the mailing list.
Your feedback is always welcome! SIG Node meets regularly and can be reached via [Slack](https://slack.k8s.io) (channel `#sig-node`), or the SIG's [mailing list](https://github.com/kubernetes/community/tree/master/sig-node#contact).

Member

Done

@sftim
Contributor

sftim commented Mar 30, 2021

Because the first word of the article is a typo of “Kubernetes”:
/lgtm cancel

but: reviewers, feel free to re-add LGTM at will as needed.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 30, 2021
@bobbypage
Member

Thanks for all the feedback @sftim. I updated based on your feedback and some other comments I received on the original Google doc. Please take a look again.


**Authors:** David Porter (Google), Mrunal Patel (Red Hat), and Tim Bannister (The Scale Factory)

Graceful node shutdown, beta in 1.21, enables kubelet to gracefully evict pods during a machine shutdown.
Member

nit: terminology: should it be called node?

Suggested change
Graceful node shutdown, beta in 1.21, enables kubelet to gracefully evict pods during a machine shutdown.
Graceful node shutdown, beta in 1.21, enables kubelet to gracefully evict pods during a node shutdown.

Member

Done.


In our example, the logging DaemonSet would run as a critical pod. During the graceful node shutdown, regular pods are terminated first, followed by critical pods. As an example, this would allow a critical pod associated with a logging daemonset to continue functioning, and collecting logs during the termination of regular pods.

We will evaluate during the beta phase if we need more flexibility for different pod priority classes and add support if needed.
Member

Suggested change
We will evaluate during the beta phase if we need more flexibility for different pod priority classes and add support if needed.
We will evaluate during the beta phase if we need more flexibility for different pod priority classes and add support if needed. Please, tell us about your scenario that may require finer grain configuration.

Member

reworded a bit, but added this note.

Prior to Kubernetes 1.20 (when graceful node shutdown was introduced as an alpha feature), safe node draining was not easy: it required users to manually take action and drain the node beforehand. If someone or something shut down your node without draining it first, most likely your pods would not be safely evicted from your node and shutdown abruptly. Other services talking to those pods might see errors due to the pods exiting abruptly. Some examples of this situation may be caused by a reboot due to security patches or preemption of short lived cloud compute instances.

Kubernetes 1.21 brings graceful node shutdown to beta. Graceful node shutdown gives you more control over some of those unexpected shutdown situations. With graceful node shutdown, the kubelet is aware of underlying system shutdown events and can propagate these events to pods, ensuring containers can shut down as gracefully as possible. This gives the containers a chance to checkpoint their state or release back any resources they are holding.

Member

add suggestion from the word doc:

Suggested change
Note, that for the best availability, you still need to design your workload to be resilient for node failures. Graceful node shutdown feature makes it easier to handle semi-planned node termination events, but will not help in case of node failure.

Member

reworded a bit and added.

@mrbobbytables
Member

/approve
/hold
for further review

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 12, 2021
@mrbobbytables mrbobbytables added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Apr 12, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrbobbytables

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 12, 2021
@mrunalp
Contributor

mrunalp commented Apr 13, 2021

LGTM

@sftim
Contributor

sftim commented Apr 13, 2021

/hold cancel
/lgtm

Scheduled for 2021-04-21

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Apr 13, 2021
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 37bd7f52f5f7c35306007094052d27300f5ae7fd

@k8s-ci-robot k8s-ci-robot merged commit 7c77d75 into kubernetes:master Apr 13, 2021