add 1.21 graceful node shutdown blog post #27335
Conversation
Deploy preview for kubernetes-io-master-staging ready! Built with commit 9829da1 https://deploy-preview-27335--kubernetes-io-master-staging.netlify.app
LGTM - thank you, @salaxander
/lgtm
Warning: plentiful feedback.
The text changes I'm proposing are suggestions, not instructions. Please feel free to take the changes that you think improve the article.
We do strongly prefer site relative hyperlinks: /docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical
not https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical
(and one day we might get everything lint-able to enforce that).
Site-relative links help in a couple of ways including if anyone is borrowing these docs for a different project - we're open source, after all. For the blog it's not essential but I'd prefer to follow the convention.
Thanks @salaxander and I hope you find this review helpful. If you think I've misunderstood anything, have any concerns, or just have questions, let me know!
**Authors:** David Porter (Google), Murnal Patel (Red Hat), and Tim Bannister (The Scale Factory)

Kuberentes is a distributed system and as such we need to be prepared for inevitable failures — nodes will fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events.
Kuberentes is a distributed system and as such we need to be prepared for inevitable failures — nodes will fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events. | |
Kubernetes is a distributed system and as such we need to be prepared for inevitable failure. Nodes can fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events. |
or, just fixing the typo:
Kuberentes is a distributed system and as such we need to be prepared for inevitable failures — nodes will fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events. | |
Kubernetes is a distributed system and as such we need to be prepared for inevitable failures — nodes will fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events. |
Done.
One of the common classes of issues are workload failures on node shutdown or restart. The best practice prior to bringing your node down is to [safely drain and cordon your node](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/). This will ensure that all pods running on this node can safely be evicted. An eviction will ensure your pods can follow the expected pod [termination lifecycle](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination) meaning receiving a SIGTERM in your container and/or running preStopHooks, etc.
One of the common classes of issues are workload failures on node shutdown or restart. The best practice prior to bringing your node down is to [safely drain and cordon your node](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/). This will ensure that all pods running on this node can safely be evicted. An eviction will ensure your pods can follow the expected pod [termination lifecycle](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination) meaning receiving a SIGTERM in your container and/or running preStopHooks, etc. | |
One of the common classes of issues are workload failures on node shutdown or restart. The best practice prior to bringing your node down is to [safely drain and cordon your node](/docs/tasks/administer-cluster/safely-drain-node/). This will ensure that all pods running on this node can safely be evicted. An eviction will ensure your pods can follow the expected pod [termination lifecycle](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination) meaning receiving a SIGTERM in your container and/or running `preStopHooks`, etc. |
Done.
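To make the pod termination lifecycle mentioned above concrete, here is a minimal sketch of a Pod that sets a termination grace period and a `preStop` hook. The pod name, image, and the 5-second sleep are hypothetical placeholders, not part of the original post:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-demo                     # hypothetical example pod
spec:
  # How long the kubelet waits for the pod to exit once termination begins.
  terminationGracePeriodSeconds: 30
  containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image
    lifecycle:
      preStop:
        exec:
          # Runs before SIGTERM is sent to the container's main process,
          # giving the application time to drain in-flight work.
          command: ["/bin/sh", "-c", "sleep 5"]
```

During a graceful node shutdown the kubelet runs this same termination flow for each pod, bounded by the grace periods described later in the post.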
Unfortunately prior to kubernetes 1.20, safe node draining is not always possible: it requires users to manually take action and drain the node beforehand. If someone or something shuts down your node without a drain beforehand, most likely your pods will not be safely evicted from your node and shutdown abruptly.

In Kubernetes 1.20 graceful node shutdown was introduced as a new feature in alpha, and later in 1.21 brought to beta. Graceful node shutdown gives you more control over some of those unexpected shutdown situations. With Graceful node shutdown, the kubelet is aware of underlying system shutdown events and can propagate these events to pods, ensuring their containers can shut down as gracefully as possible. This gives the containers a chance to checkpoint their state or release back any resources they are holding.
Unfortunately prior to kubernetes 1.20, safe node draining is not always possible: it requires users to manually take action and drain the node beforehand. If someone or something shuts down your node without a drain beforehand, most likely your pods will not be safely evicted from your node and shutdown abruptly. | |
In Kubernetes 1.20 graceful node shutdown was introduced as a new feature in alpha, and later in 1.21 brought to beta. Graceful node shutdown gives you more control over some of those unexpected shutdown situations. With Graceful node shutdown, the kubelet is aware of underlying system shutdown events and can propagate these events to pods, ensuring their containers can shut down as gracefully as possible. This gives the containers a chance to checkpoint their state or release back any resources they are holding. | |
Prior to Kubernetes 1.20 (when _graceful node shutdown_ was introduced as an alpha feature), safe node draining was not easy: it required users to manually take action and drain the node beforehand. If someone or something shut down your node without draining it first, most likely your pods would not be safely evicted from your node. Other services talking to those pods might see errors when they exited abruptly. | |
Kubernetes v1.21 moves safe node draining to beta, which means it's enabled by default. Graceful node shutdown gives you more control over some of those unexpected shutdown situations. With graceful node shutdown, the kubelet is aware of underlying system shutdown events and can propagate these events to pods, ensuring their containers can shut down as gracefully as possible. This gives the containers a chance to checkpoint their state or release back any resources they are holding. |
Done.
## How does it work?

On Linux, your system can shut down in many different situations. For example:
* A user or script running `shutdown -h -P now` or `systemctl poweroff`
In case the reader thinks this item is only about power-off actions specifically:
* A user or script running `shutdown -h -P now` or `systemctl poweroff` | |
* A user or script running commands such as `shutdown -h -P now` or `systemctl reboot` |
Done
* Physically pressing a power button on the machine.
* Stopping a VM instance on a cloud provider, e.g. `gcloud compute instances stop` on GCP.
* A Preemptible VM or Spot Instances that can be terminated by a cloud provider unexpectedly.
* A Preemptible VM or Spot Instances that can be terminated by a cloud provider unexpectedly. | |
* A Preemptible VM or Spot Instance that your cloud provider can terminate unexpectedly, but with a brief warning. |
Done
One important consideration we took when designing this feature is that not all pods are created equal. For example, some of the pods running on a node such as a logging related daemonset should stay running as long as possible to capture important logs during the shutdown itself. As a result, pods are split into two categories: “regular” and “critical”. [Critical pods](https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical) are those that have `priorityClassName` set to `system-cluster-critical` or `system-node-critical`; all other pods are considered regular. In our example, the logging DaemonSet would run as a critical pod. During the graceful shutdown, regular pods are terminated first, followed by critical pods. As an example, this would allow a critical pod like those part of a logging daemonset to continue functioning, and collecting logs during the termination of regular pods.
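For illustration, marking the pods of a (hypothetical) logging DaemonSet as critical might look like the sketch below; the names and image are placeholders, and `system-node-critical` is one of the two built-in priority classes mentioned above:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: logging-agent                     # hypothetical logging DaemonSet
  namespace: kube-system                  # a typical namespace for cluster add-ons
spec:
  selector:
    matchLabels:
      name: logging-agent
  template:
    metadata:
      labels:
        name: logging-agent
    spec:
      # These pods count as "critical" and are terminated in the final
      # phase of a graceful node shutdown, after regular pods.
      priorityClassName: system-node-critical
      containers:
      - name: agent
        image: registry.example.com/logging-agent:1.0   # placeholder image
```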
## How do I use it?

Graceful node shutdown is controlled with the `GracefulNodeShutdown` [feature gate](https://github.com/kubernetes/website/blob/master/docs/reference/command-line-tools-reference/feature-gates) which is enabled by default in 1.21.
Graceful node shutdown is controlled with the `GracefulNodeShutdown` [feature gate](https://github.com/kubernetes/website/blob/master/docs/reference/command-line-tools-reference/feature-gates) which is enabled by default in 1.21. | |
Graceful node shutdown is controlled with the `GracefulNodeShutdown` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) and is enabled by default in Kubernetes 1.21. |
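Since the gate is on by default in 1.21 you normally don't need to touch it, but if you did want to toggle it explicitly, a kubelet configuration sketch might look like this:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Enabled by default in 1.21; set to false to opt out of graceful node shutdown.
  GracefulNodeShutdown: true
```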
Graceful node shutdown feature is configured with two kubelet configuration options: `ShutdownGracePeriod`, `ShutdownGracePeriodCriticalPods`. These can be adjusted by editing the kubelet config file that is passed to kubelet via the `--config` flag; for more details, refer to [set kubelet parameters via a configuration file](https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/).
Graceful node shutdown feature is configured with two kubelet configuration options: `ShutdownGracePeriod`, `ShutdownGracePeriodCriticalPods`. These can be adjusted by editing the kubelet config file that is passed to kubelet via the `--config` flag; for more details, refer to [set kubelet parameters via a configuration file](https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/). | |
You can configure the graceful shutdown behavior using two kubelet configuration options: `ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods`. To change these, you edit the kubelet configuration file that is passed to kubelet via the `--config` flag; for more details, refer to [Set kubelet parameters via a configuration file](/docs/tasks/administer-cluster/kubelet-config-file/). |
Done
* `ShutdownGracePeriod`
  * Specifies the total duration that the node should delay the shutdown by. This is the total grace period for pod termination for both regular and critical pods.
* `ShutdownGracePeriodCriticalPods`
  * Specifies the duration used to terminate critical pods during a node shutdown. This should be less than ShutdownGracePeriod.
* Specifies the duration used to terminate critical pods during a node shutdown. This should be less than ShutdownGracePeriod. | |
* Specifies the duration used to terminate critical pods during a node shutdown. This should be less than `ShutdownGracePeriod`. |
For example, if `ShutdownGracePeriod=30s`, and `ShutdownGracePeriodCriticalPods=10s`, kubelet will delay the node shutdown by 30 seconds. During the shutdown, the first 20 (30-10) seconds would be reserved for gracefully terminating normal pods, and the last 10 seconds would be reserved for terminating critical pods.
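Expressed as a kubelet configuration file, that example might look like the following sketch (the fields appear in camelCase in the YAML form):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# 30 seconds of total shutdown delay: the first 20 seconds for regular pods,
# the final 10 seconds reserved for critical pods.
shutdownGracePeriod: 30s
shutdownGracePeriodCriticalPods: 10s
```

Restart the kubelet after editing its configuration file so the new values take effect.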
## How can I learn more?

* Documentation: [https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown](https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown)
* Documentation: [https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown](https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown) | |
* Read the [documentation](/docs/concepts/architecture/nodes/#graceful-node-shutdown) for graceful node shutdown | |
* Read the enhancement proposal, [KEP 2000](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2000-graceful-node-shutdown) | |
* View the [code](https://github.com/kubernetes/kubernetes/tree/release-1.21/pkg/kubelet/nodeshutdown) on GitHub |
Done (agree, the active action reads better here) :)
* View the code: [https://github.com/kubernetes/kubernetes/tree/release-1.20/pkg/kubelet/nodeshutdown](https://github.com/kubernetes/kubernetes/tree/release-1.20/pkg/kubelet/nodeshutdown)

## How do I get involved?

Your feedback is always welcome! SIG-Node meets regularly and can be reached via Slack and the mailing list.
Your feedback is always welcome! SIG-Node meets regularly and can be reached via Slack and the mailing list. | |
Your feedback is always welcome! SIG Node meets regularly and can be reached via [Slack](https://slack.k8s.io) (channel `#sig-node`), or the SIG's [mailing list](https://github.com/kubernetes/community/tree/master/sig-node#contact). |
Done
The first word of the article is a typo of “Kubernetes”; but reviewers, feel free to re-add LGTM at will as needed.
dd875f9 to 8ecf560

Thanks for all the feedback @sftim. I updated based on your feedback and some other comments I received on the original Google doc. Please take a look again.
**Authors:** David Porter (Google), Murnal Patel (Red Hat), and Tim Bannister (The Scale Factory)

Graceful node shutdown, beta in 1.21, enables kubelet to gracefully evict pods during a machine shutdown.
nit: terminology: should it be called node?
Graceful node shutdown, beta in 1.21, enables kubelet to gracefully evict pods during a machine shutdown. | |
Graceful node shutdown, beta in 1.21, enables kubelet to gracefully evict pods during a node shutdown. |
Done.
In our example, the logging DaemonSet would run as a critical pod. During the graceful node shutdown, regular pods are terminated first, followed by critical pods. As an example, this would allow a critical pod associated with a logging daemonset to continue functioning, and collecting logs during the termination of regular pods.

We will evaluate during the beta phase if we need more flexibility for different pod priority classes and add support if needed.
We will evaluate during the beta phase if we need more flexibility for different pod priority classes and add support if needed. | |
We will evaluate during the beta phase if we need more flexibility for different pod priority classes and add support if needed. Please, tell us about your scenario that may require finer grain configuration. |
reworded a bit, but added this note.
Prior to Kubernetes 1.20 (when graceful node shutdown was introduced as an alpha feature), safe node draining was not easy: it required users to manually take action and drain the node beforehand. If someone or something shut down your node without draining it first, most likely your pods would not be safely evicted from your node and would shut down abruptly. Other services talking to those pods might see errors due to the pods exiting abruptly. Examples of this situation include a reboot due to security patches or preemption of short-lived cloud compute instances.

Kubernetes 1.21 brings graceful node shutdown to beta. Graceful node shutdown gives you more control over some of those unexpected shutdown situations. With graceful node shutdown, the kubelet is aware of underlying system shutdown events and can propagate these events to pods, ensuring containers can shut down as gracefully as possible. This gives the containers a chance to checkpoint their state or release back any resources they are holding.
add suggestion from the word doc:
Note that for the best availability, you still need to design your workload to be resilient to node failures. The graceful node shutdown feature makes it easier to handle semi-planned node termination events, but will not help in case of node failure.
reworded a bit and added.
/approve

[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: mrbobbytables
LGTM

/hold cancel Scheduled for 2021-04-21

LGTM label has been added. Git tree hash: 37bd7f52f5f7c35306007094052d27300f5ae7fd
This is a 1.21 feature blog for graceful node shutdown (https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2000-graceful-node-shutdown).
Target publish date is 04/21/21.
cc @bobbypage @divya-mohan0209 @sftim @mrunalp