add 1.21 graceful node shutdown blog post #27335
Conversation
Deploy preview for kubernetes-io-master-staging ready! Built with commit 9829da1 https://deploy-preview-27335--kubernetes-io-master-staging.netlify.app
LGTM - thank you, @salaxander
/lgtm
Warning: plentiful feedback.
The text changes I'm proposing are suggestions, not instructions. Please feel free to take the changes that you think improve the article.
We do strongly prefer site relative hyperlinks: /docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical
not https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical
(and one day we might get everything lint-able to enforce that).
Site-relative links help in a couple of ways including if anyone is borrowing these docs for a different project - we're open source, after all. For the blog it's not essential but I'd prefer to follow the convention.
Thanks @salaxander and I hope you find this review helpful. If you think I've misunderstood anything, have any concerns, or just have questions, let me know!
**Authors:** David Porter (Google), Murnal Patel (Red Hat), and Tim Bannister (The Scale Factory)

Kuberentes is a distributed system and as such we need to be prepared for inevitable failures — nodes will fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events.
Kuberentes is a distributed system and as such we need to be prepared for inevitable failures — nodes will fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events. | |
Kubernetes is a distributed system and as such we need to be prepared for inevitable failure. Nodes can fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events. |
or, just fixing the typo:
Kuberentes is a distributed system and as such we need to be prepared for inevitable failures — nodes will fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events. | |
Kubernetes is a distributed system and as such we need to be prepared for inevitable failures — nodes will fail, containers might crash or be restarted, and - ideally - your workloads will be able to withstand these catastrophic events. |
Done.
One of the common classes of issues are workload failures on node shutdown or restart. The best practice prior to bringing your node down is to [safely drain and cordon your node](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/). This will ensure that all pods running on this node can safely be evicted. An eviction will ensure your pods can follow the expected pod [termination lifecycle](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination) meaning receiving a SIGTERM in your container and/or running preStopHooks, etc.
One of the common classes of issues are workload failures on node shutdown or restart. The best practice prior to bringing your node down is to [safely drain and cordon your node](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/). This will ensure that all pods running on this node can safely be evicted. An eviction will ensure your pods can follow the expected pod [termination lifecycle](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination) meaning receiving a SIGTERM in your container and/or running preStopHooks, etc. | |
One of the common classes of issues are workload failures on node shutdown or restart. The best practice prior to bringing your node down is to [safely drain and cordon your node](/docs/tasks/administer-cluster/safely-drain-node/). This will ensure that all pods running on this node can safely be evicted. An eviction will ensure your pods can follow the expected pod [termination lifecycle](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination) meaning receiving a SIGTERM in your container and/or running `preStopHooks`, etc. |
Done.
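To make the pod termination lifecycle mentioned above concrete, here is a minimal sketch of a Pod that sets a termination grace period and a `preStop` hook. The pod name, image, and the 5-second sleep are hypothetical placeholders, not part of the original post:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-demo                     # hypothetical example pod
spec:
  # How long the kubelet waits for the pod to exit once termination begins.
  terminationGracePeriodSeconds: 30
  containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image
    lifecycle:
      preStop:
        exec:
          # Runs before SIGTERM is sent to the container's main process,
          # giving the application time to drain in-flight work.
          command: ["/bin/sh", "-c", "sleep 5"]
```

During a graceful node shutdown the kubelet runs this same termination flow for each pod, bounded by the grace periods described later in the post.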
Unfortunately prior to kubernetes 1.20, safe node draining is not always possible: it requires users to manually take action and drain the node beforehand. If someone or something shuts down your node without a drain beforehand, most likely your pods will not be safely evicted from your node and shutdown abruptly.

In Kubernetes 1.20 graceful node shutdown was introduced as a new feature in alpha, and later in 1.21 brought to beta. Graceful node shutdown gives you more control over some of those unexpected shutdown situations. With Graceful node shutdown, the kubelet is aware of underlying system shutdown events and can propagate these events to pods, ensuring their containers can shut down as gracefully as possible. This gives the containers a chance to checkpoint their state or release back any resources they are holding.
Unfortunately prior to kubernetes 1.20, safe node draining is not always possible: it requires users to manually take action and drain the node beforehand. If someone or something shuts down your node without a drain beforehand, most likely your pods will not be safely evicted from your node and shutdown abruptly. | |
In Kubernetes 1.20 graceful node shutdown was introduced as a new feature in alpha, and later in 1.21 brought to beta. Graceful node shutdown gives you more control over some of those unexpected shutdown situations. With Graceful node shutdown, the kubelet is aware of underlying system shutdown events and can propagate these events to pods, ensuring their containers can shut down as gracefully as possible. This gives the containers a chance to checkpoint their state or release back any resources they are holding. | |
Prior to Kubernetes 1.20 (when _graceful node shutdown_ was introduced as an alpha feature), safe node draining was not easy: it required users to manually take action and drain the node beforehand. If someone or something shut down your node without draining it first, most likely your pods would not be safely evicted from your node. Other services talking to those pods might see errors when they exited abruptly. | |
Kubernetes v1.21 moves safe node draining to beta, which means it's enabled by default. Graceful node shutdown gives you more control over some of those unexpected shutdown situations. With graceful node shutdown, the kubelet is aware of underlying system shutdown events and can propagate these events to pods, ensuring their containers can shut down as gracefully as possible. This gives the containers a chance to checkpoint their state or release back any resources they are holding. |
Done.
## How does it work?

On Linux, your system can shut down in many different situations. For example:
* A user or script running `shutdown -h -P now` or `systemctl poweroff`
In case the reader thinks this item is only about power-off actions specifically:
* A user or script running `shutdown -h -P now` or `systemctl poweroff` | |
* A user or script running commands such as `shutdown -h -P now` or `systemctl reboot` |
Done
* Physically pressing a power button on the machine.
* Stopping a VM instance on a cloud provider, e.g. `gcloud compute instances stop` on GCP.
* A Preemptible VM or Spot Instances that can be terminated by a cloud provider unexpectedly.
* A Preemptible VM or Spot Instances that can be terminated by a cloud provider unexpectedly. | |
* A Preemptible VM or Spot Instance that your cloud provider can terminate unexpectedly, but with a brief warning. |
Done
One important consideration we took when designing this feature is that not all pods are created equal. For example, some of the pods running on a node such as a logging related daemonset should stay running as long as possible to capture important logs during the shutdown itself. As a result, pods are split into two categories: “regular” and “critical”. [Critical pods](https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical) are those that have `priorityClassName` set to `system-cluster-critical` or `system-node-critical`; all other pods are considered regular. In our example, the logging DaemonSet would run as a critical pod. During the graceful shutdown, regular pods are terminated first, followed by critical pods. As an example, this would allow a critical pod like those part of a logging daemonset to continue functioning, and collecting logs during the termination of regular pods.
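For illustration, marking the pods of a (hypothetical) logging DaemonSet as critical might look like the sketch below; the names and image are placeholders, and `system-node-critical` is one of the two built-in priority classes mentioned above:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: logging-agent                     # hypothetical logging DaemonSet
  namespace: kube-system                  # a typical namespace for cluster add-ons
spec:
  selector:
    matchLabels:
      name: logging-agent
  template:
    metadata:
      labels:
        name: logging-agent
    spec:
      # These pods count as "critical" and are terminated in the final
      # phase of a graceful node shutdown, after regular pods.
      priorityClassName: system-node-critical
      containers:
      - name: agent
        image: registry.example.com/logging-agent:1.0   # placeholder image
```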
## How do I use it?

Graceful node shutdown is controlled with the `GracefulNodeShutdown` [feature gate](https://github.com/kubernetes/website/blob/master/docs/reference/command-line-tools-reference/feature-gates) which is enabled by default in 1.21.
Graceful node shutdown is controlled with the `GracefulNodeShutdown` [feature gate](https://github.com/kubernetes/website/blob/master/docs/reference/command-line-tools-reference/feature-gates) which is enabled by default in 1.21. | |
Graceful node shutdown is controlled with the `GracefulNodeShutdown` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) and is enabled by default in Kubernetes 1.21. |
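Since the gate is on by default in 1.21 you normally don't need to touch it, but if you did want to toggle it explicitly, a kubelet configuration sketch might look like this:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Enabled by default in 1.21; set to false to opt out of graceful node shutdown.
  GracefulNodeShutdown: true
```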
Graceful node shutdown feature is configured with two kubelet configuration options: `ShutdownGracePeriod`, `ShutdownGracePeriodCriticalPods`. These can be adjusted by editing the kubelet config file that is passed to kubelet via the `--config` flag; for more details, refer to [set kubelet parameters via a configuration file](https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/).
Graceful node shutdown feature is configured with two kubelet configuration options: `ShutdownGracePeriod`, `ShutdownGracePeriodCriticalPods`. These can be adjusted by editing the kubelet config file that is passed to kubelet via the `--config` flag; for more details, refer to [set kubelet parameters via a configuration file](https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/). | |
You can configure the graceful shutdown behavior using two kubelet configuration options: `ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods`. To change these, you edit the kubelet configuration file that is passed to kubelet via the `--config` flag; for more details, refer to [Set kubelet parameters via a configuration file](/docs/tasks/administer-cluster/kubelet-config-file/). |
Done
* `ShutdownGracePeriod`
  * Specifies the total duration that the node should delay the shutdown by. This is the total grace period for pod termination for both regular and critical pods.
* `ShutdownGracePeriodCriticalPods`
  * Specifies the duration used to terminate critical pods during a node shutdown. This should be less than ShutdownGracePeriod.
* Specifies the duration used to terminate critical pods during a node shutdown. This should be less than ShutdownGracePeriod. | |
* Specifies the duration used to terminate critical pods during a node shutdown. This should be less than `ShutdownGracePeriod`. |
For example, if `ShutdownGracePeriod=30s`, and `ShutdownGracePeriodCriticalPods=10s`, kubelet will delay the node shutdown by 30 seconds. During the shutdown, the first 20 (30-10) seconds would be reserved for gracefully terminating normal pods, and the last 10 seconds would be reserved for terminating critical pods.
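Expressed as a kubelet configuration file, that example might look like the following sketch (the fields appear in camelCase in the YAML form):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# 30 seconds of total shutdown delay: the first 20 seconds for regular pods,
# the final 10 seconds reserved for critical pods.
shutdownGracePeriod: 30s
shutdownGracePeriodCriticalPods: 10s
```

Restart the kubelet after editing its configuration file so the new values take effect.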
## How can I learn more?

* Documentation: [https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown](https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown)
* Documentation: [https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown](https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown) | |
* Read the [documentation](/docs/concepts/architecture/nodes/#graceful-node-shutdown) for graceful node shutdown | |
* Read the enhancement proposal, [KEP 2000](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2000-graceful-node-shutdown) | |
* View the [code](https://github.com/kubernetes/kubernetes/tree/release-1.21/pkg/kubelet/nodeshutdown) on GitHub |
Done (agree, the active action reads better here) :)
* View the code: [https://github.com/kubernetes/kubernetes/tree/release-1.20/pkg/kubelet/nodeshutdown](https://github.com/kubernetes/kubernetes/tree/release-1.20/pkg/kubelet/nodeshutdown)

## How do I get involved?

Your feedback is always welcome! SIG-Node meets regularly and can be reached via Slack and the mailing list.
Your feedback is always welcome! SIG-Node meets regularly and can be reached via Slack and the mailing list. | |
Your feedback is always welcome! SIG Node meets regularly and can be reached via [Slack](https://slack.k8s.io) (channel `#sig-node`), or the SIG's [mailing list](https://github.com/kubernetes/community/tree/master/sig-node#contact). |
Done
The first word of the article is a typo of “Kubernetes”; but reviewers, feel free to re-add LGTM at will as needed.
dd875f9 to 8ecf560

Thanks for all the feedback @sftim. I updated based on your feedback and some other comments I received on the original Google doc. Please take a look again.
**Authors:** David Porter (Google), Murnal Patel (Red Hat), and Tim Bannister (The Scale Factory)

Graceful node shutdown, beta in 1.21, enables kubelet to gracefully evict pods during a machine shutdown.
nit: terminology: should it be called node?
Graceful node shutdown, beta in 1.21, enables kubelet to gracefully evict pods during a machine shutdown. | |
Graceful node shutdown, beta in 1.21, enables kubelet to gracefully evict pods during a node shutdown. |
Done.
In our example, the logging DaemonSet would run as a critical pod. During the graceful node shutdown, regular pods are terminated first, followed by critical pods. As an example, this would allow a critical pod associated with a logging daemonset to continue functioning, and collecting logs during the termination of regular pods.

We will evaluate during the beta phase if we need more flexibility for different pod priority classes and add support if needed.
We will evaluate during the beta phase if we need more flexibility for different pod priority classes and add support if needed. | |
We will evaluate during the beta phase if we need more flexibility for different pod priority classes and add support if needed. Please, tell us about your scenario that may require finer grain configuration. |
reworded a bit, but added this note.
Prior to Kubernetes 1.20 (when graceful node shutdown was introduced as an alpha feature), safe node draining was not easy: it required users to manually take action and drain the node beforehand. If someone or something shut down your node without draining it first, most likely your pods would not be safely evicted from your node and would shut down abruptly. Other services talking to those pods might see errors due to the pods exiting abruptly. Examples of this situation include a reboot due to security patches or preemption of short-lived cloud compute instances.

Kubernetes 1.21 brings graceful node shutdown to beta. Graceful node shutdown gives you more control over some of those unexpected shutdown situations. With graceful node shutdown, the kubelet is aware of underlying system shutdown events and can propagate these events to pods, ensuring containers can shut down as gracefully as possible. This gives the containers a chance to checkpoint their state or release back any resources they are holding.
add suggestion from the word doc:
Note that for the best availability, you still need to design your workload to be resilient to node failures. The graceful node shutdown feature makes it easier to handle semi-planned node termination events, but will not help in case of node failure.
reworded a bit and added.
/approve

[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: mrbobbytables
LGTM

/hold cancel Scheduled for 2021-04-21

LGTM label has been added. Git tree hash: 37bd7f52f5f7c35306007094052d27300f5ae7fd
This is a 1.21 feature blog for graceful node shutdown (https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2000-graceful-node-shutdown).
Target publish date is 04/21/21.
cc @bobbypage @divya-mohan0209 @sftim @mrunalp