Configurable Machine drain behavior #11240

sbueringer · 2024-09-30T12:26:10Z

Today, when Cluster API deletes a Machine, it drains the corresponding Node to ensure all Pods running on the Node have
been gracefully terminated before deleting the corresponding infrastructure. The current drain implementation has
hard-coded rules to decide which Pods should be evicted. This implementation is aligned with kubectl drain (see
Machine deletion process for more details).

With recent changes in Cluster API, we can now have finer control over the drain process, and thus we propose a new
MachineDrainRule CRD to make the drain rules configurable per Pod. Additionally, we're proposing annotations that
workload cluster admins can add to individual Pods to control their drain behavior.

This would be a huge improvement over today's implementation, which follows the “standard” kubectl drain behavior, and
would help solve a family of issues identified when running Cluster API in production.

More details can be found in the proposal PR.
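To illustrate the kind of per-Pod decision that the MachineDrainRule CRD and Pod annotations would make configurable, here is a minimal Python sketch. Everything in it is hypothetical: the `Pod` and `DrainRule` types, the annotation key `example.com/drain`, the "Drain"/"Skip" behavior values, and the precedence order are assumptions for illustration, not the actual MachineDrainRule API (the proposal PR defines the real semantics). The hard-coded skips for mirror Pods and DaemonSet Pods roughly approximate what kubectl drain does with `--ignore-daemonsets`.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified Pod model (not the Kubernetes or Cluster API types).
@dataclass
class Pod:
    name: str
    namespace: str
    labels: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)
    owner_kind: str = ""     # kind of the controlling owner, e.g. "DaemonSet"
    is_mirror: bool = False  # static Pod mirrored into the API server by the kubelet

# Hypothetical rule shape: an exact-match label selector plus a behavior.
@dataclass
class DrainRule:
    name: str
    pod_selector: dict  # an empty selector matches every Pod
    behavior: str       # assumed values: "Drain" or "Skip"

# Hypothetical per-Pod opt-out annotation, chosen only for this sketch.
SKIP_ANNOTATION = "example.com/drain"

def drain_behavior(pod: Pod, rules: list[DrainRule]) -> str:
    # 1. Hard-coded skips, roughly like kubectl drain --ignore-daemonsets:
    #    evicting these Pods is pointless because they would be recreated.
    if pod.is_mirror or pod.owner_kind == "DaemonSet":
        return "Skip"
    # 2. A workload cluster admin can opt a single Pod out via an annotation.
    if pod.annotations.get(SKIP_ANNOTATION) == "skip":
        return "Skip"
    # 3. First matching configurable rule wins (sorted by name for determinism).
    for rule in sorted(rules, key=lambda r: r.name):
        if all(pod.labels.get(k) == v for k, v in rule.pod_selector.items()):
            return rule.behavior
    # 4. Default: evict the Pod, as the current implementation does.
    return "Drain"
```

For example, with a single rule `DrainRule("skip-batch", {"team": "batch"}, "Skip")`, a Pod labeled `team=batch` is skipped while an unlabeled Pod is still drained. Evaluating the fixed skips before the configurable parts is a design choice of this sketch, so that no rule can force eviction of Pods that would simply come back.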

Prior related discussions:

Tasks:

- Proposal: 📖 Proposal: MachineDrainRules #11241
- ⚠️ Machine: ignore attached Volumes referred by pods ignored during drain #11246
- ✨ Implement MachineDrainRules #11353
- 🌱 Extend Node drain e2e test to cover MachineDrainRules #11362
- 🌱 Add feature gate to consider VolumeAttachments when waiting for volume detach #11386
- Documentation (book; can probably mostly be taken from the proposal)

Follow-ups:

- Consider adding some sort of timeout for individual Pods (e.g. "GracePeriodSeconds")
- Standardize a skip label across the Kubernetes ecosystem, xrefs:
  - 📖 Proposal: MachineDrainRules #11241 (comment)
  - Standardize a label to exclude Pods from Node drain kubernetes/kubernetes#127247

k8s-ci-robot added the needs-kind label Sep 30, 2024

sbueringer mentioned this issue Sep 30, 2024:
- 📖 Proposal: MachineDrainRules #11241 (Merged)

sbueringer added the kind/feature label Oct 2, 2024

k8s-ci-robot removed the needs-kind label Oct 2, 2024

sbueringer self-assigned this Oct 2, 2024

This was referenced Oct 23, 2024:
- ⚠️ Machine: ignore attached Volumes referred by pods ignored during drain #11246 (Merged)
- ✨ Implement MachineDrainRules #11353 (Merged)

This was referenced Nov 5, 2024:
- 🌱 Extend Node drain e2e test to cover MachineDrainRules #11362 (Merged)
- 🌱 Add feature gate to consider VolumeAttachments when waiting for volume detach #11386 (Merged)
