-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a Pod Disruption Budget for the CSI Controller #612
Conversation
Hi @risinger. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: risinger The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Pull Request Test Coverage Report for Build 1377
💛 - Coveralls |
Having a PDB makes sense to me, but I'll defer this to @wongma7 and @leakingtapan who maintain the chart. /ok-to-test |
Can I get your opinions, @wongma7 @leakingtapan? |
Would adding this to the kustomization base be out of scope of this PR? |
@risinger: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Yeah it does. |
Is this a bug fix or adding new feature?
Could be considered either.
What is this PR about? / Why do we need it?
The Cluster Autoscaler cannot scale down a node where the CSI controller is scheduled, because it is a non-daemonset in kube-system (assuming one uses the namespace specified in the README install instructions). With this chart-specific PR, the CSI controller pod can be rescheduled to another node, allowing the node on which it was previously scheduled to scale down.
What testing is done?
With the current master version of the chart, I ran 3 nodes, got CA to attempt to scale down a node with the CSI controller on it, and saw the scale down hang indefinitely.
I then templated this PDB and installed it. I saw the pod reschedule to another node. CA successfully scaled down the old node.
Here's a run of the CA loop, which repeated consistently before installing the PDB:
Note this only works in conjunction with #526 because the
ToBeDeletedByClusterAutoscaler=:NoSchedule
taint used by CA is tolerated in the current version of the chart.I chose the PDB approach to make sure we have at least one Controller pod running at a time. The
"cluster-autoscaler.kubernetes.io/safe-to-evict": "true"
label approach could potentially create downtime.