Add CP node deletion param configurable #6571
base: master
Conversation
I don't see maxCloudProviderNodeDeletionTime propagated into / captured in AutoscalingOptions?
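For reference, a minimal sketch of the wiring this comment is asking about, assuming the usual cluster-autoscaler pattern of copying parsed flag values into AutoscalingOptions (the field name and the createAutoscalingOptions note below are assumptions for illustration, not this PR's actual code):

package config

import "time"

// Sketch only: an AutoscalingOptions field to carry the new flag's value.
type AutoscalingOptions struct {
	// ... existing options elided ...

	// MaxCloudProviderNodeDeletionTime is the maximum time the cloud
	// provider is expected to need to delete a node.
	MaxCloudProviderNodeDeletionTime time.Duration
}

// In main.go's createAutoscalingOptions, the parsed flag would then be
// copied in alongside the other options:
//
//	MaxCloudProviderNodeDeletionTime: *maxCloudProviderNodeDeletionTime,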
@gandhipr, please also update the user-facing section in the description. I am pretty sure it becomes part of the release notes, so it needs to document the new flag (even if it has a default value that preserves current behavior).
Force-pushed from cc5b031 to dabc007.
Force-pushed from dabc007 to 4c0950e.
/test ls
@jackfrancis: The specified target(s) for
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test pull-cluster-autoscaler-e2e-azure
/lgtm
This seems sensible to me.
cc @MaciekPytel @gjtempleton @elmiko
I was going to request a helm chart update to include this new config option, but I noticed that there are a bunch of options not covered by the chart. I can do a follow-up PR to do all of that work inclusive of this new change.
Also, is there some additional documentation reinforcement we can do to ensure that the configurable surface area is fully documented? I see gaps there as well.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: gandhipr, jackfrancis, rakechill, tallaxes
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
ping @MaciekPytel @gjtempleton @towca for approval as this touches the general cloudprovider code (not just Azure)
I feel like I'm maybe missing something here: the PR description describes allowing cloud providers to set this as necessary, but it's being exposed as a user-configurable flag (and the mention of following up with helm chart changes backs this up).
What's the motivation here? Users with nodes which may require extended times for safe node shutdowns?
provisioningRequestsEnabled = flag.Bool("enable-provisioning-requests", false, "Whether the clusterautoscaler will be handling the ProvisioningRequest CRs.")
frequentLoopsEnabled = flag.Bool("frequent-loops-enabled", false, "Whether clusterautoscaler triggers new iterations more frequently when it's needed")
maxCloudProviderNodeDeletionTime = flag.Duration("max-cloud-provider-node-deletion-time", 5*time.Minute, "Maximum time needed by cloud provider to delete a node.")
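To make the intent concrete, here is a minimal runnable sketch of the kind of check such a timeout gates. Only the flag name and its 5-minute default come from the diff above; the helper and its caller are hypothetical.

package main

import (
	"fmt"
	"time"
)

// deletionStuck is a hypothetical helper: it reports whether a node that
// entered cloud-provider deletion at start has exceeded the configured budget.
func deletionStuck(start, now time.Time, maxDeletionTime time.Duration) bool {
	return now.Sub(start) > maxDeletionTime
}

func main() {
	maxDeletionTime := 5 * time.Minute // mirrors the flag's default
	start := time.Now().Add(-7 * time.Minute)
	// A node deleting for 7 minutes exceeds the 5-minute budget: prints true.
	fmt.Println(deletionStuck(start, time.Now(), maxDeletionTime))
}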
If we're adding this in as a new flag we should update the ridiculously long table of options in the FAQ as well.
(We should really be auto-generating that; I've raised an issue in the hope someone picks it up.)
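As a sketch of that auto-generation idea, assuming the options stay registered on the standard library's flag package, walking the registered flags is enough to emit the table (the markdown layout here is an assumption):

package main

import (
	"flag"
	"fmt"
	"time"
)

func main() {
	// Register a representative flag; the real binary registers many more.
	flag.Duration("max-cloud-provider-node-deletion-time", 5*time.Minute, "Maximum time needed by cloud provider to delete a node.")
	flag.Parse()

	// Emit one table row per registered flag.
	fmt.Println("| Parameter | Default | Description |")
	fmt.Println("|---|---|---|")
	flag.VisitAll(func(f *flag.Flag) {
		fmt.Printf("| %s | %s | %s |\n", f.Name, f.DefValue, f.Usage)
	})
}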
+1
This makes sense to me, but I agree with @gjtempleton's comments about the description and the updates to the FAQ.
+1
thx @gjtempleton @elmiko
Right, I think the triggers that fire after the node deletion timeout don't cover the rare edge cases where infra + node deletion takes longer than 5 minutes, so making this configurable lets providers tweak it for certain use cases. Sounds like the action items are:
I'm not sure if @gandhipr is able to move this forward; if that's correct we'll move this to a new PR w/ the above feedback incorporated. Stay tuned!
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What type of PR is this?
feature
What this PR does / why we need it:
Make the cloud-provider node deletion param configurable so that each cloud provider can set it according to its needs.
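As a minimal sketch of how the new param is exposed and consumed: the flag definition below is taken from this PR's diff, while the main() around it is illustrative only.

package main

import (
	"flag"
	"fmt"
	"time"
)

var maxCloudProviderNodeDeletionTime = flag.Duration("max-cloud-provider-node-deletion-time", 5*time.Minute, "Maximum time needed by cloud provider to delete a node.")

func main() {
	flag.Parse()
	// Operators whose infra deletes nodes slowly can raise the budget, e.g.:
	//   cluster-autoscaler --max-cloud-provider-node-deletion-time=10m
	fmt.Println("node deletion budget:", *maxCloudProviderNodeDeletionTime)
}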
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: