-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add out of service taint remediation #86
Conversation
Hi @k-keiichi-rh. Thanks for your PR. I'm waiting for a medik8s member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
385e0b7
to
bb5e5e4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a hard time to follow the remdiation flow. I'd suggest to reorder methods, and maybe rename some of them in order to get rid of too many methods with similar names (remediate/remove/deleteResources)?
e.g.:
Reconcile
remediateWithResourceDeletion
removeNodeResourcesByResourceDeletionWrapper
-> deleteResourcesWrapper
(not nice that we need this wrapper, but I have no quick better idea)
removeNodeResourcesByResourceDeletion
-> deleteResources
(much easier and better to understand IMHO)
remediateByAddingOutOfServiceTaint
-> remediateWithOutOfServiceTaint
(align name with remediateWithResourceDeletion
)
removeNodeResourcesByAddingOutOfServiceTaint
-> useOutOfServiceTaint
(much easier and better to understand IMHO)
remediateWithResourceRemoval
thanks a lot for renaming / reordering, much better readable now IMHO 🙂 /ok-to-test |
/retest-required |
1 similar comment
/retest-required |
/test 4.11-openshift-e2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit inside, otherwise lgtm. Needs rebase though.
Signed-off-by: Keiichi Kii <k-keiichi@nec.com> Signed-off-by: Keiichi Kii <kkii@nec.com>
Signed-off-by: Keiichi Kii <k-keiichi@nec.com> Signed-off-by: Keiichi Kii <kkii@nec.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, nice work!
Leaving the final review to @mshitrit
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: k-keiichi-rh, slintes The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
couldn't get rid of the "change requested" state in another way 🤷🏼♂️ /lgtm cancel |
/test 4.13-openshift-e2e |
/lgtm |
Use lease logic from common repo
This PR is adding a new remediation strategy based on kubernetes/enhancements#1116
In k8s 1.26, kubernetes/enhancements#1116 is a Beta feature. This feature will be graduated to GA in k8s 1.27.
This PR is implemented based on the discussion in #17.
The following is the new remediation strategy for the out-of-service taint:
=> This taint expects that the node is in shutdown or power off state (not in the middle of restarting) according to the enhancement
However, SNR works well with this taint as we discussed in [RFC] Support for out-of-service taint #17 (comment).
=> SNR waits for deleting workloads on the failed node with a timer. If the timer is expired, the reconcile method is triggered.
ECOPROJECT-1228