✨feat(awsmachinepool): custom lifecyclehooks for machinepools #4875
Conversation
Welcome @sebltm!

Hi @sebltm. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I have two requests before getting to the review:

/assign
@AndiDog sorry I hadn't cleaned up the PR, I didn't know if it would get some traction :)

@AndiDog let me know if this looks good or if there's anything else I should take a look at :)
The PR is definitely reviewable now. I'm not very experienced with lifecycle hooks and aws-node-termination-handler (is that your actual use case?). Maybe MachinePool machines (#4527) give us a good way to detect node shutdown and have CAPI/CAPA take care of it? In other words: I'm not fully confident reviewing here with my knowledge, but maybe others have a better clue, so please feel free to ping or discuss in Slack.
/ok-to-test |
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment. The full list of commands accepted by this bot can be found here.
Force-pushed from f38c302 to b644e5f (compare)
@AndiDog could you help me find someone to review this PR? I've posted a couple of times in Slack
config/crd/bases/infrastructure.cluster.x-k8s.io_awsmanagedmachinepools.yaml
Within my company, we found time today to talk about lifecycle hooks and how they could help with several CAPA features, including aws-node-termination-handler, which you mentioned. So I'm feeling up to reviewing it in some detail.
```diff
@@ -298,6 +298,21 @@ func (r *AWSMachinePoolReconciler) reconcileNormal(ctx context.Context, machineP
 		return nil
 	}

+	lifecycleHookScope, err := scope.NewLifecycleHookScope(scope.LifecycleHookScopeParams{
```
`machinePoolScope` should instead provide an interface function like `func (*FooScope) LifecycleHooks() []AWSLifecycleHook`, instead of introducing a new type that covers both EC2 and EKS based clusters in the same "class".
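A minimal sketch of that suggestion, with stand-in type and scope names (the real CAPA types are `expinfrav1.AWSLifecycleHook`, `scope.MachinePoolScope`, etc. — everything below is illustrative):

```go
package main

import "fmt"

// AWSLifecycleHook is a stand-in for the real expinfrav1.AWSLifecycleHook type.
type AWSLifecycleHook struct {
	Name                string
	LifecycleTransition string
}

// LifecycleHookProvider is the kind of interface the reviewer suggests:
// each existing scope exposes its own hooks, instead of a new shared
// "lifecycle hook scope" type covering both EC2 and EKS clusters.
type LifecycleHookProvider interface {
	LifecycleHooks() []AWSLifecycleHook
}

// machinePoolScope stands in for the EC2-based machine pool scope.
type machinePoolScope struct {
	hooks []AWSLifecycleHook
}

func (s *machinePoolScope) LifecycleHooks() []AWSLifecycleHook { return s.hooks }

// managedMachinePoolScope stands in for the EKS-based machine pool scope.
type managedMachinePoolScope struct {
	hooks []AWSLifecycleHook
}

func (s *managedMachinePoolScope) LifecycleHooks() []AWSLifecycleHook { return s.hooks }

// reconcileLifecycleHooks only depends on the interface, so one
// implementation serves both kinds of machine pools.
func reconcileLifecycleHooks(p LifecycleHookProvider) int {
	return len(p.LifecycleHooks())
}

func main() {
	s := &machinePoolScope{hooks: []AWSLifecycleHook{{Name: "pre-terminate"}}}
	fmt.Println(reconcileLifecycleHooks(s))
}
```

The design point is that the reconciler never needs to know which concrete scope it was handed, only that it can ask for the hooks.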
```diff
@@ -163,13 +163,14 @@ func TestAWSMachinePoolReconciler(t *testing.T) {
 		recorder = record.NewFakeRecorder(2)
```
Right now, there's no test covering the new functionality. We need a non-mocked test; see `reconciler.reconcileServiceFactory = nil // use real implementation, but keep EC2 calls mocked (ec2ServiceFactory)` below, where the actual EC2 calls are tested. The test should cover different situations, such as: no hooks exist, all hooks exist, some hooks need an update, there's one hook too many which should be removed, ...
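A hedged sketch of how those situations could be enumerated table-driven, using a self-contained stand-in for the hook-diffing logic (function and type names are illustrative, not the real CAPA test harness, and update detection — comparing hook fields beyond the name — is deliberately omitted):

```go
package main

import "fmt"

type hook struct{ Name string }

// diffHooks is a stand-in for the PR's reconcile behavior: hooks declared in
// the spec but missing from AWS should be created; hooks present in AWS but
// not declared in the spec should be deleted.
func diffHooks(desired, actual []hook) (create, remove []hook) {
	names := func(hs []hook) map[string]bool {
		m := map[string]bool{}
		for _, h := range hs {
			m[h.Name] = true
		}
		return m
	}
	inActual, inDesired := names(actual), names(desired)
	for _, h := range desired {
		if !inActual[h.Name] {
			create = append(create, h)
		}
	}
	for _, h := range actual {
		if !inDesired[h.Name] {
			remove = append(remove, h)
		}
	}
	return create, remove
}

func main() {
	// The situations the review asks for, as a table of cases.
	cases := []struct {
		name                string
		desired, actual     []hook
		wantCreate, wantDel int
	}{
		{"no hooks exist yet", []hook{{"a"}}, nil, 1, 0},
		{"all hooks exist", []hook{{"a"}}, []hook{{"a"}}, 0, 0},
		{"extraneous hook is removed", []hook{{"a"}}, []hook{{"a"}, {"b"}}, 0, 1},
	}
	for _, tc := range cases {
		c, d := diffHooks(tc.desired, tc.actual)
		fmt.Printf("%s: create=%d delete=%d\n", tc.name, len(c), len(d))
		if len(c) != tc.wantCreate || len(d) != tc.wantDel {
			panic(tc.name)
		}
	}
}
```

In the actual PR these cases would run against the real reconciler with only the EC2/ASG calls mocked, as the reviewer describes.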
```diff
 	}
 	for _, hook := range hooks {
 		found := false
 		for _, definedHook := range scope.GetLifecycleHooks() {
```
Suggested change:

```diff
-		for _, definedHook := range scope.GetLifecycleHooks() {
+		for _, definedHook := range lifecycleHooks {
```
```diff
 		}
 	}
 	if !found {
 		scope.Info("Deleting lifecycle hook", "hook", hook.Name)
```
Suggested change:

```diff
-		scope.Info("Deleting lifecycle hook", "hook", hook.Name)
+		scope.Info("Deleting extraneous lifecycle hook", "hook", hook.Name)
```
```diff
 	}
 }
```

```go
conditions.MarkTrue(scope.GetMachinePool(), expinfrav1.LifecycleHookExistsCondition)
```
Also `LifecycleHookReadyCondition`? It's never marked as true (or false).
```diff
-	var sSGs = []string{}
+	sSGs := []string{}
```
Up to here, there were quite a few minor, unneeded changes, plus some small improvements, both out of scope for this PR. If you can put the relevant ones into a separate PR and ping me, I'll get them in. Let's avoid making this already-large PR review slower with that extra content.
Sorry, these were auto-linted and I missed removing them from the PR. I'll clean those up.
pkg/cloud/services/interfaces.go
(outdated)
```diff
@@ -50,6 +50,12 @@ type ASGInterface interface {
 	SuspendProcesses(name string, processes []string) error
 	ResumeProcesses(name string, processes []string) error
 	SubnetIDs(scope *scope.MachinePoolScope) ([]string, error)
+	GetLifecycleHooks(scope scope.LifecycleHookScope) ([]*expinfrav1.AWSLifecycleHook, error)
+	GetLifecycleHook(scope scope.LifecycleHookScope, hook *expinfrav1.AWSLifecycleHook) (*expinfrav1.AWSLifecycleHook, error)
```
Suggested change:

```diff
-	GetLifecycleHook(scope scope.LifecycleHookScope, hook *expinfrav1.AWSLifecycleHook) (*expinfrav1.AWSLifecycleHook, error)
+	GetLifecycleHook(scope scope.LifecycleHookScope, hookName string) (*expinfrav1.AWSLifecycleHook, error)
```
(minor)
Adding label.
PR needs rebase.
@sebltm: The following tests failed; say `/retest` to rerun all failed tests.

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
@AndiDog sorry for merging into the PR, but let me know if this approach looks better to you and I’ll clean this up with a rebase |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR extends the v1beta2 definitions of `AWSMachinePool` and `AWSManagedMachinePool` with a new field, `lifecycleHooks`, which is a list of lifecycle hook definitions.

The matching webhooks are updated to validate the lifecycle hooks as they are added to the Custom Resource.

The matching reconcilers are updated to reconcile those lifecycle hooks: if a lifecycle hook is present in the Custom Resource but not in the cloud, it is created; if a lifecycle hook is present in the cloud but not declared in the Custom Resource, it is removed.
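As an illustration only (the exact schema is defined by the PR's API types, so the field names below are a hypothetical sketch based on this description), a machine pool manifest using the new field might look like:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachinePool
metadata:
  name: my-pool                 # hypothetical example name
spec:
  # ... other AWSMachinePool fields ...
  lifecycleHooks:               # the new list added by this PR
    - name: pre-terminate       # field names here are illustrative
      lifecycleTransition: autoscaling:EC2_INSTANCE_TERMINATING
```

Hooks present in this list but absent from the ASG would be created on reconcile; hooks on the ASG but absent from this list would be deleted.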
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):

Fixes #4020
AWS supports lifecycle hooks before/after performing certain actions on an ASG. For example, before scaling in (removing) a node, the ASG can publish an event to an SQS queue, which can then be consumed by the node-termination-handler to ensure the node's proper removal from Kubernetes (it will cordon and drain the node, then wait for a period of time for applications to be removed before allowing the Auto Scaling group to terminate the instance). This allows Kubernetes or other components to be aware of the node's lifecycle and take appropriate actions.
Special notes for your reviewer:
Checklist:
Release note: