fix: split pod controller from workflow controller #14129
Conversation
Signed-off-by: Alan Clucas <alan@clucas.org>
go wfc.wfInformer.Run(ctx.Done())
go wfc.wftmplInformer.Informer().Run(ctx.Done())
go wfc.podInformer.Run(ctx.Done())
go wfc.PodController.Run(ctx, 0)
I presume this was done to ensure that the pods don't get processed by the workers, so that you may reliably inspect the queue?
By "this" I assume you mean the 0 for number of workers. Yes.
Just as the old code created a queue but never actually ran any workers, the tests have to call processNextItem manually so they can evaluate and inspect the result.
I'll add a comment.
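For context, a minimal sketch of that testing pattern, assuming a hypothetical newTestController helper and the processNextItem name mentioned above; this is not the exact test code in this PR:

package pod

import (
	"context"
	"testing"
)

// Sketch only: newTestController is a hypothetical helper that builds a pod
// Controller around fake clients; processNextItem is the single-item worker
// step referred to in this conversation.
func TestPodCleanupQueue(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	ctrl := newTestController(t) // hypothetical test helper
	go ctrl.Run(ctx, 0)          // 0 workers: nothing drains the queue automatically

	// ... trigger a pod event that should enqueue a cleanup action ...

	// Drain exactly one item by hand, then assert on the fake clientset and
	// the remaining queue length.
	ctrl.processNextItem(ctx)
}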
isubasinghe
left a comment
Looks good but I have some questions/issues
	},
},
)
go podController.podInformer.Run(ctx.Done())
Should this not move to the Run method?
Good point, I'll move it.
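Roughly what that looks like, as a sketch rather than the exact code in this PR; podInformer and the Run(ctx, workers) signature come from the diffs above, while the workqueue, runWorker and processNextItem details are illustrative assumptions:

package pod

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// Trimmed-down Controller for illustration; the real struct has more fields
// (logger, callbacks into the workflow controller, delayed queue, ...).
type Controller struct {
	podInformer cache.SharedIndexInformer
	workqueue   workqueue.RateLimitingInterface
}

// Run owns and starts the pod informer itself, then starts the workers.
func (c *Controller) Run(ctx context.Context, workers int) {
	defer c.workqueue.ShutDown()

	go c.podInformer.Run(ctx.Done())
	if !cache.WaitForCacheSync(ctx.Done(), c.podInformer.HasSynced) {
		return
	}

	for i := 0; i < workers; i++ {
		go wait.UntilWithContext(ctx, c.runWorker, time.Second)
	}
	<-ctx.Done()
}

func (c *Controller) runWorker(ctx context.Context) {
	for c.processNextItem(ctx) {
	}
}

// processNextItem handles a single queued cleanup item; with 0 workers the
// tests call this directly, as discussed earlier in this review.
func (c *Controller) processNextItem(ctx context.Context) bool {
	key, quit := c.workqueue.Get()
	if quit {
		return false
	}
	defer c.workqueue.Done(key)
	// ... look up the pod for key and apply the queued cleanup action ...
	return true
}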
func (c *Controller) commonPodEvent(pod *v1.Pod, deleting bool) {
	// All pods here are not marked completed
	action := noAction
	minimumDelay := time.Duration(0)
	podGC := podGCFromPod(pod)
	switch {
	case deleting:
		if hasOurFinalizer(pod.Finalizers) {
			c.log.WithFields(logrus.Fields{"pod.Finalizers": pod.Finalizers}).Info("Removing finalizers during a delete")
			action = removeFinalizer
			minimumDelay = 2 * time.Minute
		}
	case c.podOrphaned(pod):
		if hasOurFinalizer(pod.Finalizers) {
			action = removeFinalizer
		}
		switch {
		case podGC.Strategy == wfv1.PodGCOnWorkflowCompletion:
		case podGC.Strategy == wfv1.PodGCOnPodCompletion:
		case podGC.Strategy == wfv1.PodGCOnPodSuccess && pod.Status.Phase == apiv1.PodSucceeded:
			action = deletePod
		}
	}
	if action != noAction {
		// The workflow is gone and we have no idea when that happened, so base the delay on the pod's last transition
		lastTransition := podLastTransition(pod)
		// GetDeleteDelayDuration returns -1 if there is no duration; we don't care about a failure to parse here
		delay := time.Duration(0)
		delayDuration, _ := podGC.GetDeleteDelayDuration()
		// In the case of a raw delete, make sure we've had some time to process it if there was a finalizer
		if delayDuration < minimumDelay {
			delayDuration = minimumDelay
		}
		if !lastTransition.IsZero() && delayDuration > 0 {
			delay = time.Until(lastTransition.Add(delayDuration))
		}
		c.log.WithFields(logrus.Fields{"action": action, "namespace": pod.Namespace, "podName": pod.Name, "podGC": podGC, "delay": delay}).Info("queuing pod")
		switch {
		case delay > 0:
			c.queuePodForCleanupAfter(pod.Namespace, pod.Name, action, delay)
		default:
			c.queuePodForCleanup(pod.Namespace, pod.Name, action)
		}
	}
}
I don't know how I feel about this code and queuePodsForCleanup being relatively similar.
More in the sense that they both drive the reconciliation of pods but live in different packages.
We discussed this on a call.
queuePodsForCleanup is the call into the pod-controller; it exists to reduce the Kubernetes load and make life saner.
Thoughts:
- We must wait for the node status update to be successfully written to the workflow. That rules out pure pod observation as a way of knowing when we can act on pods, unless we add a lot of extra load to k8s by writing something to the pods so the pod-controller can see the change.
- An option: the pod-controller also observes the workflow informer and unwraps the workflows to inspect node statuses. This is costly in CPU terms, and may also involve fetching the information from the status offload database. I avoided this double dip into workflow update events because at this point we already know we need to do something, and it's unclear that it would be much less evil.
queuePodsForCleanup knows which pods need cleaning up (by doing the naughty-list operation) because they are still in the informer (i.e. not marked complete or deleted), which minimizes the number that need checking. If we sent this event via a queue into the pod-controller we would have to send the entire set of node statuses and teach the pod-informer to understand the node-to-pod mapping.
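To make the shape of that split concrete, an illustrative sketch only (the receiver, the "workflow" index name and the action type are assumptions, not the code in this PR): once the node status update has been persisted, the caller enqueues cleanup for just the pods still visible in the informer.

// Sketch, not the PR's implementation: "workflow" is an assumed indexer name
// mapping namespace/workflowName to that workflow's pods.
func (c *Controller) queuePodsForCleanup(namespace, workflowName string, action podCleanupAction) {
	objs, err := c.podInformer.GetIndexer().ByIndex("workflow", namespace+"/"+workflowName)
	if err != nil {
		c.log.WithError(err).Error("failed to list pods for workflow")
		return
	}
	for _, obj := range objs {
		pod, ok := obj.(*apiv1.Pod)
		if !ok {
			continue
		}
		// Pods already labelled completed or deleted have dropped out of the
		// informer, so only the ones that still need attention are enqueued.
		c.queuePodForCleanup(pod.Namespace, pod.Name, action)
	}
}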
labelSelector := labels.NewSelector().
	Add(*workflowReq).
	// not sure if we should do this
	Add(*incompleteReq).
Personally I think we should remove this; the controller's job is to manage all pods created by Argo.
People upgrading to this version might have trouble with pods not being handled if this stays as it is.
There is no path whereby the pods have the finalizer and are not labelled as completed. In that state they drop out of view of the informer, yes. Monitoring pods that are marked as completed is a waste of resources; the owner reference to the workflow does all the necessary work.
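For reference, the kind of selector being discussed can be built with client-go along these lines; a sketch assuming the standard Argo Workflows label keys rather than the exact constants used in the PR:

package pod

import (
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/selection"
)

// podWatchSelector watches only pods that belong to a workflow and have not
// yet been marked completed, mirroring the workflowReq/incompleteReq pair in
// the diff above.
func podWatchSelector() (labels.Selector, error) {
	workflowReq, err := labels.NewRequirement("workflows.argoproj.io/workflow", selection.Exists, nil)
	if err != nil {
		return nil, err
	}
	incompleteReq, err := labels.NewRequirement("workflows.argoproj.io/completed", selection.NotEquals, []string{"true"})
	if err != nil {
		return nil, err
	}
	return labels.NewSelector().Add(*workflowReq).Add(*incompleteReq), nil
}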
Signed-off-by: Alan Clucas <alan@clucas.org>
Signed-off-by: Alan Clucas <alan@clucas.org>
isubasinghe
left a comment
Just a small comment; good to go after that's been addressed.
// // log something after calling this function maybe?
// func podTerminating(pod *v1.Pod) bool {
// 	return pod.DeletionTimestamp != nil
// }
This needs to be removed.
Done
Signed-off-by: Alan Clucas <alan@clucas.org>
Signed-off-by: Alan Clucas <alan@clucas.org> (cherry picked from commit 6c5608e)
Supersedes #14088
Motivation
Split the pod (formerly pod GC) controller from the main workflow controller to make it easier to reason about.
Attempt to handle various conditions whereby pods can become orphaned when pod finalizers are in use.
Modifications
Split the pod controller's components and queue into the workflow/controller/pod subdirectory.
Callbacks into the workflow controller are provided so that pod events can enqueue workflows for reconciliation.
The pod controller can also now make decisions on its own. This is especially important for orphaned pods, where the workflow has been deleted without the pod finalizers being removed. The most obvious case is a PodGC strategy with a delete delay duration: the only record that the pod is enqueued for deletion is in the pod GC delayed queue, which is in memory, so a workflow-controller restart loses track of it.
Added a number of testing hooks so that the workflow/controller tests that inspect the internals of the pod controller didn't need rewriting.
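As a rough sketch of the callback wiring mentioned above (the names here are illustrative, not the PR's actual signatures): the pod controller is constructed with a function that enqueues the owning workflow, so the pod package never has to import the workflow controller.

// Hypothetical shapes for illustration only.
type EnqueueWorkflow func(namespace, name string)

type Controller struct {
	enqueueWorkflow EnqueueWorkflow
	// ... informer, cleanup queues, logger ...
}

func NewController(enqueue EnqueueWorkflow) *Controller {
	return &Controller{enqueueWorkflow: enqueue}
}

// On the workflow controller side, something like:
//
//	wfc.PodController = pod.NewController(func(namespace, name string) {
//		wfc.wfQueue.AddRateLimited(namespace + "/" + name)
//	})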
Verification
Existing E2E and unit testing, and a few extensions.
Documentation
This is mostly a refactor and amendment of code to accommodate more configurations.