
Conversation

@Joibel Joibel (Member) commented Jan 27, 2025

Supersedes #14088

Motivation

Split the pod (formerly pod GC) controller from the main workflow controller to make it easier to reason about.

Attempt to handle various conditions whereby pods can become orphaned when pod finalizers are in use.

Modifications

Split the pod controller's components and queue into the workflow/controller/pod subdirectory.

Callbacks into the workflow controller are provided so that pod events can enqueue workflows for reconciliation.
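
For illustration, a minimal sketch of what such a callback wiring might look like; the type name, struct fields, and handler function here are assumptions for this example, not the PR's actual API.

package pod

// WorkflowEnqueuer is a hypothetical callback handed to the pod controller by the
// workflow controller so that pod events can re-trigger workflow reconciliation.
type WorkflowEnqueuer func(namespace, workflowName string)

// Controller is stripped down for the sketch; informer, queue and logger are omitted.
type Controller struct {
	enqueueWorkflow WorkflowEnqueuer
}

// onPodEvent would be wired up as the pod informer's add/update handler.
func (c *Controller) onPodEvent(namespace, podName, owningWorkflow string) {
	if owningWorkflow == "" {
		return // not a workflow pod, nothing to enqueue
	}
	// Hand the event back to the workflow controller's queue.
	c.enqueueWorkflow(namespace, owningWorkflow)
}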

The pod controller can now also make decisions on its own. This is especially important for orphaned pods, where the workflow has been deleted without the pod finalizers being removed. The most obvious case is a PodGC strategy with a delete delay duration: there, the only record that a pod is enqueued for deletion is the pod GC delayed queue, which is in memory, so a workflow-controller restart loses track of it.

Added a number of testing hooks so that the workflow/controller tests don't need rewriting where they inspect the internals of the pod controller.

Verification

Existing E2E and unit testing, and a few extensions.

Documentation

This is mostly a refactor and amendment of code to accommodate more configurations.

Signed-off-by: Alan Clucas <alan@clucas.org>
@Joibel Joibel requested a review from isubasinghe January 27, 2025 16:40
go wfc.wfInformer.Run(ctx.Done())
go wfc.wftmplInformer.Informer().Run(ctx.Done())
go wfc.podInformer.Run(ctx.Done())
go wfc.PodController.Run(ctx, 0)

Member:

I presume this was done to ensure that the pods don't get processed by the workers, so that you may reliably inspect the queue?

Member Author:

By "this" I assume you mean the 0 for number of workers. Yes.

In the same way the old code created a queue but never actually ran any workers, this means the tests have to manually do processNextItem so they can evaluate the result and inspect.

I'll add a comment.
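
For illustration, the pattern being described might look roughly like this in a test; newTestController and processNextItem are assumed names based on the discussion above, not necessarily the exact helpers in this PR.

package pod

import (
	"context"
	"testing"
)

// Sketch only: the fixture and hook names are assumptions.
func TestQueueDrainedManually(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	c := newTestController(t)
	// Run with 0 workers: nothing drains the queue automatically, so the test
	// controls exactly when each queued pod action is processed.
	go c.Run(ctx, 0)

	// ... trigger something that enqueues a pod cleanup action ...

	// Drive the queue by hand; each call handles exactly one queued item.
	for c.processNextItem(ctx) {
	}

	// ... assert on the resulting pod / queue state ...
}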

@isubasinghe isubasinghe (Member) left a comment:

Looks good, but I have some questions/issues.

},
},
)
go podController.podInformer.Run(ctx.Done())

Member:

Should this not move to the Run method?

Member Author:

Good point, I'll move it.
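
As a rough sketch of the suggested change (field and method names are assumptions for this example, not this PR's code), Run would own the informer lifecycle: start it, wait for the cache to sync, then start workers.

package pod

import (
	"context"
	"time"

	"github.com/sirupsen/logrus"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// Controller fields here are assumptions for the sketch.
type Controller struct {
	podInformer cache.SharedIndexInformer
	queue       workqueue.RateLimitingInterface
	log         *logrus.Entry
}

// Run starts the informer itself, waits for its cache to sync, then launches workers.
func (c *Controller) Run(ctx context.Context, workers int) {
	defer c.queue.ShutDown()

	go c.podInformer.Run(ctx.Done())

	// Don't let workers act on an unsynced (empty) cache.
	if !cache.WaitForCacheSync(ctx.Done(), c.podInformer.HasSynced) {
		c.log.Error("timed out waiting for pod informer cache to sync")
		return
	}

	for i := 0; i < workers; i++ {
		go wait.UntilWithContext(ctx, c.runWorker, time.Second)
	}
	<-ctx.Done()
}

func (c *Controller) runWorker(ctx context.Context) {
	// drain items from c.queue here (omitted)
}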

Comment on lines +194 to +236
func (c *Controller) commonPodEvent(pod *v1.Pod, deleting bool) {
	// None of the pods that reach here are marked completed
	action := noAction
	minimumDelay := time.Duration(0)
	podGC := podGCFromPod(pod)
	switch {
	case deleting:
		if hasOurFinalizer(pod.Finalizers) {
			c.log.WithFields(logrus.Fields{"pod.Finalizers": pod.Finalizers}).Info("Removing finalizers during a delete")
			action = removeFinalizer
			minimumDelay = 2 * time.Minute
		}
	case c.podOrphaned(pod):
		if hasOurFinalizer(pod.Finalizers) {
			action = removeFinalizer
		}
		switch {
		case podGC.Strategy == wfv1.PodGCOnWorkflowCompletion:
		case podGC.Strategy == wfv1.PodGCOnPodCompletion:
		case podGC.Strategy == wfv1.PodGCOnPodSuccess && pod.Status.Phase == apiv1.PodSucceeded:
			action = deletePod
		}
	}
	if action != noAction {
		// The workflow is gone and we have no idea when that happened, so base the timing on the pod's last transition
		lastTransition := podLastTransition(pod)
		// GetDeleteDelayDuration returns -1 if there is no duration; we don't care about a failure to parse otherwise here
		delay := time.Duration(0)
		delayDuration, _ := podGC.GetDeleteDelayDuration()
		// In the case of a raw delete, make sure we've had some time to process it if there was a finalizer
		if delayDuration < minimumDelay {
			delayDuration = minimumDelay
		}
		if !lastTransition.IsZero() && delayDuration > 0 {
			delay = time.Until(lastTransition.Add(delayDuration))
		}
		c.log.WithFields(logrus.Fields{"action": action, "namespace": pod.Namespace, "podName": pod.Name, "podGC": podGC, "delay": delay}).Info("queuing pod")
		switch {
		case delay > 0:
			c.queuePodForCleanupAfter(pod.Namespace, pod.Name, action, delay)
		default:
			c.queuePodForCleanup(pod.Namespace, pod.Name, action)
		}

Member:

I don't know how I feel about this code and queuePodsForCleanup being relatively similar, in the sense that they both drive the reconciliation of pods but live in different packages.

Member Author:

We discussed this on a call.

queuePodsForCleanup is the call into the pod controller, done this way to reduce the Kubernetes load and make life saner (a rough sketch of its shape follows the list below).
Thoughts:

  • We must wait for the node status update to be successfully written to the workflow. That rules out relying purely on pod observation to know when we can act on a pod, unless we add a lot of extra load to k8s by writing something to the pods so that the pod controller can see the change.
  • One option: the pod controller also observes the workflow informer and unwraps the workflows to read node statuses. This is costly in terms of CPU, and may also involve fetching the information from the status offload database. I avoided this double dip into workflow update events because at this point we already know we need to act, and it's unclear that it would be much less evil.
  • queuePodsForCleanup knows which pods need cleanup (by doing the naughty-list operation) because they are still in the informer (i.e. not marked complete or deleted), which minimizes the number that need checking. If we sent this event via a queue into the pod controller we would have to send the entire set of node statuses and teach the pod informer to understand the node-to-pod mapping.
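
For illustration only, a very rough sketch of the shape of that call on the workflow-controller side. Every name here (the wfOperationCtx fields, GetPodsForWorkflow, QueuePodForCleanup, and the action constants) is an assumption for this sketch, not necessarily this PR's API.

// Sketch: after node statuses have been persisted to the workflow, the workflow
// controller walks the pods it still sees in the informer and hands each one an
// action via the pod controller's queue.
func (woc *wfOperationCtx) queuePodsForCleanup() {
	strategy := woc.execWf.Spec.PodGC.GetStrategy()
	// Only pods still visible in the informer (not yet marked completed) are considered.
	for _, pod := range woc.controller.PodController.GetPodsForWorkflow(woc.wf.Namespace, woc.wf.Name) {
		action := removeFinalizer // safe now that the node status has been written
		if strategy == wfv1.PodGCOnPodSuccess && pod.Status.Phase == apiv1.PodSucceeded {
			action = deletePod
		}
		woc.controller.PodController.QueuePodForCleanup(pod.Namespace, pod.Name, action)
	}
}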

labelSelector := labels.NewSelector().
	Add(*workflowReq).
	// not sure if we should do this
	Add(*incompleteReq).

Member:

Personally I think we should remove this; the controller's job is to manage all pods created by Argo. People upgrading into this version might have trouble with pods not being handled if this stays the same.

Member Author:

There is no path whereby a pod still has the finalizer and is labelled as completed. Once labelled completed they drop out of view of the informer, yes, but monitoring pods that are marked as completed is a waste of resources; the owner reference to the workflow does all the necessary work.
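
For context, a self-contained sketch of how such a selector is built with the apimachinery labels package. The label keys shown are the well-known Argo pod labels, but treat the exact keys and requirement names as illustrative rather than this PR's code.

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/selection"
)

func main() {
	// Pods created by Argo carry the workflow label; pods the controller has finished
	// with are labelled completed=true and so fall outside this selector.
	workflowReq, err := labels.NewRequirement("workflows.argoproj.io/workflow", selection.Exists, nil)
	if err != nil {
		panic(err)
	}
	incompleteReq, err := labels.NewRequirement("workflows.argoproj.io/completed", selection.NotEquals, []string{"true"})
	if err != nil {
		panic(err)
	}

	labelSelector := labels.NewSelector().
		Add(*workflowReq).
		Add(*incompleteReq)

	// Prints: workflows.argoproj.io/completed!=true,workflows.argoproj.io/workflow
	fmt.Println(labelSelector.String())
}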

Signed-off-by: Alan Clucas <alan@clucas.org>
Signed-off-by: Alan Clucas <alan@clucas.org>
@Joibel Joibel marked this pull request as ready for review January 29, 2025 15:29

@isubasinghe isubasinghe (Member) left a comment:

Just a small comment; good to go after that's been addressed.

Comment on lines 141 to 144
// // log something after calling this function maybe?
// func podTerminating(pod *v1.Pod) bool {
// return pod.DeletionTimestamp != nil
// }

Member:

This needs to be removed.

Member Author:

Done

Signed-off-by: Alan Clucas <alan@clucas.org>
@Joibel Joibel enabled auto-merge (squash) February 13, 2025 10:56
@Joibel Joibel merged commit 6c5608e into argoproj:main Feb 13, 2025
31 checks passed
Joibel added a commit that referenced this pull request Mar 6, 2025
Signed-off-by: Alan Clucas <alan@clucas.org>
(cherry picked from commit 6c5608e)
kim-codefresh added a commit to codefresh-io/argo-workflows that referenced this pull request May 20, 2025
…abilities fixes (Cr 28355) (#358)

* fix: bump deps for k8schain to fix ecr-login (argoproj#14008) (release-3.6 cherry-pick) (argoproj#14174)

* fix(ci): python sdk release process (release-3.6) (argoproj#14183)

Signed-off-by: Alan Clucas <alan@clucas.org>

* docs: clarify qps/burst on controller (cherry-pick argoproj#14190) (argoproj#14192)

Signed-off-by: Tim Collins <tim@thecollins.team>
Co-authored-by: Tim Collins <45351296+tico24@users.noreply.github.com>

* fix(api/jsonschema): use unchanging JSON Schema version (cherry-pick argoproj#14092) (argoproj#14256)

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Co-authored-by: Roger Peppe <rogpeppe@gmail.com>

* fix(api/jsonschema): use working `$id` (cherry-pick argoproj#14257) (argoproj#14258)

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Co-authored-by: Roger Peppe <rogpeppe@gmail.com>

* docs: autogenerate tested k8s versions and centralize config (argoproj#14176) (release-3.6) (argoproj#14262)

Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
Signed-off-by: Alan Clucas <alan@clucas.org>
Co-authored-by: Mason Malone <651224+MasonM@users.noreply.github.com>

* chore(deps): bump minio-go to newer version (argoproj#14185) (release-3.6) (argoproj#14261)

Co-authored-by: Vaibhav Kaushik <vaibhavkaushik@salesforce.com>

* fix: split pod controller from workflow controller (argoproj#14129) (release-3.6) (argoproj#14263)

* chore(deps): fix snyk (argoproj#14264) (release-3.6) (argoproj#14268)

* chore: revert to correct k8s versions

Accidental bump from argoproj#14176 cherry-pick

Signed-off-by: Alan Clucas <alan@clucas.org>

* chore(deps): bump github.com/go-jose/go-jose/v3 from 3.0.3 to 3.0.4 in the go_modules group (cherry-pick argoproj#14231) (argoproj#14269)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix: wait for workflow informer to sync before pod informer (cherry-pick argoproj#14248) (argoproj#14266)

Signed-off-by: Rohan K <rohankmr414@gmail.com>
Co-authored-by: Rohan K <rohankmr414@gmail.com>

* fix(cli): remove red from log colour selection. Fixes argoproj#6740 (cherry-pick argoproj#14215) (argoproj#14278)

Signed-off-by: Prabakaran Kumaresshan <4676330+nixphix@users.noreply.github.com>
Co-authored-by: Prabakaran Kumaresshan <4676330+nixphix@users.noreply.github.com>

* fix: correct semaphore configmap keys for multiple semaphores (argoproj#14184) (release-3.6) (argoproj#14281)

* fix: don't print help for non-validation errors. Fixes argoproj#14234 (cherry-pick argoproj#14249) (argoproj#14283)

Signed-off-by: Koichi Shimada <jumpe1programming@gmail.com>
Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
Co-authored-by: koichi <51446844+jumpe1@users.noreply.github.com>
Co-authored-by: Mason Malone <651224+MasonM@users.noreply.github.com>

* docs: fix kubernetes versions (release-3.6) (argoproj#14273)

Signed-off-by: Alan Clucas <alan@clucas.org>

* fix(workflow/sync): use RWMutex to prevent concurrent map access (cherry-pick argoproj#14321) (argoproj#14322)

Signed-off-by: Ryan Currah <ryan@currah.ca>
Co-authored-by: Ryan Currah <ryan@currah.ca>

* chore(lint): update golangci-lint to 2.1.1 (argoproj#14390) (cherry-pick release-3.6) (argoproj#14417)

* chore: bump golang 1.23->1.24 (argoproj#14385) (cherry-pick release-3.6) (argoproj#14418)

* fix: gracefully handle invalid CronWorkflows and simplify logic.  (cherry-pick argoproj#14197) (argoproj#14419)

Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>

* fix: prevent dfs sorter infinite recursion on cycle. Fixes argoproj#13395 (cherry-pick argoproj#14391) (argoproj#14420)

Signed-off-by: Adrien Delannoy <a.delannoyfr@gmail.com>
Co-authored-by: Adrien Delannoy <a.delannoyfr@gmail.com>

* chore(deps): bump github.com/expr-lang/expr from 1.16.9 to 1.17.0 (argoproj#14307) (release-3.6) (argoproj#14421)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps)!: update k8s and argo-events (release-3.6) (argoproj#14424)

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: william.vanhevelingen <william.vanhevelingen@acquia.com>
Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
Signed-off-by: William Van Hevelingen <william.vanhevelingen@acquia.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: William Van Hevelingen <William.VanHevelingen@acquia.com>
Co-authored-by: Mason Malone <651224+MasonM@users.noreply.github.com>

* fix: correct retry logic (argoproj#13734) (release-3.6) (argoproj#14428)

Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: Alan Clucas <alan@clucas.org>
Co-authored-by: Isitha Subasinghe <isitha@pipekit.io>

* fix: manual retries exit handler cleanup. Fixes argoproj#14180 (argoproj#14181) (release-3.6) (argoproj#14429)

Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: Alan Clucas <alan@clucas.org>
Co-authored-by: Isitha Subasinghe <isitha@pipekit.io>

* fix: correct manual retry logic. Fixes argoproj#14124 (argoproj#14328) (release-3.6) (argoproj#14430)

Signed-off-by: oninowang <oninowang@tencent.com>
Signed-off-by: Alan Clucas <alan@clucas.org>
Co-authored-by: jswxstw <jswxstw@gmail.com>

* fix: disable ALPN in argo-server as a workaround (argoproj#14433)

Signed-off-by: Alan Clucas <alan@clucas.org>

* result of codegen

Signed-off-by: Kim <kim.aharfi@codefresh.io>

* fix:lint

Signed-off-by: Kim <kim.aharfi@codefresh.io>
