Skip to content

Conversation

@rohankmr414
Copy link
Member

Motivation

There's a bug in this PR: #14129

The bug is that the podInformer is running and active (

podController.podInformer.AddEventHandler(
) before the workflowInformer has completed cache synchronisation.
This means a pod can be orphaned (
case c.podOrphaned(pod):
) when it isn't really, because the workflowInformer hasn't yet seen the workflow that the orphan pod check should check.

We must delay starting the informer until after the workflow informer has started

Modifications

  • Changed the order to check for cache sync status of wfInformer before podInformer while running the pod controller

@rohankmr414 rohankmr414 requested a review from Joibel March 4, 2025 14:46
Signed-off-by: Rohan K <rohankmr414@gmail.com>
@rohankmr414 rohankmr414 force-pushed the pod-informer-startup branch from a3d3945 to dbe0c00 Compare March 4, 2025 14:50
// Run runs the pod controller
func (c *Controller) Run(ctx context.Context, workers int) {
defer c.workqueue.ShutDown()
go c.podInformer.Run(ctx.Done())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The podInformer.Run() call will start the pod informer, which can still make incorrect decisions based on the unsynced wfInformer. I think this Run() needs to move after the waitforcachesync on wfInfomer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we move podInformer.Run() below the waitforcachesync, will podInformer.HasSynced return true?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll have to call WaitForCacheSync twice, once for wfInformer, then podInformer.Run, then wait on the podInformer in a second call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved in 6b0aaf3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Signed-off-by: Rohan K <rohankmr414@gmail.com>
@rohankmr414 rohankmr414 requested a review from Joibel March 6, 2025 11:29
@Joibel Joibel enabled auto-merge (squash) March 6, 2025 12:05
@Joibel Joibel disabled auto-merge March 6, 2025 12:05
@Joibel Joibel enabled auto-merge (squash) March 6, 2025 12:05
@Joibel Joibel merged commit 68fde4f into argoproj:main Mar 6, 2025
31 checks passed
@Joibel
Copy link
Member

Joibel commented Mar 6, 2025

/cherry-pick release-3.6

gcp-cherry-pick-bot bot pushed a commit that referenced this pull request Mar 6, 2025
Signed-off-by: Rohan K <rohankmr414@gmail.com>
Joibel pushed a commit that referenced this pull request Mar 7, 2025
…ick #14248) (#14266)

Signed-off-by: Rohan K <rohankmr414@gmail.com>
Co-authored-by: Rohan K <rohankmr414@gmail.com>
kim-codefresh added a commit to codefresh-io/argo-workflows that referenced this pull request May 20, 2025
…abilities fixes (Cr 28355) (#358)

* fix: bump deps for k8schain to fix ecr-login (argoproj#14008) (release-3.6 cherry-pick) (argoproj#14174)

* fix(ci): python sdk release process (release-3.6) (argoproj#14183)

Signed-off-by: Alan Clucas <alan@clucas.org>

* docs: clarify qps/burst on controller (cherry-pick argoproj#14190) (argoproj#14192)

Signed-off-by: Tim Collins <tim@thecollins.team>
Co-authored-by: Tim Collins <45351296+tico24@users.noreply.github.com>

* fix(api/jsonschema): use unchanging JSON Schema version (cherry-pick argoproj#14092) (argoproj#14256)

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Co-authored-by: Roger Peppe <rogpeppe@gmail.com>

* fix(api/jsonschema): use working `$id` (cherry-pick argoproj#14257) (argoproj#14258)

Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Co-authored-by: Roger Peppe <rogpeppe@gmail.com>

* docs: autogenerate tested k8s versions and centralize config (argoproj#14176) (release-3.6) (argoproj#14262)

Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
Signed-off-by: Alan Clucas <alan@clucas.org>
Co-authored-by: Mason Malone <651224+MasonM@users.noreply.github.com>

* chore(deps): bump minio-go to newer version (argoproj#14185) (release-3.6) (argoproj#14261)

Co-authored-by: Vaibhav Kaushik <vaibhavkaushik@salesforce.com>

* fix: split pod controller from workflow controller (argoproj#14129) (release-3.6) (argoproj#14263)

* chore(deps): fix snyk (argoproj#14264) (release-3.6) (argoproj#14268)

* chore: revert to correct k8s versions

Accidental bump from argoproj#14176 cherry-pick

Signed-off-by: Alan Clucas <alan@clucas.org>

* chore(deps): bump github.com/go-jose/go-jose/v3 from 3.0.3 to 3.0.4 in the go_modules group (cherry-pick argoproj#14231) (argoproj#14269)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix: wait for workflow informer to sync before pod informer (cherry-pick argoproj#14248) (argoproj#14266)

Signed-off-by: Rohan K <rohankmr414@gmail.com>
Co-authored-by: Rohan K <rohankmr414@gmail.com>

* fix(cli): remove red from log colour selection. Fixes argoproj#6740 (cherry-pick argoproj#14215) (argoproj#14278)

Signed-off-by: Prabakaran Kumaresshan <4676330+nixphix@users.noreply.github.com>
Co-authored-by: Prabakaran Kumaresshan <4676330+nixphix@users.noreply.github.com>

* fix: correct semaphore configmap keys for multiple semaphores (argoproj#14184) (release-3.6) (argoproj#14281)

* fix: don't print help for non-validation errors. Fixes argoproj#14234 (cherry-pick argoproj#14249) (argoproj#14283)

Signed-off-by: Koichi Shimada <jumpe1programming@gmail.com>
Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
Co-authored-by: koichi <51446844+jumpe1@users.noreply.github.com>
Co-authored-by: Mason Malone <651224+MasonM@users.noreply.github.com>

* docs: fix kubernetes versions (release-3.6) (argoproj#14273)

Signed-off-by: Alan Clucas <alan@clucas.org>

* fix(workflow/sync): use RWMutex to prevent concurrent map access (cherry-pick argoproj#14321) (argoproj#14322)

Signed-off-by: Ryan Currah <ryan@currah.ca>
Co-authored-by: Ryan Currah <ryan@currah.ca>

* chore(lint): update golangci-lint to 2.1.1 (argoproj#14390) (cherry-pick release-3.6) (argoproj#14417)

* chore: bump golang 1.23->1.24 (argoproj#14385) (cherry-pick release-3.6) (argoproj#14418)

* fix: gracefully handle invalid CronWorkflows and simplify logic.  (cherry-pick argoproj#14197) (argoproj#14419)

Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>

* fix: prevent dfs sorter infinite recursion on cycle. Fixes argoproj#13395 (cherry-pick argoproj#14391) (argoproj#14420)

Signed-off-by: Adrien Delannoy <a.delannoyfr@gmail.com>
Co-authored-by: Adrien Delannoy <a.delannoyfr@gmail.com>

* chore(deps): bump github.com/expr-lang/expr from 1.16.9 to 1.17.0 (argoproj#14307) (release-3.6) (argoproj#14421)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps)!: update k8s and argo-events (release-3.6) (argoproj#14424)

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: william.vanhevelingen <william.vanhevelingen@acquia.com>
Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
Signed-off-by: William Van Hevelingen <william.vanhevelingen@acquia.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: William Van Hevelingen <William.VanHevelingen@acquia.com>
Co-authored-by: Mason Malone <651224+MasonM@users.noreply.github.com>

* fix: correct retry logic (argoproj#13734) (release-3.6) (argoproj#14428)

Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: Alan Clucas <alan@clucas.org>
Co-authored-by: Isitha Subasinghe <isitha@pipekit.io>

* fix: manual retries exit handler cleanup. Fixes argoproj#14180 (argoproj#14181) (release-3.6) (argoproj#14429)

Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: Alan Clucas <alan@clucas.org>
Co-authored-by: Isitha Subasinghe <isitha@pipekit.io>

* fix: correct manual retry logic. Fixes argoproj#14124 (argoproj#14328) (release-3.6) (argoproj#14430)

Signed-off-by: oninowang <oninowang@tencent.com>
Signed-off-by: Alan Clucas <alan@clucas.org>
Co-authored-by: jswxstw <jswxstw@gmail.com>

* fix: disable ALPN in argo-server as a workaround (argoproj#14433)

Signed-off-by: Alan Clucas <alan@clucas.org>

* result of codegen

Signed-off-by: Kim <kim.aharfi@codefresh.io>

* fix:lint

Signed-off-by: Kim <kim.aharfi@codefresh.io>

---------

Signed-off-by: Alan Clucas <alan@clucas.org>
Signed-off-by: Tim Collins <tim@thecollins.team>
Signed-off-by: Roger Peppe <rogpeppe@gmail.com>
Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Rohan K <rohankmr414@gmail.com>
Signed-off-by: Prabakaran Kumaresshan <4676330+nixphix@users.noreply.github.com>
Signed-off-by: Koichi Shimada <jumpe1programming@gmail.com>
Signed-off-by: Ryan Currah <ryan@currah.ca>
Signed-off-by: Adrien Delannoy <a.delannoyfr@gmail.com>
Signed-off-by: william.vanhevelingen <william.vanhevelingen@acquia.com>
Signed-off-by: William Van Hevelingen <william.vanhevelingen@acquia.com>
Signed-off-by: isubasinghe <isitha@pipekit.io>
Signed-off-by: oninowang <oninowang@tencent.com>
Signed-off-by: Kim <kim.aharfi@codefresh.io>
Co-authored-by: Alan Clucas <alan@clucas.org>
Co-authored-by: gcp-cherry-pick-bot[bot] <98988430+gcp-cherry-pick-bot[bot]@users.noreply.github.com>
Co-authored-by: Tim Collins <45351296+tico24@users.noreply.github.com>
Co-authored-by: Roger Peppe <rogpeppe@gmail.com>
Co-authored-by: Mason Malone <651224+MasonM@users.noreply.github.com>
Co-authored-by: Vaibhav Kaushik <vaibhavkaushik@salesforce.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Rohan K <rohankmr414@gmail.com>
Co-authored-by: Prabakaran Kumaresshan <4676330+nixphix@users.noreply.github.com>
Co-authored-by: koichi <51446844+jumpe1@users.noreply.github.com>
Co-authored-by: Ryan Currah <ryan@currah.ca>
Co-authored-by: Adrien Delannoy <a.delannoyfr@gmail.com>
Co-authored-by: William Van Hevelingen <William.VanHevelingen@acquia.com>
Co-authored-by: Isitha Subasinghe <isitha@pipekit.io>
Co-authored-by: jswxstw <jswxstw@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants