Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: do not delete pvc when max parallelism has been reached. Fixes #11119 #11138

Merged
merged 1 commit into from
Jun 25, 2023
Merged

fix: do not delete pvc when max parallelism has been reached. Fixes #11119 #11138

merged 1 commit into from
Jun 25, 2023

Conversation

abrabah
Copy link
Contributor

@abrabah abrabah commented May 26, 2023

Fixes #11119

Modifications

Modified PVC cleanup logic so that it does not trigger on ErrParallelismReached

Verification

Wrote a test with a workflow using loops and two pvc's . Constrained parallelization by setting parallelism: 1 in the workflow.

switch err {
case ErrDeadlineExceeded:
woc.eventRecorder.Event(woc.wf, apiv1.EventTypeWarning, "WorkflowTimedOut", x.Error())
case ErrParallelismReached:
shouldDeletePVCs = false
Copy link
Member

@rohankmr414 rohankmr414 Jun 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this true false logic can you just throw an error when the error is ErrParallelismReached?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand. The function does not currently return errors at all. Would probably be a bad idea to change the function signature

if shouldDeletePVCs {
if err := woc.deletePVCs(ctx); err != nil {
woc.log.WithError(err).Warn("failed to delete PVCs")
}
}
Copy link
Contributor

@juliev0 juliev0 Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this logic appears to be a duplicate of the above, right? If we can do something about that it would be great. I realize it was a duplicate before you touched it, but now it's getting longer so more in need of condensing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right 👍 I'll see what I can do..

if shouldDeletePVCs {
if err := woc.deletePVCs(ctx); err != nil {
woc.log.WithError(err).Warn("failed to delete PVCs")
}
}
Copy link
Contributor

@juliev0 juliev0 Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand. Is the difference here that some errors indicate that the Workflow should enter a failed state and other errors are a temporary problem that might be resolved at the next reconciliation?

I see markWorkflowError() is not used in certain other cases, such as ErrDeadlineExceeded, transient error, etc. It seems to me that we shouldn't delete the PVCs in any cases where we're not considering the Workflow to be failed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is the case, it would be great to use the concept of "temporary error" generically and then perform whatever logic should happen when there's a non-temporary error (at this time maybe it's only the deletion of PVCs but in the future could be more)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a hard time understanding the logic here myself to be honest. It might be that the PVC logic doesn't belong in that function at all..

I'm probably the wrong person to implement what you are proposing as I do not have a good overview of the code base..

Copy link
Contributor

@juliev0 juliev0 Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks like an okay place to put the logic since this is where the PVCs are being deleted. As far as I can tell, of all of the errors there, it's only where we woc.markWorkflowError() that there's a real error. Everywhere else is a temporary issue and the Workflow is left in its current state, to be reconciled later. I feel we should only delete the PVCs if there's a serious error where we've deemed the Workflow to be in error.

@jiachengxu I see you changed this code originally. Do you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juliev0 I added the logic in multiple places in this PR: #10424 to fix the issue that PVC could not be GCed if Entrypoint template execution or Onexit template execution returns errors.
It was a bug reported here: #10408

Copy link
Contributor

@juliev0 juliev0 Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiachengxu do you agree that we should not delete the PVC in the case of certain transient/temporary errors? I am thinking we should only delete it if we're actually marking the Workflow as "Error" state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juliev0 Yeah, I agree that for temporary errors we should not delete PVCs since that error could be resolved during the next reconciliation, and we only delete PVCs if the Workflow is in "real" Error state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiachengxu thanks for confirming! @abrabah do you have enough information to move forward?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, appreciate it 👍
I've pushed a commit where the PVC deletion logic has been moved to the default case in the switch statement

@juliev0
Copy link
Contributor

juliev0 commented Jun 23, 2023

Thanks. If you can get the CI to pass, I will merge it.

…11119

Signed-off-by: Abraham Bah <abraham.bah@gmail.com>
@juliev0 juliev0 merged commit aa2b66a into argoproj:master Jun 25, 2023
JPZ13 pushed a commit to pipekit/argo-workflows that referenced this pull request Jul 4, 2023
terrytangyuan pushed a commit that referenced this pull request Jul 19, 2023
…11119 (#11138)

Signed-off-by: Abraham Bah <abraham.bah@gmail.com>
dpadhiar pushed a commit to dpadhiar/argo-workflows that referenced this pull request May 9, 2024
…rgoproj#11119 (argoproj#11138)

Signed-off-by: Abraham Bah <abraham.bah@gmail.com>
Signed-off-by: Dillen Padhiar <dillen_padhiar@intuit.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants