-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: do not delete pvc when max parallelism has been reached. Fixes #11119 #11138
fix: do not delete pvc when max parallelism has been reached. Fixes #11119 #11138
Conversation
workflow/controller/operator.go
Outdated
switch err { | ||
case ErrDeadlineExceeded: | ||
woc.eventRecorder.Event(woc.wf, apiv1.EventTypeWarning, "WorkflowTimedOut", x.Error()) | ||
case ErrParallelismReached: | ||
shouldDeletePVCs = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this true
false
logic can you just throw an error when the error is ErrParallelismReached
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand. The function does not currently return errors at all. Would probably be a bad idea to change the function signature
if shouldDeletePVCs { | ||
if err := woc.deletePVCs(ctx); err != nil { | ||
woc.log.WithError(err).Warn("failed to delete PVCs") | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this logic appears to be a duplicate of the above, right? If we can do something about that it would be great. I realize it was a duplicate before you touched it, but now it's getting longer so more in need of condensing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right 👍 I'll see what I can do..
if shouldDeletePVCs { | ||
if err := woc.deletePVCs(ctx); err != nil { | ||
woc.log.WithError(err).Warn("failed to delete PVCs") | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to understand. Is the difference here that some errors indicate that the Workflow should enter a failed state and other errors are a temporary problem that might be resolved at the next reconciliation?
I see markWorkflowError()
is not used in certain other cases, such as ErrDeadlineExceeded
, transient error, etc. It seems to me that we shouldn't delete the PVCs in any cases where we're not considering the Workflow to be failed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is the case, it would be great to use the concept of "temporary error" generically and then perform whatever logic should happen when there's a non-temporary error (at this time maybe it's only the deletion of PVCs but in the future could be more)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a hard time understanding the logic here myself to be honest. It might be that the PVC logic doesn't belong in that function at all..
I'm probably the wrong person to implement what you are proposing as I do not have a good overview of the code base..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks like an okay place to put the logic since this is where the PVCs are being deleted. As far as I can tell, of all of the errors there, it's only where we woc.markWorkflowError()
that there's a real error. Everywhere else is a temporary issue and the Workflow is left in its current state, to be reconciled later. I feel we should only delete the PVCs if there's a serious error where we've deemed the Workflow to be in error.
@jiachengxu I see you changed this code originally. Do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiachengxu do you agree that we should not delete the PVC in the case of certain transient/temporary errors? I am thinking we should only delete it if we're actually marking the Workflow as "Error" state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@juliev0 Yeah, I agree that for temporary errors we should not delete PVCs since that error could be resolved during the next reconciliation, and we only delete PVCs if the Workflow is in "real" Error state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiachengxu thanks for confirming! @abrabah do you have enough information to move forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, appreciate it 👍
I've pushed a commit where the PVC deletion logic has been moved to the default case in the switch statement
Thanks. If you can get the CI to pass, I will merge it. |
…11119 Signed-off-by: Abraham Bah <abraham.bah@gmail.com>
…rgoproj#11119 (argoproj#11138) Signed-off-by: Abraham Bah <abraham.bah@gmail.com>
…rgoproj#11119 (argoproj#11138) Signed-off-by: Abraham Bah <abraham.bah@gmail.com> Signed-off-by: Dillen Padhiar <dillen_padhiar@intuit.com>
Fixes #11119
Modifications
Modified PVC cleanup logic so that it does not trigger on
ErrParallelismReached
Verification
Wrote a test with a workflow using loops and two pvc's . Constrained parallelization by setting
parallelism: 1
in the workflow.