Follow up for WFFC test suite adjustments #2633
Conversation
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: awels
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/hold
testing the changes
Force-pushed from 9d832c6 to 30d8720.
Force-pushed from e5edd1d to 000285b.
Nice!
/lgtm
Force-pushed from 000285b to 02cdc10.
/lgtm
Force-pushed from 02cdc10 to 2817f46.
@@ -370,7 +370,6 @@ func (f *Framework) NewPodWithPVC(podName, cmd string, pvc *k8sv1.PersistentVolu
 		Spec: k8sv1.PodSpec{
 			// this may be causing an issue
 			TerminationGracePeriodSeconds: &[]int64{10}[0],
-			RestartPolicy:                 k8sv1.RestartPolicyNever,
Any reason to remove this?
Yeah, I am trying to find out why we have it in the first place; it seems to date back 5+ years.
Basically it means the verifier won't attempt to restart on transient errors.
I have some tests where the verifier pod regularly hits an error without much context and does not recover from it.
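(A minimal debugging sketch, not part of this PR, for surfacing why such a verifier pod got stuck: the helper name and the `client` parameter are assumptions; it only relies on standard client-go calls.)

```go
// Hypothetical helper: print why a verifier pod's containers are waiting or
// terminated, assuming a configured kubernetes.Interface is available.
package sketch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func dumpVerifierPodState(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	pod, err := client.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	fmt.Printf("pod %s/%s phase: %s\n", namespace, name, pod.Status.Phase)
	for _, cs := range pod.Status.ContainerStatuses {
		if t := cs.State.Terminated; t != nil {
			fmt.Printf("  %s terminated: exit=%d reason=%s message=%q\n",
				cs.Name, t.ExitCode, t.Reason, t.Message)
		}
		if w := cs.State.Waiting; w != nil {
			fmt.Printf("  %s waiting: reason=%s message=%q\n",
				cs.Name, w.Reason, w.Message)
		}
	}
	return nil
}
```

The container status reasons (for example CreateContainerError or CrashLoopBackOff) usually say more than the bare pod phase.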
Changed to OnFailure, so it only restarts if it exits with a return code other than 0 (the default is Always, i.e. the pod must always be running).
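For reference, a minimal sketch of a verifier pod built with RestartPolicyOnFailure, using the k8s.io/api/core/v1 types the diff already imports as k8sv1. The function name, the container name "verifier", and the image parameter are illustrative assumptions, not the framework's actual code:

```go
// Sketch only: a verifier pod whose container is retried on non-zero exit.
package sketch

import (
	k8sv1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func verifierPod(podName, cmd, image string) *k8sv1.Pod {
	gracePeriod := int64(10)
	return &k8sv1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: podName},
		Spec: k8sv1.PodSpec{
			TerminationGracePeriodSeconds: &gracePeriod,
			// OnFailure: the kubelet restarts the container only when it
			// exits non-zero; Always (the default) expects the pod to keep
			// running, and Never leaves a transient failure unrecovered.
			RestartPolicy: k8sv1.RestartPolicyOnFailure,
			Containers: []k8sv1.Container{{
				Name:    "verifier",
				Image:   image,
				Command: []string{"/bin/sh", "-c", cmd},
			}},
		},
	}
}
```

With OnFailure the pod still ends up Succeeded when the command exits 0, while a non-zero exit triggers a kubelet-level retry with backoff instead of immediately leaving the pod Failed.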
@alromeros @awels I put this change in a separate commit; I think it deserves one.
I wanted to get your thoughts on this one, as it is not trivial.
I think it makes perfect sense to change it to RestartPolicyOnFailure, but like you, I lack some context on why we had RestartPolicyNever before. If it's not breaking anything I'm for the change.
What transient errors are we thinking of? We would like to know if the verifier pod errors (like maybe the file we are looking for doesn't exist). This would simply restart the failed command, which may or may not fail again depending on whether the error is transient or a real issue. If we do this, we need to make sure there are some time limits on the commands we run in the verifier pod.
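One possible way to add such time limits, sketched under assumptions: the 300s pod budget, the 120s per-attempt timeout, and the availability of coreutils timeout(1) in the image are all illustrative, not anything this PR implements.

```go
// Illustrative only: bound the verifier's runtime even with restarts enabled.
package sketch

import k8sv1 "k8s.io/api/core/v1"

func boundVerifierRuntime(pod *k8sv1.Pod, cmd string) {
	podDeadline := int64(300) // assumed overall budget for this pod
	// ActiveDeadlineSeconds caps the pod's total lifetime, restarts included;
	// once exceeded, the pod is failed with reason DeadlineExceeded.
	pod.Spec.ActiveDeadlineSeconds = &podDeadline
	// Per-attempt cap on the command itself, inside the container
	// (assumes the image ships coreutils timeout).
	pod.Spec.Containers[0].Command = []string{"/bin/sh", "-c", "timeout 120 " + cmd}
}
```

Because ActiveDeadlineSeconds applies regardless of restart policy, it also caps the retries themselves, so a genuinely broken command cannot loop for the whole suite.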
I am hitting errors in container startup (not in executing commands on it).
It seems to be related to quick recreation of the deleted verifier pod, but I am not sure; the error does not say much,
the container just fails to spin up and doesn't retry.
Is there a reason to go with RestartPolicyNever? If you feel this is too invasive I will remove the second commit
and we can proceed with this PR without this change.
We can try with the OnFailure policy; we will know if something breaks because of it, and there is an overall suite timeout anyway.
Force-pushed from 2817f46 to 3f076f3.
Just a clarification: I am testing this change against an external cluster with LVM Storage.
Previous PR does not make our test suite completely WFFC-friendly, as it turns out more follow ups were needed.
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
This dates back a long time (5 years); it seems that we may want to restart on intermittent errors.
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
Force-pushed from 3f076f3 to fc2eb5d.
/hold cancel
/test pull-containerized-data-importer-fossa
/lgtm
/test pull-containerized-data-importer-fossa
/retest
/cherrypick release-v1.56
@akalenyu: new pull request created: #2649
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cherrypick release-v1.55
@akalenyu: #2633 failed to apply on top of branch "release-v1.55":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
* Follow up for WFFC test suite adjustments
  Previous PR does not make our test suite completely WFFC-friendly, as it turns out more follow ups were needed.
  Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Change restartpolicy on testing pods to rerun on failure
  This dates back a long time ago (5yrs), it seems that we may want to restart on intermediate errs
  Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
---------
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
What this PR does / why we need it:
The previous PR does not make our test suite completely WFFC-friendly; as it turns out, more follow-ups were needed.
This continues the work in #2610.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
Release note: