oadp-1.3: OADP-4265 Mark InProgress backup/restore as failed upon requeuing #315
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
remove uuid, return err to requeue instead of requeue: true
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
From prior feedback: we need to do this for the finalizer (operations in 1.3) controller as well.
For restore operations controller we're covered by this returning error:
backup operations controller:
Yes, I think we're fine leaving the backup/restore_operations_controller returns alone, since those already requeue. I don't think we have restore finalizer in 1.3, so we just need backup, restore, and backup finalizer. |
If the patch fails in the finalizer controller, do we want to retry patching it as failed? It's difficult there since it currently uses a deferred func. I would have to move the patch call out of the defer to return a reconciler error when the patch fails. Is that OK with you?
A couple of minor comments on error messages and the change of variable name. The bigger issue is we need the change for the backup finalizer controller as well (no restore finalizer controller in Velero 1.12, so that's not a concern here).
pkg/controller/backup_controller.go
Outdated
log.Debug("Backup has in progress status from prior reconcile, marking it as failed")
failedCopy := original.DeepCopy()
failedCopy.Status.Phase = velerov1api.BackupPhaseFailed
failedCopy.Status.FailureReason = "Backup from previous reconcile still in progress"
We may want to suggest an APIServer failure here.
"Backup from previous reconcile still in progress. The API Server may have been down."
pkg/controller/backup_controller.go
Outdated
@@ -249,7 +263,6 @@ func (b *backupReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctr
request.Status.Phase = velerov1api.BackupPhaseInProgress
request.Status.StartTimestamp = &metav1.Time{Time: b.clock.Now()}
}
|
To minimize changes from upstream, since this is in our fork, let's not include whitespace changes like this.
pkg/controller/restore_controller.go
Outdated
log.Debug("Restore has in progress status from prior reconcile, marking it as failed")
failedCopy := original.DeepCopy()
failedCopy.Status.Phase = api.RestorePhaseFailed
failedCopy.Status.FailureReason = "Restore from previous reconcile still in progress"
We may want to suggest an APIServer failure here.
"Restore from previous reconcile still in progress. The API Server may have been down."
pkg/controller/restore_controller.go
Outdated
@@ -162,8 +162,8 @@ func (r *restoreReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ct
// the controller.
log := r.logger.WithField("Restore", req.NamespacedName.String())

restore := &api.Restore{}
err := r.kbClient.Get(ctx, client.ObjectKey{Namespace: req.Namespace, Name: req.Name}, restore)
original := &api.Restore{}
Is the variable name change necessary here? This makes the diff larger and increases the possibility of rebase conflicts, since we're carrying this commit in our fork.
I did this so restore_controller and backup_controller have the same variable name pattern, that's all.
Not necessary, I agree. It just makes the logic easier to paste across both.
@kaovilai I figured that's why you did that. I reverted that part here in a later commit since we're carrying the commit in our fork right now, and it removed a lot of lines from the diff, making rebase conflicts less likely. Does that seem reasonable to you?
Yes, that's reasonable.
Hmm. I think we need to do this in some way. If we can't do it with defer, then we'll need to eliminate the defer call and include this with all return statements. |
Signed-off-by: Scott Seago <sseago@redhat.com>
Signed-off-by: Scott Seago <sseago@redhat.com>
Unlike the InProgress transition, there's no need to fail here, since the Finalize steps can be repeated.
Signed-off-by: Scott Seago <sseago@redhat.com>
lgtm |
OK, testing update: without the patch, the backup stayed in progress. While updating the DPA, the Velero server was restarted. backup: westest-vsphere-apidown-1
On 1.3.0, the CSV has to be changed to get the test image on:
Once patched, I initiated a second backup and then took down the API server for roughly 1 minute. backup: westest-vsphere-apidown-2
/LGTM
Reviewing to remove my prior "changes requested", but don't count it as an ack, since my own changes are in here as well.
Just noting that this may never upstream based on comments at vmware-tanzu#7863 (comment) |
New changes are detected. LGTM label has been removed. |
07bab34 to dd74100
dd74100 to a51ef63
retest after #320 |
/retest |
@kaovilai: all tests passed!
[APPROVALNOTIFIER] This PR is APPROVED. Approval requirements bypassed by manually added approval. This pull-request has been approved by: kaovilai, sseago, weshayutin
follow on bugfix: #324 |
…330)
* oadp-1.4: OADP-3227: Mark InProgress backup/restore as failed upon requeuing (#315)
* Mark InProgress backup/restore as failed upon requeuing
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
remove uuid, return err to requeue instead of requeue: true
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
* cleanup to minimize diff from upstream
Signed-off-by: Scott Seago <sseago@redhat.com>
* error message update
Signed-off-by: Scott Seago <sseago@redhat.com>
* requeue on finalize status update. Unlike the InProgress transition, there's no need to fail here, since the Finalize steps can be repeated.
* Only run patch once for all backup finalizer return scenarios
Signed-off-by: Scott Seago <sseago@redhat.com>
---------
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Signed-off-by: Scott Seago <sseago@redhat.com>
Co-authored-by: Scott Seago <sseago@redhat.com>
* oadp-1.4: OADP-3227: Reconcile To Fail: Add backup/restore trackers (#324)
* OADP-4265: Reconcile To Fail: Add backup/restore trackers
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
* Apply suggestions from code review: backupTracker
* Address restoreTracker feedback
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
* s/delete from/add to/ in the comment
* unit test fix
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
* backup_controller unit test
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
* restore_controller unit test
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
* `make update`
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
* mock patch to fail failure due to connection refused
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
---------
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
* regenerate mocks
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
---------
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Signed-off-by: Scott Seago <sseago@redhat.com>
Co-authored-by: Scott Seago <sseago@redhat.com>
Signed-off-by: Tiger Kaovilai tkaovila@redhat.com