
fix(backup): set backup error if backup target invalid #2086

Merged
1 commit merged into longhorn:master on Jul 27, 2023

Conversation

@mantissahz (Contributor)

When starting a backup monitor, the controller starts the snapshot backup procedure, which fails if the S3 backup target is invalid. (An NFS backup target will not fail this way because of the timeout; the backup is instead tested while it is in progress.)
Set the backup state to Error if starting the backup procedure returns an error.

Ref: longhorn/longhorn#1249
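
For context, a minimal, self-contained Go sketch of the behavior this PR describes. The real fix lives in longhorn-manager's backup controller; all of the types and helper names below are stand-ins for illustration, not the upstream code.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in types for the Longhorn Backup CRD; the real structs live in
// longhorn-manager. Everything here is an assumption for illustration only.
type BackupState string

const BackupStateError BackupState = "Error"

type BackupStatus struct {
	State BackupState
	Error string
}

type Backup struct {
	Name   string
	Status BackupStatus
}

// startBackupMonitor stands in for the controller's monitor startup,
// which fails fast when the S3 backup target is invalid.
func startBackupMonitor(b *Backup) error {
	return errors.New("backup target is invalid")
}

// reconcile shows the shape of the fix: propagate a monitor-startup
// failure into the CR status instead of only returning the error.
func reconcile(b *Backup) {
	if err := startBackupMonitor(b); err != nil {
		b.Status.State = BackupStateError
		b.Status.Error = err.Error()
	}
}

func main() {
	b := &Backup{Name: "backup-1"}
	reconcile(b)
	fmt.Printf("%s: state=%s error=%q\n", b.Name, b.Status.State, b.Status.Error)
}
```

Running it prints `backup-1: state=Error error="backup target is invalid"`; the point is that the failure becomes visible on the CR instead of being lost inside the reconcile loop.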

@ejweber (Contributor) commented Jul 21, 2023

Related to (and possibly fixes) longhorn/longhorn#6358.

cc @PhanLe1010

@PhanLe1010 (Contributor) commented Jul 21, 2023

Yeah, this is one way to fix it. I am fine with either this approach or longhorn/longhorn#6358 (comment).

Leaving it to @mantissahz and @ejweber to make the call :D

@ejweber (Contributor) left a comment

I think if we do this, the backup can never complete, even if the backup target becomes valid later.

```go
// Perform backup snapshot to the remote backup target
// If the Backup CR is created by the user/API layer (spec.snapshotName != "") and has not been synced (status.lastSyncedAt == ""),
// it means creating a backup from a volume snapshot is required.
// Hence the source of truth is the engine/replica and the controller needs to sync the status with it.
// Otherwise, the Backup CR is created by the backup volume controller, which means the backup already
// exists in the remote backup target before the CR creation.
// What the controller needs to do for this case is retrieve the info from the remote backup target.
if backup.Status.LastSyncedAt.IsZero() && backup.Spec.SnapshotName != "" {
```

That was fine for me in #6358 because the snapshot didn't exist and the situation was not recoverable.

Do we need to distinguish between recoverable and unrecoverable situations?
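
As a hypothetical sketch of that distinction, one could classify failures before deciding whether to leave the CR in a terminal Error state. The sentinel errors and the classifier below are illustrations, not longhorn-manager APIs.

```go
package backup

import "errors"

// Sentinel errors standing in for the two failure modes discussed here.
var (
	ErrTargetInvalid   = errors.New("backup target is invalid")  // may heal once the target is fixed
	ErrSnapshotMissing = errors.New("snapshot no longer exists") // can never succeed again
)

// IsRecoverable reports whether a failed backup is worth retrying:
// an invalid target can become valid later, but a deleted snapshot cannot return.
func IsRecoverable(err error) bool {
	return errors.Is(err, ErrTargetInvalid)
}
```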

@PhanLe1010 (Contributor)

> I think if we do this, the backup can never complete, even if the backup target becomes valid later.

It is fine with me.

@PhanLe1010 (Contributor) commented Jul 21, 2023

I am taking it back. Maybe this backup-target-not-set-up error should be recoverable and be retried.

@mantissahz (Contributor, Author)

@PhanLe1010 Do you mean that we should retry the backup when the backup target is invalid, to check whether the backup target has become valid again?

@shuo-wu (Contributor) left a comment

The PR itself LGTM (without considering longhorn/longhorn#6358 (comment)).

Without this fix, the backup CR can never be handled. The reconcile loop will directly error out rather than updating the CR state to Error.

Commit message:

> When starting a backup monitor, the controller starts the snapshot backup procedure, which fails if the S3 backup target is invalid. (An NFS backup target will not fail this way because of the timeout; the backup is instead tested while it is in progress.) Set the backup state to `Error` if starting the backup procedure returns an error.
>
> Ref: longhorn/longhorn#1249
>
> Signed-off-by: James Lu <james.lu@suse.com>
@ejweber (Contributor) left a comment

Approving this, and I will test it as the fix to longhorn/longhorn#6358 when it merges.

I still think it would be better to have the ability to retry if the backup target becomes valid. But to @shuo-wu's point, without this change we can never know a backup is in error, and we will keep silently trying, over and over, to do an impossible thing (this is the underlying complaint in longhorn/longhorn#6358). Maybe in the future we can add something to the backup monitor reconcile loop that recognizes when a backup's error is related to an invalid backup target and retries once the backup target is valid.

@innobead self-requested a review on July 27, 2023.
@innobead merged commit e278ba0 into longhorn:master on Jul 27, 2023.
4 checks passed
@innobead (Member)

@mergify backport v1.5.x v1.4.x v1.3.x

@mergify bot commented Jul 27, 2023

> backport v1.5.x v1.4.x v1.3.x

✅ Backports have been created

@innobead (Member)

> Approving this, and I will test it as the fix to longhorn/longhorn#6358 when it merges.
>
> I still think it would be better to have the ability to retry if the backup target becomes valid. But to @shuo-wu's point, without this change we can never know a backup is in error, and we will keep silently trying, over and over, to do an impossible thing (this is the underlying complaint in longhorn/longhorn#6358). Maybe in the future we can add something to the backup monitor reconcile loop that recognizes when a backup's error is related to an invalid backup target and retries once the backup target is valid.

Agreed with this point. @ejweber, please create a ticket for this. It should be part of how resilient we want to make the system: any operation should be considered for retry, backoff, etc. where suitable. Of course, the audit trail should remain clear throughout the retry period.
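
As a rough illustration of the retry/backoff idea, a generic sketch (not longhorn-manager code) could look like this:

```go
package backup

import "time"

// RetryWithBackoff retries op with exponential backoff, capped at maxDelay.
// A reconcile loop could use something like this for recoverable failures
// (e.g. an invalid backup target that may be fixed later) while still
// recording each failure on the Backup CR so the history stays auditable.
func RetryWithBackoff(op func() error, attempts int, base, maxDelay time.Duration) error {
	delay := base
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		time.Sleep(delay)
		if delay *= 2; delay > maxDelay {
			delay = maxDelay
		}
	}
	return err // last error; the caller would surface it in the Backup CR status
}
```

In practice, Kubernetes controllers usually get similar behavior by requeueing through a rate-limited workqueue rather than sleeping inline; the sketch only shows that recoverable failures deserve bounded, recorded retries.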
