Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(backup): delete backup in the backupstore asynchronously #3038

Conversation

ChanYiLin
Copy link
Contributor

@ChanYiLin ChanYiLin commented Aug 1, 2024

ref: longhorn/longhorn#8746

follow LEP to implement: longhorn/longhorn#9152

  • Make backup deletion asynchronous
  • Add Deleting state to Backup
    • Add deleting in-memory map to allow go routine to pass the command failure to the controller.
    • Add backoff to delay deletion command execution.
    • When the Backup is being deleted it should follow the following diagram
  • Disallow backup creation when there is any backup being deleted.
# Normal Case
Completed => Deleting 
=> finalizer removed (CR isgone)

# Command failure Case
Completed => Deleting 
=> Error (found the error message in the map) 
=> Deleting (Retry the command) => finalizer removed (CR isgone)

# Controller crashes
Completed => Deleting 
=> Error (failed to find the record in the map)
=> Deleting (Retry the command) => finalizer removed (CR isgone)

@ChanYiLin ChanYiLin self-assigned this Aug 1, 2024
@ChanYiLin ChanYiLin force-pushed the LH8746_make_backup_deletion_async_and_add_deleting_state branch 2 times, most recently from c809fd8 to 707f44c Compare August 1, 2024 14:04
@ChanYiLin
Copy link
Contributor Author

ChanYiLin commented Aug 1, 2024

Test Plan

Normal Case

  1. Create a Volume
  2. Write small data and then create a BackupA
  3. Write large data (~2G) and then create a BackupB
  4. Write small data and then create a Snapshot
  5. Delete the BackupB(large data), at the same time, create a BackupC from the Snapshot(you can click from UI)
  6. BackupC will be in Pending state with message (waiting for backupB to be deleted)
  7. After BackupB is deleted, BackupC should be in progress.

Error Case (use nfs)

  1. Create a Volume
  2. Write some data and then create a BackupA
  3. Write some data and then create a sSnapshot
  4. Exec into the backupstore pod and make the backup.cfg immutable
    $ chattr +i backups/backup_backup-5640dfd33a054f98.cfg`
    
  5. Delete the BackupA, at the same time, create a BackupB from the Snapshot(you can click from UI)
  6. BackupA will be in Deleting and Error state repeatedly to retry the deletion. When in Error state, it shows error message related to permission
  7. BackupB will be InProgress when BackupA is in Deleting state. BackupB should be complete after awhile.
  8. Remove the immutable, after a while, the BackupA should be in Deleting again and should be deleted successfully.

Controller Crashes Case (use nfs)

  1. Create a Volume
  2. Write some data and then create a BackupA
  3. Exec into the backupstore pod and make the backup.cfg immutable, example
    $ chattr +i backups/backup_backup-5640dfd33a054f98.cfg`
    
  4. Delete the BackupA
  5. BackupA will be in Deleting and Error state repeatedly to retry the deletion. When in Error state, it shows error message related to permission
  6. When the BackupA is in Deleting state, delete the longhorn manager pod directly. (you can find the one doing the deleting with Backup.Status.OwnerID)
  7. After the longhorn manager pod is recreated, the BackupA should turn into Error state with message No deletion in progress record, retry the deletion command
  8. Then after a while the BackupA should be in Deleting again and should be deleted successfully after remove the immutable.

@ChanYiLin ChanYiLin force-pushed the LH8746_make_backup_deletion_async_and_add_deleting_state branch from 707f44c to 4e74975 Compare August 1, 2024 14:12
@ChanYiLin ChanYiLin marked this pull request as draft August 2, 2024 09:53
@ChanYiLin
Copy link
Contributor Author

There is a regression
https://ci.longhorn.io/job/private/job/longhorn-tests-regression/7389/console
test_backup_status_for_unavailable_replicas
I know the root cause of it, fixing...

@ChanYiLin ChanYiLin force-pushed the LH8746_make_backup_deletion_async_and_add_deleting_state branch from 4e74975 to 5e21ded Compare August 5, 2024 10:00
@ChanYiLin
Copy link
Contributor Author

There is a regression https://ci.longhorn.io/job/private/job/longhorn-tests-regression/7389/console test_backup_status_for_unavailable_replicas I know the root cause of it, fixing...

Fixed.
https://ci.longhorn.io/job/private/job/longhorn-tests-regression/7392/

@ChanYiLin ChanYiLin marked this pull request as ready for review August 5, 2024 10:49
@ChanYiLin ChanYiLin force-pushed the LH8746_make_backup_deletion_async_and_add_deleting_state branch from 5e21ded to cf4419c Compare August 9, 2024 07:52
@ChanYiLin
Copy link
Contributor Author

change the backoff time to
min: 1Min -> max: 24hour(1 time/day)

@innobead innobead requested a review from shuo-wu August 11, 2024 16:14
@derekbit
Copy link
Member

derekbit commented Sep 2, 2024

@ChanYiLin Is it ready for review?

@ChanYiLin
Copy link
Contributor Author

Yes, it is ready for review.

@ChanYiLin
Copy link
Contributor Author

Hi @derekbit
This is ready for review, thanks

controller/backup_controller.go Outdated Show resolved Hide resolved
controller/backup_controller.go Outdated Show resolved Hide resolved
controller/backup_controller.go Show resolved Hide resolved
@ChanYiLin ChanYiLin force-pushed the LH8746_make_backup_deletion_async_and_add_deleting_state branch from cf4419c to 25dcfe8 Compare September 24, 2024 07:08
@ChanYiLin
Copy link
Contributor Author

Hi @derekbit
I have updated the PR, please take a look, thanks!
Let me know if you have any concern for the last comment.

Copy link
Member

@derekbit derekbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, LGTM

controller/backup_controller.go Outdated Show resolved Hide resolved
controller/backup_controller.go Outdated Show resolved Hide resolved
controller/backup_controller.go Outdated Show resolved Hide resolved
@ChanYiLin ChanYiLin force-pushed the LH8746_make_backup_deletion_async_and_add_deleting_state branch from 25dcfe8 to dab29dc Compare September 30, 2024 06:11
@derekbit derekbit force-pushed the LH8746_make_backup_deletion_async_and_add_deleting_state branch from dab29dc to 1e10e27 Compare October 6, 2024 13:58
@derekbit
Copy link
Member

derekbit commented Oct 6, 2024

@c3y1huang Could you take a look at this PR? Thanks.

derekbit
derekbit previously approved these changes Oct 6, 2024
Copy link

mergify bot commented Oct 9, 2024

This pull request is now in conflict. Could you fix it @ChanYiLin? 🙏

@derekbit
Copy link
Member

@ChanYiLin Could you resolve the conflicts? Thanks.

Copy link
Contributor

@shuo-wu shuo-wu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM

controller/backup_volume_controller.go Show resolved Hide resolved
controller/backup_controller.go Outdated Show resolved Hide resolved
@derekbit
Copy link
Member

@ChanYiLin
Could you address the comments and resolve the conflicts? Thank you.

@ChanYiLin ChanYiLin force-pushed the LH8746_make_backup_deletion_async_and_add_deleting_state branch 2 times, most recently from a63073e to 47f49ce Compare October 16, 2024 09:29
@derekbit
Copy link
Member

cc @ChanYiLin

Running: golangci-lint
controller/backup_controller.go:1048:2: S1023: redundant `return` statement (gosimple)
	return

@ChanYiLin ChanYiLin force-pushed the LH8746_make_backup_deletion_async_and_add_deleting_state branch from 47f49ce to e56e419 Compare October 16, 2024 13:15
Copy link

coderabbitai bot commented Oct 16, 2024

Walkthrough

The changes in this pull request enhance the functionality of the backup management system within the application. Key updates include modifications to error handling in the BackupVolumeController, improvements in logging within the BackupTargetClient, and the introduction of new constants and a struct for managing backup deletion states in the BackupController. Additionally, new constants related to backup states have been added, and a utility function for error checking has been introduced. These updates collectively improve the traceability and management of backup operations.

Changes

File Change Summary
controller/backup_controller.go Added constants for logging errors, introduced DeletingStatus struct, updated syncHandler, and added methods for managing backup deletions.
controller/backup_volume_controller.go Modified error handling in reconcile method to prevent unnecessary logging for "not found" errors.
engineapi/backups.go Updated error handling in BackupGet method and added logging in BackupDelete method.
k8s/pkg/apis/longhorn/v1beta2/backup.go Added BackupStateDeleting constant and updated comments for clarity.
types/types.go Introduced ErrorIsInProgress function for checking specific error states.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant BackupController
    participant BackupTargetClient

    User->>BackupController: Request Backup Deletion
    BackupController->>BackupTargetClient: Start Deletion Process
    BackupTargetClient->>BackupController: Log Deletion Start
    BackupController->>BackupController: Update Backup Status to Deleting
    BackupController->>BackupTargetClient: Handle Deletion Asynchronously
    BackupTargetClient-->>BackupController: Return Deletion Result
    BackupController->>User: Notify Deletion Status
Loading

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (6)
k8s/pkg/apis/longhorn/v1beta2/backup.go (1)

16-18: LGTM: New backup state for deletion process

The addition of BackupStateDeleting aligns with the PR objectives for implementing asynchronous backup deletion. The naming and placement are consistent with other backup states.

Consider a minor rewording of the comment for clarity:

- // Deleting is also considered as final state
- // as it only happens when the backup is being deleting and has deletion timestamp
+ // Deleting is considered a final state
+ // as it only occurs when the backup is being deleted and has a deletion timestamp
controller/backup_volume_controller.go (1)

326-326: Improved error handling for BackupGet operation

This change refines the error logging for the BackupGet operation by excluding "not found" errors. This aligns with the modified behavior of backupTargetClient.BackupGet and reduces noise in the logs.

Consider adding a debug log for the "not found" case to maintain traceability without cluttering the main log:

 if backupInfo, err := backupTargetClient.BackupGet(backupURL, backupTargetClient.Credential); err != nil && !types.ErrorIsNotFound(err) {
     log.WithError(err).WithFields(logrus.Fields{
         "backup":       backupName,
         "backupvolume": backupVolumeName,
         "backuptarget": backupURL}).Warn("Failed to get backupInfo from remote backup target")
+} else if types.ErrorIsNotFound(err) {
+    log.WithFields(logrus.Fields{
+        "backup":       backupName,
+        "backupvolume": backupVolumeName,
+        "backuptarget": backupURL}).Debug("Backup not found in remote backup target")
 } else {
     if backupInfo != nil && backupInfo.Labels != nil {
         if accessMode, exist := backupInfo.Labels[types.GetLonghornLabelKey(types.LonghornLabelVolumeAccessMode)]; exist {
             backupLabelMap[types.GetLonghornLabelKey(types.LonghornLabelVolumeAccessMode)] = accessMode
         }
     }
 }

This addition will help maintain visibility into "not found" cases without cluttering the main log, which can be useful for debugging and auditing purposes.

engineapi/backups.go (1)

Action Required: Update Callers to Handle Modified BackupGet Error Behavior

The BackupGet method in engineapi/backups.go has changed its error handling. Previously, it returned nil when a backup was not found, but it now returns an error in such cases. This modification affects the following files:

  • csi/controller_server.go
  • controller/backup_volume_controller.go
  • engineapi/backups.go
  • client/generated_backup_volume.go
  • app/recurring_job.go
  • api/systembackup.go
  • api/backup.go

Please ensure that each of these callers properly handles the new error scenario, distinguishing between critical errors and cases where a backup is not found to maintain application stability.

🔗 Analysis chain

Line range hint 302-306: Change in error handling behavior for BackupGet method

The error handling logic in the BackupGet method has been modified. Previously, if the backup configuration was not found, the method would return nil, nil. Now, it will return an error wrapped with a message indicating the failure to get the backup configuration.

This change might impact any calling code that relies on the previous behavior. Please ensure that all callers of this method are updated to handle the new error case appropriately.

To verify the impact of this change, we can search for all calls to BackupGet in the codebase:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for all calls to BackupGet
rg --type go 'BackupGet\(' -A 5 -B 5

Length of output: 7229

types/types.go (1)

760-762: LGTM. Consider case-insensitive error message check.

The new ErrorIsInProgress function is well-implemented and consistent with other error checking functions in the file. It correctly checks if an error message contains the phrase "in progress".

For improved robustness, consider using case-insensitive string comparison:

func ErrorIsInProgress(err error) bool {
-	return strings.Contains(err.Error(), "in progress")
+	return strings.Contains(strings.ToLower(err.Error()), "in progress")
}

This change would catch variations like "In Progress" or "IN PROGRESS" as well.

controller/backup_controller.go (2)

83-86: Avoid redundant initialization of synchronization primitives

Given that the zero value of sync.Mutex and flowcontrol.Backoff are usable, the explicit initialization in the constructor may not be necessary. This can simplify the code.

Consider removing the explicit initializations:

-		deletingMapLock:       &sync.Mutex{},
-		inProgressDeletingMap: map[string]*DeletingStatus{},
-		deletingBackoff: flowcontrol.NewBackOff(DeletionMinInterval, DeletionMaxInterval),
+		// No need to initialize deletingMapLock and inProgressDeletingMap explicitly
+		// deletingBackoff will be initialized when first used

Also applies to: 121-124


1001-1032: Ensure consistent error handling and logging

In the handleBackupDeletionInBackupStore function, when errors occur during backup deletion, ensure that all error messages provide sufficient context and are logged appropriately. This will aid in debugging and monitoring.

Review the error messages and consider enhancing them for clarity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 56903ba and e56e419.

📒 Files selected for processing (5)
  • controller/backup_controller.go (9 hunks)
  • controller/backup_volume_controller.go (1 hunks)
  • engineapi/backups.go (1 hunks)
  • k8s/pkg/apis/longhorn/v1beta2/backup.go (1 hunks)
  • types/types.go (1 hunks)
🧰 Additional context used
🔇 Additional comments (4)
k8s/pkg/apis/longhorn/v1beta2/backup.go (3)

8-11: LGTM: Improved comment capitalization

The capitalization of "Non-final state" improves consistency in the documentation style. This change doesn't affect functionality and enhances readability.


12-15: LGTM: Consistent comment capitalization

The capitalization of "Final state" maintains consistency with the previous comment modification. This change enhances readability without affecting functionality.


Line range hint 1-156: Summary: Changes align with PR objectives and improve code quality

The modifications in this file successfully introduce the new BackupStateDeleting constant, which is crucial for implementing asynchronous backup deletion as outlined in the PR objectives. The comment improvements enhance readability and maintain consistency in the codebase.

To ensure the new state is properly integrated, please run the following verification script:

This script will help identify any places where the new state should be handled but might have been overlooked.

✅ Verification successful

BackupStateDeleting integration verified successfully

The BackupStateDeleting constant is consistently used across the relevant controller files without any overlaps or conflicts with SystemBackupStateDeleting. This ensures clear and distinct handling of backup deletion states as intended.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of the new BackupStateDeleting constant

# Test: Search for usages of BackupStateDeleting
rg --type go "BackupStateDeleting"

# Test: Check if BackupStateDeleting is included in switch statements or if statements handling backup states
rg --type go -e "switch.*BackupState" -e "if.*BackupState"

Length of output: 3265


Script:

#!/bin/bash
# Description: Identify all backup state constants and check for duplicates or inconsistencies

# Search for all constants related to BackupState
rg --type go "BackupState\w*" | sort | uniq

# Search for all constants related to SystemBackupState
rg --type go "SystemBackupState\w*" | sort | uniq

# Compare the two sets to identify overlaps or discrepancies
comm -12 <(rg --type go "BackupState\w*" | sort | uniq) <(rg --type go "SystemBackupState\w*" | sort | uniq)

Length of output: 25952


Script:

#!/bin/bash
# Description: Identify overlapping constants between BackupState and SystemBackupState without using 'comm'

# Extract all BackupState constants
rg --type go "BackupState\w*" | sort | uniq > backup_states.txt

# Extract all SystemBackupState constants
rg --type go "SystemBackupState\w*" | sort | uniq > system_backup_states.txt

# Find common constants using grep
grep -Fxf system_backup_states.txt backup_states.txt

Length of output: 7880

engineapi/backups.go (1)

Line range hint 302-310: Improved logging in BackupDelete method

The addition of a log entry at the start of the backup deletion process enhances traceability. This change improves the ability to track and debug backup deletions by providing clear start and end points for the operation.

The actual deletion logic remains unchanged, ensuring that the core functionality is preserved while improving observability.

controller/backup_controller.go Show resolved Hide resolved
controller/backup_controller.go Outdated Show resolved Hide resolved
controller/backup_controller.go Outdated Show resolved Hide resolved
@ChanYiLin ChanYiLin force-pushed the LH8746_make_backup_deletion_async_and_add_deleting_state branch from e56e419 to 40de0f8 Compare October 21, 2024 06:31
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
types/types.go (1)

760-762: LGTM! Consider using a constant for the error message.

The new ErrorIsInProgress function is a good addition, following the existing pattern for error checking functions. It will be useful for identifying "in progress" errors.

Consider defining a constant for the "in progress" string to improve maintainability:

+const errorInProgressMessage = "in progress"

 func ErrorIsInProgress(err error) bool {
-	return strings.Contains(err.Error(), "in progress")
+	return strings.Contains(err.Error(), errorInProgressMessage)
 }
controller/backup_controller.go (2)

52-52: Simplify time duration expression

The expression time.Minute * 1 can be simplified to time.Minute for clarity.

Apply this diff to simplify:

-    	DeletionMinInterval = time.Minute * 1
+    	DeletionMinInterval = time.Minute

121-121: Unnecessary initialization of sync.Mutex

It's unnecessary to explicitly initialize sync.Mutex since its zero value is ready for use. Initializing it is redundant.

Apply this diff to remove the redundant initialization:

-    		deletingMapLock:       sync.Mutex{},
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between e56e419 and 40de0f8.

📒 Files selected for processing (5)
  • controller/backup_controller.go (9 hunks)
  • controller/backup_volume_controller.go (1 hunks)
  • engineapi/backups.go (1 hunks)
  • k8s/pkg/apis/longhorn/v1beta2/backup.go (1 hunks)
  • types/types.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • controller/backup_volume_controller.go
  • engineapi/backups.go
  • k8s/pkg/apis/longhorn/v1beta2/backup.go
🧰 Additional context used

controller/backup_controller.go Outdated Show resolved Hide resolved
derekbit
derekbit previously approved these changes Oct 21, 2024
Copy link
Member

@derekbit derekbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, LGTM.

@derekbit derekbit force-pushed the LH8746_make_backup_deletion_async_and_add_deleting_state branch from 40de0f8 to a72d8e3 Compare October 21, 2024 06:52
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
engineapi/backups.go (1)

Line range hint 302-310: Enhanced logging in BackupDelete method

The addition of log statements at the start and end of the backup deletion process significantly improves traceability. This change aligns well with the new asynchronous deletion feature, providing clear indicators of the deletion process status.

Consider adding a log statement for the case when types.ErrorIsNotFound(err) is true, to explicitly log when a backup is not found during deletion attempt.

 func (btc *BackupTargetClient) BackupDelete(backupURL string, credential map[string]string) error {
 	logrus.Infof("Start deleting backup %s", backupURL)
 	_, err := btc.ExecuteEngineBinaryWithoutTimeout("backup", "rm", backupURL)
 	if err != nil {
 		if types.ErrorIsNotFound(err) {
+			logrus.Infof("Backup %s not found during deletion, considering it as already deleted", backupURL)
 			return nil
 		}
 		return errors.Wrapf(err, "error deleting backup %v", backupURL)
 	}
 	logrus.Infof("Complete deleting backup %s", backupURL)
 	return nil
 }
types/types.go (1)

760-762: LGTM! Consider adding an error check for consistency.

The new ErrorIsInProgress function is well-implemented and consistent with other error checking functions in the file. It efficiently checks for the "in progress" substring in the error message.

For consistency with other error checking functions in this file, consider adding a nil check:

 func ErrorIsInProgress(err error) bool {
+	if err == nil {
+		return false
+	}
 	return strings.Contains(err.Error(), "in progress")
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 40de0f8 and a72d8e3.

📒 Files selected for processing (5)
  • controller/backup_controller.go (9 hunks)
  • controller/backup_volume_controller.go (1 hunks)
  • engineapi/backups.go (1 hunks)
  • k8s/pkg/apis/longhorn/v1beta2/backup.go (1 hunks)
  • types/types.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • controller/backup_volume_controller.go
  • k8s/pkg/apis/longhorn/v1beta2/backup.go
🧰 Additional context used
🔇 Additional comments (11)
engineapi/backups.go (1)

302-305: Improved error handling in BackupGet method

The removal of the ErrorIsNotFound check enhances the robustness of the backup retrieval process. This change ensures that any failure to retrieve a backup configuration, including "not found" errors, is properly reported. This aligns well with the new asynchronous deletion feature, where a backup in the "Deleting" state should not be treated as non-existent.

controller/backup_controller.go (10)

38-39: LGTM: New message type constants added

The addition of MessageTypeReconcileInfo and MessageTypeReconcileError constants improves the categorization of messages in the backup status. The naming is clear and consistent with existing conventions.


43-48: LGTM: New message constants added for backup process

The addition of these message constants improves the clarity and consistency of log messages throughout the backup process. They cover various scenarios including waiting for snapshots, engines, and backup deletions, as well as error cases.


51-54: LGTM: Deletion interval constants added

The addition of DeletionMinInterval and DeletionMaxInterval constants provides clear boundaries for the deletion backoff mechanism. The chosen values (1 minute minimum, 24 hours maximum) seem reasonable for managing deletion retries.


56-60: LGTM: DeletingStatus struct added

The new DeletingStatus struct is a good addition for tracking the state and potential error messages during the backup deletion process. It provides a clear structure for managing deletion status information.


81-86: LGTM: BackupController struct updated for asynchronous deletion

The additions to the BackupController struct (deletingMapLock, inProgressDeletingMap, and deletingBackoff) provide the necessary components to support asynchronous deletion of backups. The use of a mutex for thread-safe access to the map and a backoff mechanism for managing retries are good practices.


121-124: LGTM: Proper initialization of new BackupController fields

The new fields in the BackupController struct are correctly initialized in the constructor. The mutex is initialized to its zero value, the map is created empty, and the backoff mechanism is set up with the defined min and max intervals.


784-795: LGTM: Prevention of backup creation during deletion

The addition of a check to prevent new backup creation while any backup is in the Deleting state is a good safeguard. This helps maintain consistency in the backup store and prevents potential conflicts during the deletion process.


945-947: LGTM: BackupStateDeleting added to final states

Adding BackupStateDeleting to the list of final states in the backupInFinalState method is consistent with the new asynchronous deletion feature. This ensures that backups in the deleting state are treated appropriately in the controller's logic.


949-972: LGTM: Asynchronous backup deletion implementation

The startDeletingBackupInBackupStore method is a well-implemented solution for asynchronous backup deletion. It correctly uses mutex locking, includes panic recovery in the goroutine, and properly handles errors by updating the deletion status map. This approach allows for efficient processing of deletion requests without blocking the controller.


1044-1050: LGTM: Thread-safe update of deletion status

The setInprogressDeletionMap method provides a thread-safe way to update the status of an in-progress deletion. It correctly uses a mutex to protect access to the shared map and updates both the state and error message. This method adheres to the principle of single responsibility and is well-implemented.

controller/backup_controller.go Outdated Show resolved Hide resolved
@ChanYiLin ChanYiLin force-pushed the LH8746_make_backup_deletion_async_and_add_deleting_state branch from a72d8e3 to f47e12e Compare October 21, 2024 07:00
ref: longhorn/longhorn 8746

Signed-off-by: Jack Lin <jack.lin@suse.com>
@derekbit derekbit force-pushed the LH8746_make_backup_deletion_async_and_add_deleting_state branch from f47e12e to 37a4dff Compare October 21, 2024 07:05
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between a72d8e3 and 37a4dff.

📒 Files selected for processing (5)
  • controller/backup_controller.go (9 hunks)
  • controller/backup_volume_controller.go (1 hunks)
  • engineapi/backups.go (1 hunks)
  • k8s/pkg/apis/longhorn/v1beta2/backup.go (1 hunks)
  • types/types.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • controller/backup_volume_controller.go
  • engineapi/backups.go
  • k8s/pkg/apis/longhorn/v1beta2/backup.go
  • types/types.go
🧰 Additional context used
🔇 Additional comments (4)
controller/backup_controller.go (4)

38-39: LGTM: New constants and struct for backup deletion process

The new constants and DeletingStatus struct are well-defined and support the enhanced backup deletion process. They provide clear message templates and a structured way to track deletion status.

Also applies to: 43-48, 51-59


120-124: LGTM: Proper initialization of new fields

The new fields in the BackupController struct are correctly initialized in the NewBackupController function. This ensures that the controller starts with the proper state for managing backup deletions.


1047-1053: LGTM: Thread-safe update of deletion status

The setInprogressDeletionMap method correctly uses the mutex to ensure thread-safe updates to the inProgressDeletingMap. This is crucial for maintaining data integrity in a concurrent environment.


Line range hint 1-1053: Overall assessment: Significant improvements with minor suggestions

The changes to the BackupController significantly enhance its capabilities, particularly in managing the backup deletion process. The new methods and data structures are well-designed and improve the controller's ability to handle concurrent operations and implement backoff strategies.

Key improvements:

  1. Introduction of a structured deletion process with clear state management.
  2. Implementation of a backoff mechanism for retrying deletions.
  3. Asynchronous deletion to improve performance.

Suggestions for further improvement:

  1. Consider using sync.Mutex directly instead of a pointer.
  2. Optimize the blocking mechanism for new backups during deletions.
  3. Enhance error handling in the asynchronous deletion process.
  4. Implement cleanup of the inProgressDeletingMap after successful deletions.

These changes represent a substantial improvement to the backup management system, with the suggested modifications potentially further enhancing its robustness and efficiency.

controller/backup_controller.go Show resolved Hide resolved
controller/backup_controller.go Show resolved Hide resolved
controller/backup_controller.go Show resolved Hide resolved
controller/backup_controller.go Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants