Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd: cleanup inconsistent state in reserveSnap() after a failure #4946

Merged
merged 1 commit into from
Nov 11, 2024

Conversation

nixpanic
Copy link
Member

@nixpanic nixpanic commented Nov 6, 2024

reserveSnap() can potentially fail halfway through, in that case it
needs to undo the snapshot reservation and restore modified attributes
of the snapshot.

Fixes: #4945

`reserveSnap()` can potentially fail halfway through, in that case it
needs to undo the snapshot reservation and restore modified attributes
of the snapshot.

Fixes: ceph#4945
Signed-off-by: Niels de Vos <ndevos@ibm.com>
@mergify mergify bot added the component/rbd Issues related to RBD label Nov 6, 2024
Comment on lines +391 to +401
// restore original values in case of an error
origReservedID := rbdSnap.ReservedID
origRbdSnapName := rbdSnap.RbdSnapName
origVolID := rbdSnap.VolID
defer func() {
if err != nil {
rbdSnap.ReservedID = origReservedID
rbdSnap.RbdSnapName = origRbdSnapName
rbdSnap.VolID = origVolID
}
}()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nixpanic any specific reason to preserve and restore these values? i assume if the reservation is done properly these values will be set and not overridden to any values isnt it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are these two calls that overwrite the attributes in any case (success of error):

rbdSnap.ReservedID, rbdSnap.RbdSnapName, err = j.ReserveName(...

and

rbdSnap.VolID, err = util.GenerateVolID(...

So, if something goes wrong, it looks nicer to reset the attributes that may have been modified by a failing call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting those attributes only at the end of the function is tricky too, undoSnapReservation() expects them to be set in order to clean things up in the journal.

Copy link
Collaborator

@Madhu-1 Madhu-1 Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nixpanic the ReserveID and RbdSnapName names are not set earlier its only set by calling the j.ReserveName function, IMO we dont need to preserve anything and also calling the defer incase of error should be enough. same goes for volID as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Madhu-1 if they are not expected to be set, shall I add a check and error message in case they are not an empty string?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently we dont have any case where they are set before calling this function. there are callers doing undo reservation and that will fail or not required

  • Call Reserve to do reservation
  • Undo if there is any error

we updated the current code to do undo internally if something fails.

Doesnt hurt to have this code but this will result in extra error/logs when we try to call the Undo function multiple times.

Let me know what do you prefer

  1. Merge the current changes
  2. Add a check to ensure the values are not set before calling reserveSnap internal function
  3. Dont revert back and allow override.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current PR as it is shows what a function should do/restore. There are no callers that will call undoSnapReservation() in case reserveSnap() fails, so there should not be additional logs for that.

When a function returns an error, there should be no need for the caller to undo whatever internals the function may, or may not, have done. Everything internal to the function should stay there. For the more complex cases, a special undo function can be returned, similar to what Manager.getGroupUUID() does.

Merging the current PR as it is should be fine IMHO, @Madhu-1, but I am also fine with improving it some way you recommend.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it would not. In case there is an error returned by reserveSnap(), the defer where undoSnapReservation() is called will never be reached.

@nixpanic nixpanic requested a review from Madhu-1 November 7, 2024 07:51
return
}

undoErr := undoSnapReservation(ctx, rbdSnap, cr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not sufficient in case reserveSnap() had an internal error. That is the reason why this change was suggested in the 1st place. A failing function should cleanup its own internal partially-completed state on a failure, reserveSnap() did not do that yet.

@nixpanic nixpanic requested review from Rakshith-R and a team November 8, 2024 13:45
@nixpanic
Copy link
Member Author

@Madhu-1 @Rakshith-R , any other concerns?

@Rakshith-R
Copy link
Contributor

@Madhu-1 @Rakshith-R , any other concerns?

LGTM.

@nixpanic
Copy link
Member Author

@Mergifyio rebase

Copy link
Contributor

mergify bot commented Nov 11, 2024

rebase

☑️ Nothing to do

  • any of:
    • #commits > 1 [📌 rebase requirement]
    • #commits-behind > 0 [📌 rebase requirement]
    • -linear-history [📌 rebase requirement]
  • -closed [📌 rebase requirement]
  • -conflict [📌 rebase requirement]
  • queue-position = -1 [📌 rebase requirement]

@nixpanic
Copy link
Member Author

@Mergifyio queue

Copy link
Contributor

mergify bot commented Nov 11, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at f3d40f9

@mergify mergify bot added the ok-to-test Label to trigger E2E tests label Nov 11, 2024
@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.29

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.29

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.31

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.29

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.31

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-cephfs

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.30

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.31

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-rbd

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.30

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.30

@ceph-csi-bot ceph-csi-bot removed the ok-to-test Label to trigger E2E tests label Nov 11, 2024
@mergify mergify bot merged commit f3d40f9 into ceph:devel Nov 11, 2024
41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/rbd Issues related to RBD
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rbd: reserveSnap() should cleanup its state in case of a failure
4 participants