-
Notifications
You must be signed in to change notification settings - Fork 553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rbd: cleanup inconsistent state in reserveSnap()
after a failure
#4946
Conversation
`reserveSnap()` can potentially fail halfway through, in that case it needs to undo the snapshot reservation and restore modified attributes of the snapshot. Fixes: ceph#4945 Signed-off-by: Niels de Vos <ndevos@ibm.com>
// restore original values in case of an error | ||
origReservedID := rbdSnap.ReservedID | ||
origRbdSnapName := rbdSnap.RbdSnapName | ||
origVolID := rbdSnap.VolID | ||
defer func() { | ||
if err != nil { | ||
rbdSnap.ReservedID = origReservedID | ||
rbdSnap.RbdSnapName = origRbdSnapName | ||
rbdSnap.VolID = origVolID | ||
} | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nixpanic any specific reason to preserve and restore these values? i assume if the reservation is done properly these values will be set and not overridden to any values isnt it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are these two calls that overwrite the attributes in any case (success of error):
rbdSnap.ReservedID, rbdSnap.RbdSnapName, err = j.ReserveName(...
and
rbdSnap.VolID, err = util.GenerateVolID(...
So, if something goes wrong, it looks nicer to reset the attributes that may have been modified by a failing call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting those attributes only at the end of the function is tricky too, undoSnapReservation()
expects them to be set in order to clean things up in the journal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nixpanic the ReserveID and RbdSnapName names are not set earlier its only set by calling the j.ReserveName
function, IMO we dont need to preserve anything and also calling the defer incase of error should be enough. same goes for volID as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Madhu-1 if they are not expected to be set, shall I add a check and error message in case they are not an empty string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently we dont have any case where they are set before calling this function. there are callers doing undo reservation and that will fail or not required
- Call Reserve to do reservation
- Undo if there is any error
we updated the current code to do undo internally if something fails.
Doesnt hurt to have this code but this will result in extra error/logs when we try to call the Undo function multiple times.
Let me know what do you prefer
- Merge the current changes
- Add a check to ensure the values are not set before calling
reserveSnap
internal function - Dont revert back and allow override.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current PR as it is shows what a function should do/restore. There are no callers that will call undoSnapReservation()
in case reserveSnap()
fails, so there should not be additional logs for that.
When a function returns an error, there should be no need for the caller to undo whatever internals the function may, or may not, have done. Everything internal to the function should stay there. For the more complex cases, a special undo
function can be returned, similar to what Manager.getGroupUUID()
does.
Merging the current PR as it is should be fine IMHO, @Madhu-1, but I am also fine with improving it some way you recommend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it would not. In case there is an error returned by reserveSnap()
, the defer
where undoSnapReservation()
is called will never be reached.
return | ||
} | ||
|
||
undoErr := undoSnapReservation(ctx, rbdSnap, cr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be and is already called outside after reserveSnap() calls ?
https://github.com/ceph/ceph-csi/blob/devel/internal/rbd/controllerserver.go#L1194
https://github.com/ceph/ceph-csi/blob/devel/internal/rbd/controllerserver.go#L1194
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not sufficient in case reserveSnap()
had an internal error. That is the reason why this change was suggested in the 1st place. A failing function should cleanup its own internal partially-completed state on a failure, reserveSnap()
did not do that yet.
@Madhu-1 @Rakshith-R , any other concerns? |
LGTM. |
@Mergifyio rebase |
☑️ Nothing to do
|
@Mergifyio queue |
✅ The pull request has been merged automaticallyThe pull request has been merged automatically at f3d40f9 |
/test ci/centos/k8s-e2e-external-storage/1.29 |
/test ci/centos/mini-e2e-helm/k8s-1.29 |
/test ci/centos/k8s-e2e-external-storage/1.31 |
/test ci/centos/mini-e2e/k8s-1.29 |
/test ci/centos/mini-e2e-helm/k8s-1.31 |
/test ci/centos/upgrade-tests-cephfs |
/test ci/centos/k8s-e2e-external-storage/1.30 |
/test ci/centos/mini-e2e/k8s-1.31 |
/test ci/centos/upgrade-tests-rbd |
/test ci/centos/mini-e2e-helm/k8s-1.30 |
/test ci/centos/mini-e2e/k8s-1.30 |
reserveSnap()
can potentially fail halfway through, in that case itneeds to undo the snapshot reservation and restore modified attributes
of the snapshot.
Fixes: #4945