-
Notifications
You must be signed in to change notification settings - Fork 554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rbd: Dont depend on image state to issue resync #4076
Conversation
b765827
to
6f3af4d
Compare
Looks like yum update on base image is broken :( |
6f3af4d
to
3e4dfd0
Compare
@@ -637,14 +658,28 @@ func (rs *ReplicationServer) ResyncVolume(ctx context.Context, | |||
ready = checkRemoteSiteStatus(ctx, mirrorStatus) | |||
} | |||
|
|||
err = rbdVol.ResyncVol(localStatus, req.Force) | |||
err = rbdVol.GetImageInfo() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetImageInfo()
only returns an error? Can you not return whatever information you need? Checking attributes like rbdVol.CreatedAt
is dangerous, specially because of the required calling of GetImageInfo()
(that probably was a reason the function was not public).
Maybe introduce rbdVol.GetCreationTime()
that internally calls getImageInfo()
if the creation time is not set yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nixpanic yes it internally updates the image details in local structure and returns an error if that fails. am not sure we should start introduces functions to get a field of the rbdVol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename GetImageInfo()
to UpdateImageInfo()
, but accessing attributes of rbdVolume outside of the rbs package is not very clean. There is no guarantee that the attribute has been set (unless you remember to call getImageInfo()
, which is rather error prone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetImgeInfo()
really should be an internal function of rbdVolume
. A Get...() function should return something in addition to an error.
Using attributes from rbdVolume outside of its own package is not very clean. Use functions that return requires attributes instead (so that the can be populated/checked in rbdVolume functions).
3e4dfd0
to
812b97b
Compare
812b97b
to
72d5df4
Compare
Added one extra function to return CreationTimestamp, @nixpanic PTAL |
72d5df4
to
bad71a4
Compare
return false, errors.New("failed to get image creation time") | ||
} | ||
|
||
log.UsefulLog(ctx, "savedImageTime=%v, currentImageTime=%v", savedImageTime, currentImageTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove this, or phrase it a little nicer, or move it to debug level.
internal/rbd/replication.go
Outdated
@@ -126,3 +98,23 @@ func (rv *rbdVolume) DisableVolumeReplication( | |||
|
|||
return nil | |||
} | |||
|
|||
func (rv *rbdVolume) CheckImageIsPrimary() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused by the naming of the function. I expected a true/false return in case the image is (not) primary...
Instead, if the image is primary, it returns a ErrInvalidArgument
?
What is the intention of this function, and can it be made clearer by renaming and/or returning a bool? A comment with a description of the function would also help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
things got messed up when I tried to refractor, for now removed refractoring as it looks complicated to do as part of this PR.
f32f641
to
daa2550
Compare
daa2550
to
f27714e
Compare
if len(strings.Split(parts[0], ":")) != 2 || len(strings.Split(parts[1], ":")) != 2 { | ||
return nil, status.Errorf(codes.Internal, "failed to parse image creation time: %s", savedImageTime) | ||
} | ||
secondsStr := strings.Split(parts[0], ":")[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion it would be to good to have a different function to parse the creation time and get the required format, we can also further add unit test for it. What do you think @Madhu-1 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, definitely a good suggestion!
if len(strings.Split(parts[0], ":")) != 2 || len(strings.Split(parts[1], ":")) != 2 { | ||
return nil, status.Errorf(codes.Internal, "failed to parse image creation time: %s", savedImageTime) | ||
} | ||
secondsStr := strings.Split(parts[0], ":")[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, definitely a good suggestion!
// store the image creation time for resync | ||
_, err = rbdVol.GetMetadata(imageCreationTimeKey) | ||
if err != nil && errors.Is(err, librbd.ErrNotFound) { | ||
err = rbdVol.SetMetadata(imageCreationTimeKey, creationTime.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using your own timestampToString()
and timestampFromString()
functions. There is no guarantee that creationTime.String()
produces the same string format across package versions.
f27714e
to
3c79b78
Compare
internal/rbd/rbd_util.go
Outdated
@@ -1558,6 +1558,19 @@ func (rv *rbdVolume) setImageOptions(ctx context.Context, options *librbd.ImageO | |||
return nil | |||
} | |||
|
|||
// getImageCreationTime returns the creation time of the image. if the image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getImageCreationTime -> GetImageCreationTime
if err != nil { | ||
return nil, status.Error(codes.Internal, err.Error()) | ||
return nil, status.Errorf(codes.Internal, "failed to get image creation time: %s", err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider including the image name and key in the error message, if something goes wrong it gives a hint to check while troubleshooting
if err != nil { | ||
return nil, getGRPCError(err) | ||
return nil, status.Errorf(codes.Internal, "failed to get image info: %s", err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update the error message to include the name of the image?
3c79b78
to
9facccc
Compare
/test ci/centos/upgrade-tests-cephfs |
/test ci/centos/mini-e2e/k8s-1.27 |
/test ci/centos/mini-e2e/k8s-1.26 |
/test ci/centos/k8s-e2e-external-storage/1.28 |
/test ci/centos/upgrade-tests-rbd |
/test ci/centos/mini-e2e-helm/k8s-1.28 |
/test ci/centos/mini-e2e/k8s-1.28 |
✅ The pull request has been merged automaticallyThe pull request has been merged automatically at e013cfe |
Not sure why but go-lint is failing with below error and this fix is required to make it pass ``` directive `//nolint:staticcheck // See comment above.` is unused for linter "staticcheck" (nolintlint) ``` Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
During the Demote volume store the image creation timestamp. During Resync do below operation * Check image creation timestamp stored during Demote operation and current creation timestamp during Resync and check both are equal and its for force resync then issue resync * If the image on both sides is not in unknown state, check last_snapshot_timestamp on the local mirror description, if its present send volumeReady as false or else return error message. If both the images are in up+unknown the send volumeReady as true. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
10b0f79
to
2b6cf40
Compare
/test ci/centos/upgrade-tests-cephfs |
/test ci/centos/k8s-e2e-external-storage/1.28 |
/test ci/centos/k8s-e2e-external-storage/1.26 |
/test ci/centos/upgrade-tests-rbd |
/test ci/centos/mini-e2e-helm/k8s-1.26 |
/test ci/centos/mini-e2e-helm/k8s-1.28 |
/test ci/centos/k8s-e2e-external-storage/1.27 |
/test ci/centos/mini-e2e/k8s-1.26 |
/test ci/centos/mini-e2e/k8s-1.28 |
/test ci/centos/mini-e2e-helm/k8s-1.27 |
/test ci/centos/mini-e2e/k8s-1.27 |
/test ci/centos/mini-e2e/k8s-1.27 |
/test ci/centos/mini-e2e/k8s-1.26 |
/test ci/centos/k8s-e2e-external-storage/1.27 |
Previously we were dependent on the image mirror state to issue the resync command, What we came to know is that we cannot depend on the image mirror state as the state can change anytime from one to another, We are following the below steps to fix this problem.
Storing the image creation timestamp on the rbd image as metadata and it gets cleaned up when the image is recreated as it gets synced up from the primary cluster.