-
Notifications
You must be signed in to change notification settings - Fork 807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add snapshots support #131
Conversation
Pull Request Test Coverage Report for Build 424
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at the initial implementation and the direction we're heading to looks good to me.
I just skimmed through the tests, but I think it'd be a good idea to also add an integration test while the e2e tests infra is being worked out.
Description: aws.String(descriptions), | ||
} | ||
|
||
res, err := c.ec2.CreateSnapshotWithContext(ctx, request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just something to keep in mind:
I wonder if creating a snapshot of a big volume can take a lot of time. If so, ctx
would time out and the operation cancelled.
The standard timeout that comes from the external-snapshotter
is 1 minute [1]. If a snapshot creation in AWS might take more than 1 minute, we may have to set a higher value in the external-snapshotter
manifest file.
[1] https://github.com/kubernetes-csi/external-snapshotter/blob/master/cmd/csi-snapshotter/main.go#L54
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, can you push the manifest for the external-snapshotter in deploy/kubernetes
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right... I've never seen an AWS disk where the snapshot would take that long. The change in the external controller that simplified the phase handling is quite new (the actual snapshotting is usually quick but then some backends take a long time to move the snapshot however the original disk is OK to be used). I'll figure this out.
pkg/cloud/cloud.go
Outdated
err := c.waitForCreate(ctx, volumeID) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to restore snapshot %s: %v", diskOptions.SnapshotID, err) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully this will no longer be necessary once #126 is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#126 is merged, could you rebase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tsmetana Thanks for the PR. And sorry for getting back on this after a while. Could you rebase this on top of latest change?
pkg/cloud/cloud.go
Outdated
err := c.waitForCreate(ctx, volumeID) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to restore snapshot %s: %v", diskOptions.SnapshotID, err) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#126 is merged, could you rebase?
} | ||
request := &ec2.CreateSnapshotInput{ | ||
VolumeId: aws.String(volumeID), | ||
DryRun: aws.Bool(false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: I think this is default to false, maybe we can leave it out
38721a1
to
37a3169
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: tsmetana If they are not already assigned, you can assign the PR to them by writing The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Rebased with some changes to the backoff logic. |
37a3169
to
ae92ad4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tsmetana sorry for the delay on this.
Could you add the external snapshotter in controller manifest file?
// Truncated exponential backoff: if the exponential backoff times-out, just keep polling using the longest interval | ||
err := wait.ExponentialBackoff(backoff, conditionFunc) | ||
if err == wait.ErrWaitTimeout { | ||
timeout := time.Duration(backoff.Duration.Seconds() * math.Pow(backoff.Factor, float64(backoff.Steps))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain more on this code path? Like when does this happen? Does this mean when we spent 1 min at ExponentialBackoff
then it times out, we will spend another min on PollInfinite
? Not sure what's the benefit of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kuberentes API is supposed to be "intent based" so as long as there exists a VolumeSnapshot object the API should try to create it (no timeout). The idea here is not to extend the polling period too much so in case the operation succeeds/fail we are able to inform the user. Since it's unknown how long may the operation take I only used the exponential backoff for the first few poll iterations and then stopped to prolong the interval and make it constant.
pkg/driver/controller.go
Outdated
@@ -124,6 +126,20 @@ func (d *Driver) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest) | |||
Encrypted: isEncrypted, | |||
KmsKeyID: kmsKeyId, | |||
} | |||
|
|||
// Shall we restore a snapshot? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this comment? If the snapshot ID is provide, the create volume API call will create the volume from snapshot, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Will remove it with reabse.
pkg/driver/controller.go
Outdated
return nil, status.Errorf(codes.Internal, "Could not create snapshot %q: %v", snapshotName, err) | ||
} | ||
csiSnapshot, err := newCreateSnapshotResponse(snapshot) | ||
return csiSnapshot, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
return csiSnapshot, err | |
return newCreateSnapshotResponse(snapshot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Will change this.
/retest |
ae92ad4
to
8e5d99f
Compare
This is the first version of volume snapshotting support.
It implements:
I did only basic testing of the feature so far, but it looks to be working fine. I will add some examples and documentation of the feature.
Ref Issue #25