
Add snapshots support #131

Merged: leakingtapan merged 1 commit into kubernetes-sigs:snapshot on Feb 8, 2019


Contributor

@tsmetana tsmetana commented Nov 30, 2018

This is the first version of volume snapshotting support.

It implements:

  • Support for the CREATE_DELETE_SNAPSHOT capability (i.e. no snapshot listing yet).
  • Creating a volume from a snapshot.

I have only done basic testing of the feature so far, but it appears to work fine. I will add some examples and documentation for it.

Ref Issue #25

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 30, 2018
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 30, 2018
@coveralls

coveralls commented Nov 30, 2018

Pull Request Test Coverage Report for Build 424

  • 136 of 235 (57.87%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.6%) to 63.814%

Changes missing coverage (covered lines / changed or added lines, %):
  • pkg/driver/controller.go: 43 / 67 (64.18%)
  • pkg/cloud/cloud.go: 93 / 130 (71.54%)
  • pkg/cloud/fakes.go: 0 / 38 (0.0%)

Totals (coverage status):
  • Change from base Build 417: -0.6%
  • Covered Lines: 917
  • Relevant Lines: 1437

💛 - Coveralls

Member

@bertinatto bertinatto left a comment

I took a look at the initial implementation, and the direction we're heading in looks good to me.

I only skimmed the tests, but I think it would be a good idea to also add an integration test while the e2e test infrastructure is being worked out.

	Description: aws.String(descriptions),
}

res, err := c.ec2.CreateSnapshotWithContext(ctx, request)
Member

Just something to keep in mind:

I wonder whether creating a snapshot of a big volume can take a long time. If so, ctx would time out and the operation would be cancelled.

The default timeout passed in by the external-snapshotter is 1 minute [1]. If creating a snapshot in AWS can take longer than that, we may have to set a higher value in the external-snapshotter manifest file.

[1] https://github.com/kubernetes-csi/external-snapshotter/blob/master/cmd/csi-snapshotter/main.go#L54

Member

BTW, can you push the manifest for the external-snapshotter in deploy/kubernetes as well?

Contributor Author

@tsmetana tsmetana Dec 4, 2018

You're right... I've never seen an AWS disk whose snapshot took that long. The change in the external controller that simplified the phase handling is quite new (taking the snapshot itself is usually quick, but some backends then take a long time to move the snapshot data, even though the original disk is already fine to use). I'll figure this out.

	err := c.waitForCreate(ctx, volumeID)
	if err != nil {
		return nil, fmt.Errorf("failed to restore snapshot %s: %v", diskOptions.SnapshotID, err)
	}
}
Member

Hopefully this will no longer be necessary once #126 is merged.

Contributor

#126 is merged, could you rebase?

pkg/cloud/cloud.go (outdated, resolved)
pkg/cloud/cloud.go (outdated, resolved)
pkg/driver/controller.go (outdated, resolved)
pkg/driver/controller.go (resolved)
pkg/driver/controller.go (outdated, resolved)
@leakingtapan leakingtapan changed the base branch from next to master December 3, 2018 20:13
@kubernetes-sigs kubernetes-sigs deleted a comment from suphanatnack Dec 26, 2018
Contributor

@leakingtapan leakingtapan left a comment

@tsmetana Thanks for the PR, and sorry for taking a while to get back to this. Could you rebase this on top of the latest changes?

pkg/cloud/cloud.go (outdated, resolved)

}
request := &ec2.CreateSnapshotInput{
	VolumeId: aws.String(volumeID),
	DryRun:   aws.Bool(false),
Contributor

NIT: I think this defaults to false, so maybe we can leave it out.

pkg/cloud/cloud.go (outdated, resolved)
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tsmetana
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: jsafrane

If they are not already assigned, you can assign the PR to them by writing /assign @jsafrane in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tsmetana
Contributor Author

Rebased with some changes to the backoff logic.

Contributor

@leakingtapan leakingtapan left a comment

@tsmetana sorry for the delay on this.

Could you add the external snapshotter to the controller manifest file?

// Truncated exponential backoff: if the exponential backoff times out, just keep polling using the longest interval
err := wait.ExponentialBackoff(backoff, conditionFunc)
if err == wait.ErrWaitTimeout {
	timeout := time.Duration(backoff.Duration.Seconds() * math.Pow(backoff.Factor, float64(backoff.Steps)))
Contributor

Could you explain this code path in more detail? When does it happen? Does this mean that if we spend 1 minute in ExponentialBackoff and it times out, we then spend another minute in PollInfinite? I'm not sure what the benefit of this is.

Contributor Author

@tsmetana tsmetana Feb 4, 2019

The Kubernetes API is supposed to be "intent based", so as long as a VolumeSnapshot object exists, we should keep trying to create the snapshot (with no timeout). The idea here is not to stretch the polling interval too much, so that when the operation succeeds or fails we can inform the user promptly. Since it's unknown how long the operation may take, I use exponential backoff only for the first few poll iterations and then stop growing the interval and keep it constant.

@@ -124,6 +126,20 @@ func (d *Driver) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest)
	Encrypted: isEncrypted,
	KmsKeyID:  kmsKeyId,
}

// Shall we restore a snapshot?
Contributor

Do we still need this comment? If the snapshot ID is provided, the CreateVolume API call will create the volume from the snapshot, right?

Contributor Author

Yes. I will remove it when I rebase.
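
The reviewer's point, that no separate restore branch is needed, can be sketched with hypothetical stand-in types. `diskOptions`, `createVolumeRequest` and `buildCreateVolumeRequest` are illustrative only and model just the fields relevant here; the assumption, consistent with the thread, is that EC2 creates the volume from a snapshot whenever the single CreateVolume request carries a snapshot ID.

```go
package main

import "fmt"

// diskOptions is a hypothetical stand-in for the driver's volume options.
type diskOptions struct {
	CapacityGiB int64
	SnapshotID  string // empty when not restoring from a snapshot
}

// createVolumeRequest is a hypothetical stand-in for the cloud API input;
// a nil SnapshotID means "create an empty volume".
type createVolumeRequest struct {
	Size       int64
	SnapshotID *string
}

// buildCreateVolumeRequest sets the snapshot ID on the same request instead
// of taking a separate "restore" code path.
func buildCreateVolumeRequest(opts diskOptions) createVolumeRequest {
	req := createVolumeRequest{Size: opts.CapacityGiB}
	if opts.SnapshotID != "" {
		id := opts.SnapshotID
		req.SnapshotID = &id
	}
	return req
}

func main() {
	empty := buildCreateVolumeRequest(diskOptions{CapacityGiB: 10})
	restored := buildCreateVolumeRequest(diskOptions{CapacityGiB: 10, SnapshotID: "snap-example"})
	fmt.Println(empty.SnapshotID == nil, *restored.SnapshotID) // true snap-example
}
```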

	return nil, status.Errorf(codes.Internal, "Could not create snapshot %q: %v", snapshotName, err)
}
csiSnapshot, err := newCreateSnapshotResponse(snapshot)
return csiSnapshot, err
Contributor

NIT:

Suggested change:
- return csiSnapshot, err
+ return newCreateSnapshotResponse(snapshot)

Contributor Author

OK. Will change this.

@leakingtapan
Contributor

/retest

@leakingtapan leakingtapan changed the base branch from master to snapshot February 8, 2019 17:11
@leakingtapan
Contributor

Thanks @tsmetana for rebasing the change. We are merging this into the snapshot branch to bake it a bit more and fix some timeout issues. After that, we will merge the final change into master.

@dkoshkin

@leakingtapan leakingtapan merged commit 39fc9bd into kubernetes-sigs:snapshot Feb 8, 2019

5 participants