Implement distributed snapshotting #585
Conversation
Skipping CI for Draft Pull Request.
/test all
/assign @yuxiangqian
Does csi hostpath driver support distributed snapshotting?
I tested these changes with csi hostpath driver and it works as expected.
README.md
@@ -165,6 +165,8 @@ Read more about how to install the example webhook [here](deploy/kubernetes/webh
* `--worker-threads`: Number of worker threads for running create snapshot and delete snapshot operations. Default value is 10.
* `--node-deployment`: Enables deploying the sidecar controller together with a CSI driver on nodes to manage node-local volumes. Off by default.
Can you add more instructions in the "Usage" section on how to use this properly.
@nearora-msft Can you please provide more details on how you deployed it and how you tested it? Some testing results will be helpful as well.
I assume this is just part of the CL?
if *enableNodeDeployment {
	node := os.Getenv("NODE_NAME")
	if node == "" {
		klog.Fatal("The NODE_NAME environment variable must be set when using --enable-node-deployment.")
should it just exit?
Should there be a specific exit statement? klog.Fatal() logs the error and then calls os.Exit(255).
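For context, a minimal sketch of the two equivalent patterns being discussed (assuming k8s.io/klog/v2; the message text is copied from the snippet above):

package main

import (
	"os"

	"k8s.io/klog/v2"
)

func main() {
	if node := os.Getenv("NODE_NAME"); node == "" {
		// klog.Fatal logs the message and then terminates the process via os.Exit(255).
		klog.Fatal("The NODE_NAME environment variable must be set when using --enable-node-deployment.")

		// Equivalent explicit form, if a specific exit code were preferred:
		//   klog.Error("The NODE_NAME environment variable must be set when using --enable-node-deployment.")
		//   klog.Flush()
		//   os.Exit(1)
	}
}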
	return "", nil
}

for _, node := range nodes.Items {
The fact that the first matching node will ALWAYS be picked to be labeled is a bit concerning, though TBH I do not have a solution. It might result in a situation where all snapshot contents are labeled with the same node name and the sidecar running on that node becomes hot.
Can there be multiple nodes that match the node affinity for a local volume?
AFAIK there can be only a single node that matches the node affinity
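To make the trade-off concrete, here is a simplified sketch of the matching loop under discussion (not the PR's actual code; it assumes the local volume's required node is identified by the standard kubernetes.io/hostname label):

package nodepick

import (
	v1 "k8s.io/api/core/v1"
)

// pickNodeForLocalVolume walks the node list in order and returns the name of the
// first node whose hostname label matches the volume's required host. If more than
// one node could match, the first one in the list always wins, which is the
// "hot node" concern raised above; if only a single node can ever match the node
// affinity, first-match is equivalent to exact-match.
func pickNodeForLocalVolume(nodes []v1.Node, requiredHostname string) (string, bool) {
	for _, node := range nodes {
		if node.Labels["kubernetes.io/hostname"] == requiredHostname {
			return node.Name, true
		}
	}
	return "", false
}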
Hi @nearora-msft, can you please address the comments? Thanks.
@nearora-msft Can you rebase it? I will test it.
@nearora-msft Are you still working on this? If not, I would be willing to complete the work.
Yes, sorry about the delay. Still working on it; planning to address the comments today.
Yes, will do.
Force-pushed from 0db781e to b94354f.
/retest
Force-pushed from 21ef58a to b5680b9.
Addressed the comments and tested with the latest revision.
Force-pushed from 0ce0da3 to 73543dc.
/test pull-kubernetes-csi-external-snapshotter-1-23-on-kubernetes-master
/test pull-kubernetes-csi-external-snapshotter-alpha-1-22-on-kubernetes-1-22
@@ -106,7 +106,8 @@ const (
	VolumeSnapshotContentInvalidLabel = "snapshot.storage.kubernetes.io/invalid-snapshot-content-resource"
	// VolumeSnapshotInvalidLabel is applied to invalid snapshot as a label key. The value does not matter.
	// See https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md#automatic-labelling-of-invalid-objects
	VolumeSnapshotInvalidLabel          = "snapshot.storage.kubernetes.io/invalid-snapshot-resource"
	VolumeSnapshotContentManagedByLabel = "snapshot.storage.kubernetes.io/managed-by"
Can you add a comment to explain what this label is for?
Please add something like this to clarify what is in this label: It specifies the node name which handles the snapshot for the volume local to that node.
Done
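For readers of this thread, a minimal sketch of what the new label is used for (the helper name and the use of a bare ObjectMeta are assumptions, not the PR's exact code):

package labeling

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// VolumeSnapshotContentManagedByLabel is the constant added in the diff above. It
// specifies the node name which handles the snapshot for the volume local to that node.
const VolumeSnapshotContentManagedByLabel = "snapshot.storage.kubernetes.io/managed-by"

// labelContentForNode (hypothetical helper) attaches the managed-by label to a
// VolumeSnapshotContent's metadata so that only the snapshotter sidecar running on
// the named node picks it up.
func labelContentForNode(meta *metav1.ObjectMeta, nodeName string) {
	if meta.Labels == nil {
		meta.Labels = map[string]string{}
	}
	meta.Labels[VolumeSnapshotContentManagedByLabel] = nodeName
}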
Force-pushed from 73543dc to fa1110c.
	metricsPath          = flag.String("metrics-path", "/metrics", "The HTTP path where prometheus metrics will be exposed. Default is `/metrics`.")
	retryIntervalStart   = flag.Duration("retry-interval-start", time.Second, "Initial retry interval of failed volume snapshot creation or deletion. It doubles with each failure, up to retry-interval-max. Default is 1 second.")
	retryIntervalMax     = flag.Duration("retry-interval-max", 5*time.Minute, "Maximum retry interval of failed volume snapshot creation or deletion. Default is 5 minutes.")
	enableNodeDeployment = flag.Bool("node-deployment", false, "Enables deploying the sidecar controller together with a CSI driver on nodes to manage snapshots for node-local volumes.")
Does it make sense to run the snapshotter with `--node-deployment` and without `--enable-distributed-snapshotting`? If not, then `--node-deployment` can be removed and it can be enabled together with `--enable-distributed-snapshotting`.
We need to have a flag to determine whether the sidecar is enabled for distributed snapshotting and it needs to be disabled by default. With a new feature like this, we can't enable it by default.
@nearora-msft, can you add a section in README with the title "Distributed Snapshotting", and clarify both flags are needed for this to work?
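To summarize how the two flags relate, a hedged sketch (the flag names come from this thread; the help text for the second flag is illustrative): --node-deployment is passed to the csi-snapshotter sidecar that runs next to the CSI driver on each node, while --enable-distributed-snapshotting is passed to the cluster-wide snapshot-controller, and per the comment above both need to be set for distributed snapshotting to work.

package main

import "flag"

var (
	// csi-snapshotter sidecar (one instance per node, alongside the CSI driver):
	enableNodeDeployment = flag.Bool("node-deployment", false,
		"Enables deploying the sidecar controller together with a CSI driver on nodes to manage snapshots for node-local volumes.")

	// snapshot-controller (one instance per cluster); wording here is illustrative:
	enableDistributedSnapshotting = flag.Bool("enable-distributed-snapshotting", false,
		"Enables the snapshot-controller to hand snapshots of node-local volumes to the sidecar on the owning node.")
)

func main() {
	flag.Parse()
	_ = *enableNodeDeployment
	_ = *enableDistributedSnapshotting
}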
Force-pushed from fa1110c to bd102f6.
Force-pushed from ddafcc5 to 21fc337.
/test pull-kubernetes-csi-external-snapshotter-1-23-on-kubernetes-master
/test pull-kubernetes-csi-external-snapshotter-alpha-1-22-on-kubernetes-1-22
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: nearora-msft, xing-yang
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@nearora-msft I'm merging the PR. Thanks for your hard work! Can you add some description for this? Please also backport this PR to the release-5.0 branch.
Sure
Backport #585 to release 5.0
Yes @nearora-msft, thank you for the work. Have you thought about the reverse problem? Now that we can properly take snapshots, we need to restore them. In my testing with WFFC storage I found that we have a similar problem on restoring (the same problem exists for csi clone). What happens if the scheduler schedules the pod that uses the new PVC being restored onto a node that is not the same node where the snapshot resides? There are 2 options for solving it, but only one makes good sense IMO:
Option 2 makes the most sense to me, but I am writing just in case I completely missed something obvious that solves the restore problem.
@awels Yes, option 2 makes more sense to me as well. As per this implementation of distributed provisioning, for local volumes, if the binding mode is set to WaitForFirstConsumer, the PVC will contain the name of the node in the selectedNode annotation and the volume would be created on that node. I can see this annotation being set here. We could probably do something similar for snapshot restore: we could set the same annotation in the case of a snapshot restore, regardless of the binding mode. The annotation would contain info about the node that is handling the snapshot, and the external-provisioner would make sure that the snapshot is restored on that node. We'd probably have to make changes in the external-provisioner as well. @pohly please correct me if I didn't interpret this correctly.
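A rough sketch of the annotation idea in the comment above, purely illustrative and not part of this PR: the node handling the snapshot could be written into the same volume.kubernetes.io/selected-node annotation that distributed provisioning already uses, so the external-provisioner creates the restored volume on that node (the helper name is hypothetical):

package restorepin

import (
	v1 "k8s.io/api/core/v1"
)

// selectedNodeAnnotation is the annotation consulted for delayed binding /
// distributed provisioning.
const selectedNodeAnnotation = "volume.kubernetes.io/selected-node"

// pinRestoreToSnapshotNode (hypothetical helper) records the node that handled the
// snapshot on the restoring PVC, so the restored volume is provisioned on that node.
func pinRestoreToSnapshotNode(pvc *v1.PersistentVolumeClaim, snapshotNode string) {
	if pvc.Annotations == nil {
		pvc.Annotations = map[string]string{}
	}
	pvc.Annotations[selectedNodeAnnotation] = snapshotNode
}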
Yes, but AFAICT there is no mechanism to ensure that the nodeName being set by the scheduler is the same node as the dataSource (PVC or snapshot), which is where my filter comment came from. IMO we need to solve the problem in the scheduler, which will then put the right node in the annotation, and from there everything should work. Making a scheduler extension that does this doesn't seem terribly hard; the problem is getting all the pods being created to use the extension when a CSI driver that needs it is in the cluster, and that is the part I am struggling with.
Ah, ok. Got it. I can't think of a solution off the top of my head, but I will update the thread if I think of something.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR allows the snapshot sidecar controller to be deployed on every node along with the CSI drivers that handle local volumes.
Which issue(s) this PR fixes:
Fixes #484
Special notes for your reviewer:
Pending changes: Unit tests and documentation.
Does this PR introduce a user-facing change?