Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: UML diagram for volume creation and deletion #532

Merged
merged 3 commits into from
Mar 17, 2021

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Dec 3, 2020

What type of PR is this?
/kind documentation

What this PR does / why we need it:

While this information is available in various design documents, users
of the external-provisioner benefit from getting a summary directly in
the documentation of external-provisioner. This is a good opportunity
to call our expected behavior of the CSI driver.

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

While this information is available in various design documents, users
of the external-provisioner benefit from getting a summary directly in
the documentation of external-provisioner. This is a good opportunity
to call our expected behavior of the CSI driver.
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/documentation Categorizes issue or PR as related to documentation. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 3, 2020
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 3, 2020
@pohly
Copy link
Contributor Author

pohly commented Dec 3, 2020

A first revision of this was in PR #524. We can and should merge it separately because it is documenting the current status, not that enhancement.

/assign @jsafrane

@pohly pohly mentioned this pull request Dec 3, 2020
4 tasks
@verult
Copy link
Contributor

verult commented Dec 11, 2020

/cc

@k8s-ci-robot k8s-ci-robot requested a review from verult December 11, 2020 03:09
@@ -103,6 +103,34 @@ See the [storage capacity section](#capacity-support) below for details.

* All glog / klog arguments are supported, such as `-v <log level>` or `-alsologtostderr`.

### Design
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really useful for developers of external-provisioner/pv controller, but I like to keep the README focused on users. Can we move this to a docs/ subpage and reference it from the readme?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

@pohly
Copy link
Contributor Author

pohly commented Dec 14, 2020

@msau42 given what we discussed in PR #524 about the pv protection finalizer, would it be more correct to show external-provisioner and the PV controller to act on a PV at the same time in parallel? One removes the finalizer, the other handles the DeleteVolume call and deletes the PV?

@msau42
Copy link
Collaborator

msau42 commented Dec 15, 2020

would it be more correct to show external-provisioner and the PV controller to act on a PV at the same time in parallel? One removes the finalizer, the other handles the DeleteVolume call and deletes the PV?

Sure, that sounds good. There isn't actually a finalizer to protect the PV object from being deleted while the disk still exists.

@pohly
Copy link
Contributor Author

pohly commented Dec 15, 2020

would it be more correct to show external-provisioner and the PV controller to act on a PV at the same time in parallel? One removes the finalizer, the other handles the DeleteVolume call and deletes the PV?

No, the current sequential flow is correct. First the PV controller sets phase: Released, then the external-provisioner deletes the PV, then the PV controller removes the finalizer and the PVC gets removed.

I checked by deleting the PVC while the CSI driver was down: the PVC continues to have the finalizer until the CSI driver comes back up and deletes the PVC.

What's the rationale for not removing the finalizer as soon as the PVC is gone and the phase is Released?

@msau42
Copy link
Collaborator

msau42 commented Dec 15, 2020

This is not my understanding of how the pv protection controller works. It only waits for PV to be in released state:
https://github.com/kubernetes/kubernetes/blob/6d43e2b3bb55237fbb64118dc13beac259745983/pkg/controller/volume/pvprotection/pv_protection_controller.go#L143

And PV controller sets PV to released state before doing the delete operation:
https://github.com/kubernetes/kubernetes/blob/6d43e2b3bb55237fbb64118dc13beac259745983/pkg/controller/volume/persistentvolume/pv_controller.go#L639

@pohly
Copy link
Contributor Author

pohly commented Dec 15, 2020

This is not my understanding of how the pv protection controller works. It only waits for PV to be in released state:
https://github.com/kubernetes/kubernetes/blob/6d43e2b3bb55237fbb64118dc13beac259745983/pkg/controller/volume/pvprotection/pv_protection_controller.go#L143

The code block which removes the finalizer is only called when IsDeletionCandidate returns true. That function only returns true if the PV has a deletion time stamp: https://github.com/kubernetes/kubernetes/blob/62876fc6d93e891aa7fbe19771e6a6c03773b0f7/pkg/controller/volume/protectionutil/utils.go#L24-L28

That's consistent with my observation that the finalizer is kept in place until someone deletes the PV or rather, because of the finalizer, sets the deletion time stamp.

That logic is three years old, from kubernetes/kubernetes#58743 (search for isDeletionCandidate). 🤷

@msau42
Copy link
Collaborator

msau42 commented Dec 15, 2020

Ah ok thanks for pointing that out, that explains it.

I think it is preferable to wait for the volume to actually be deleted, that prevents the volume from leaking unless the user went in and deleted the PV object themselves.

@pohly
Copy link
Contributor Author

pohly commented Dec 15, 2020

I think it is preferable to wait for the volume to actually be deleted, that prevents the volume from leaking

The same can be achieved when removing the finalizer after setting phase: Released and before external-provisioner deletes the PV. Having the finalizer still set after just delays the actual deletion because the PV controller then must do another update.

A finalizer that automatically gets removed when the object gets deleted still makes no sense to me.

@msau42
Copy link
Collaborator

msau42 commented Dec 15, 2020

The same can be achieved when removing the finalizer after setting phase: Released and before external-provisioner deletes the PV.

If the finalizer gets removed before external-provisioner deletes the PV, then a user could delete the PV object before external-provisioner gets to it, causing the volume to leak.

@pohly
Copy link
Contributor Author

pohly commented Dec 15, 2020

A user can also delete the PV when the finalizer is set and then the PV controller will remove the finalizer without the external-provisioner ever getting involved. The finalizer makes no difference, a user simply must not delete PVs.

Hmm that kind of makes the purpose of the finalizer a little moot then. But that's a separate discussion :-)

@msau42
Copy link
Collaborator

msau42 commented Dec 16, 2020

/approve

Just one last nit from me to move the design section out of the top level README

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 16, 2020
It is not entirely clear why the PV controller keeps the finalizer set
until after the PV is
deleted (kubernetes-csi#532 (comment)).

Instead of saying "determines that PV has no volume" it's better to
keep this a bit more open-ended.
The top-level README.md is more focused on end-users and those
probably don't need an in-depth understanding of each step.
@pohly
Copy link
Contributor Author

pohly commented Dec 16, 2020

@msau42 I've created a separate doc/design.md and also changed the wording of the last step in the diagram:

**PV controller** determines that PV has no volume -> **PV controller** checks the volume one last time
and removes the pv protection finalizer.

That's more accurate because the PV controller doesn't really check whether the volume still exists.

@msau42
Copy link
Collaborator

msau42 commented Dec 16, 2020

Sorry I accidentally modified your comment instead of replying to it. I tried to restore it back to the original.. Anyway, to be most accurate, should we say it's the storage protection controller that removes the finalizer?

@pohly
Copy link
Contributor Author

pohly commented Dec 16, 2020

Anyway, to be most accurate, should we say it's the storage protection controller that removes the finalizer?

I'm talking about pkg/controller/volume/pvprotection/pv_protection_controller.go - is that the same thing? That's where the kubernetes.io/pv-protection finalizer gets removed.

@pohly
Copy link
Contributor Author

pohly commented Dec 16, 2020

I agree, the last step is "PV protection controller checks the volume one last time and removes the pv protection finalizer".

I used the term "PV controller" already earlier, without double-checking in the source code which part that really is. Did I get those steps right? I'm not sure anymore.

Probably it would be good to define each term and link it to the source code.

@pohly
Copy link
Contributor Author

pohly commented Dec 16, 2020

This doesn't need to go into the next release. We can enhance it after New Years.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 16, 2021
@pohly
Copy link
Contributor Author

pohly commented Mar 17, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 17, 2021
@jsafrane
Copy link
Contributor

sorry, I missed this. IMO it seems to be accurate and most @msau42's comments were addressed.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 17, 2021
@k8s-ci-robot k8s-ci-robot merged commit 1becfb9 into kubernetes-csi:master Mar 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/documentation Categorizes issue or PR as related to documentation. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants