-
Notifications
You must be signed in to change notification settings - Fork 332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
never delete PVs while the PVC exists #507
Comments
A new PV from external-provisioner has:
Once it is bound, the status changes to So at first glance it looks like external-provisioner can and should check the phase. |
I'd like to know why PV controller marks the PV for deletion first. Because it can delete PVs provisioned by in-tree plugins too. Can you create a small reproducer, for example with hostpath or mock drivers? |
You can use this modified clusterload2 test: kubernetes/perf-tests#1530 It should be possible to run this with the hostpath driver (not tested, though). The invocation that I used to create 5000 volumes on a 50 node cluster was:
With
For hostpath, you need to change You can start with a much smaller number of volumes by lowering the What I observed was:
|
I ran the following Go program in parallel to the test:
It shows that a PV that gets deleted prematurely goes through these changes:
|
So yes, the status goes from "pending" to "released" without ever getting bound to the PVC. |
The PVC exists while this is happening. Could it be that kube-controller-manager relies on a PVC informer cache to determine whether the PVC exists? Then it might see a PV before the PVC cache gets updated and incorrectly conclude that the PV was released. |
I seem to recall this issue and remember that we fixed this by doing a live lookup https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/persistentvolume/pv_controller.go#L585. |
That fix went into Kubernetes 1.12. I'm running 1.18, so this must be something else. |
Trying this with the CSI hostpath driver crashes the driver:
|
I thought we had serialized gRPC call processing, but apparently not. |
No, we only discussed it: kubernetes-csi/csi-driver-host-path#72 (comment) |
Can you investigate why that fix isn't working for csi?
…On Wed, Oct 14, 2020, 04:12 Patrick Ohly ***@***.***> wrote:
No, we just discussed it: kubernetes-csi/csi-driver-host-path#72 (comment)
<kubernetes-csi/csi-driver-host-path#72 (comment)>
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#507 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF2QYPK6PHRBRZ22MVBC3JLSKWBTNANCNFSM4SNES55Q>
.
|
Yes, I'm looking into it. So far I have not been able to reproduce it with a cluster brought up with |
I now have managed to reproduce it there: the key point was that external-provisioner had to run with qps + burst values that must allow it to create PVs very quickly. One example PV:
So it is the code that should have done the live lookup... but didn't do it because of https://github.com/kubernetes/kubernetes/blob/7ff41c1ba0f38920bb0bc09e70dc70e2205be155/pkg/controller/volume/persistentvolume/pv_controller.go#L584 "pv.kubernetes.io/bound-by-controller" is not set by external-provisioner. Should it do that? There is only "pv.kubernetes.io/provisioned-by: hostpath.csi.k8s.io" in the newly created PVs. |
According to the description of that annotation, no. That makes the code in Kubernetes incorrect. |
Removing |
I think this could be actually the right fix. At this point of PV controller we know that the PV is bound by UID, which is a clear signal the PVC has existed at some point it time (otherwise |
Wait. How is it possible that a PVC is dynamically provisioned and PV controller does not know about it (ie. does not have the PVC in its cache)? Who puts |
The modified clusterload2 test does that, if configured to do so - see kubernetes/perf-tests#1530 I added that after I noticed that I couldn't trigger volume creation as quickly as I wanted when relying on the PV controller. With this shortcut, the test then becomes more of a microbenchmark for external-provisioner + CSI driver, but that's exactly what I wanted. Right now it's still waiting for the PV controller to bind the PVs, but I can replace that with "wait for n PVs to exist" and then the PV controller will be completely out of the critical path. We should debate whether this is actually something that creators of a PVC are allowed to do. It works, so even if it is unintentional, it's probably too late to prevent it via PVC validation. IMHO it is useful and should be supported. If we agree, then I'll add it to the code that creates PVCs for generic ephemeral volumes, which will save one roundtrip through the API server and the PV controller when creating those volumes.
No, immediate binding. |
To me a microbenchmark should be consistent in what processes are involved.
I think it's odd to benchmark the provisioner sidecar but still involve the
latter half of pv controller. If we want to benchmark both pv controller
and sidecar, then we should include the whole sequence. If we want to
benchmark just the sidecar then we could do that with a unit or integration
test.
…On Thu, Oct 15, 2020, 07:28 Patrick Ohly ***@***.***> wrote:
Who puts volume.beta.kubernetes.io/storage-provisioner annotation there?
The modified clusterload2 test does that, if configured to do so - see
kubernetes/perf-tests#1530
<kubernetes/perf-tests#1530>
I added that after I noticed that I couldn't trigger volume creation as
quickly as I wanted when relying on the PV controller. With this shortcut,
the test then becomes more of a microbenchmark for external-provisioner +
CSI driver, but that's exactly what I wanted. Right now it's still waiting
for the PV controller to bind the PVs, but I can replace that with "wait
for n PVs to exist" and then the PV controller will be completely out of
the critical path.
We should debate whether this is actually something that creators of a PVC
are allowed to do. It works, so even if it is unintentional, it's probably
too late to prevent it via PVC validation.
IMHO it is useful and should be supported. If we agree, then I'll add it
to the code that creates PVCs for generic ephemeral volumes, which will
save one roundtrip through the API server and the PV controller when
creating those volumes.
do you use late binding in your test?
No, immediate binding.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#507 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF2QYPPZB2ADAG7HO3LBIDLSK4BJNANCNFSM4SNES55Q>
.
|
I don't think adding the annotation to bypass the pv controller should be a
supported sequence. It bypasses the ability to bind to an existing pv in
the same storageclass
…On Thu, Oct 15, 2020, 08:52 Michelle Au ***@***.***> wrote:
To me a microbenchmark should be consistent in what processes are
involved. I think it's odd to benchmark the provisioner sidecar but still
involve the latter half of pv controller. If we want to benchmark both pv
controller and sidecar, then we should include the whole sequence. If we
want to benchmark just the sidecar then we could do that with a unit or
integration test.
On Thu, Oct 15, 2020, 07:28 Patrick Ohly ***@***.***> wrote:
> Who puts volume.beta.kubernetes.io/storage-provisioner annotation there?
>
> The modified clusterload2 test does that, if configured to do so - see
> kubernetes/perf-tests#1530
> <kubernetes/perf-tests#1530>
>
> I added that after I noticed that I couldn't trigger volume creation as
> quickly as I wanted when relying on the PV controller. With this shortcut,
> the test then becomes more of a microbenchmark for external-provisioner +
> CSI driver, but that's exactly what I wanted. Right now it's still waiting
> for the PV controller to bind the PVs, but I can replace that with "wait
> for n PVs to exist" and then the PV controller will be completely out of
> the critical path.
>
> We should debate whether this is actually something that creators of a
> PVC are allowed to do. It works, so even if it is unintentional, it's
> probably too late to prevent it via PVC validation.
>
> IMHO it is useful and should be supported. If we agree, then I'll add it
> to the code that creates PVCs for generic ephemeral volumes, which will
> save one roundtrip through the API server and the PV controller when
> creating those volumes.
>
> do you use late binding in your test?
>
> No, immediate binding.
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub
> <#507 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AF2QYPPZB2ADAG7HO3LBIDLSK4BJNANCNFSM4SNES55Q>
> .
>
|
Agreed. I'll change that.
Also agreed.
Here I disagree. I want to benchmark the scenario where I have a large number of external-provisioner instances (one per node) and they all cooperate together on provisioning a high number of volumes. Doing it with a single instance is just the baseline that I need to compare against. With the clusterload2 test, I can scale out on a real cluster and it also gives me the apiserver metrics data, so I think it is a better tool for the job than a unit or integration test. |
We decided against backporting the Kubernetes change unless there's a good reason why someone needs it. When doing scale testing, QPS and burst settings can be set high enough for kube-controller-manager that bypassing it is not necessary. |
When doing performance testing, the following was observed:
The external-provisioner should never delete PVs that it just created.
The text was updated successfully, but these errors were encountered: