Fixes issue with smart clone hanging while waiting for initial import #2721

davidvossel · 2023-05-19T20:45:11Z

In Hypershift we use PVC smart cloning for the root volume disks of OCP worker nodes. When a cluster first comes online, we create a DV that imports the RHCOS disk image into a PVC, and in parallel we immediately create VMs that want to clone that source PVC as soon as it is available.

An issue we're encountering is that the smart clone can take up to around 16 minutes to even get triggered in our scenario, because the requeue hits the maximum rate-limit backoff in the clone controller.

For example, all the DVs which have succeeded in the output below are sources to be cloned by the DVs stuck in CloneScheduled. The source PVCs have been imported, and there's nothing blocking the controller from starting the clone, but the controller simply isn't reconciling the DVs anymore due to the rate limiting that occurred while waiting for the sources to finish the import.

oc get dv -n e2e-clusters-l9c8g-example-b74s5
NAME                                                      PHASE            PROGRESS   RESTARTS   AGE
example-b74s5-test-kv-cache-root-volume-wlcfd-rhcos       CloneScheduled                         9m27s
example-b74s5-test-machineconfig-27pf9-rhcos              CloneScheduled                         9m27s
example-b74s5-test-ntomachineconfig-replace-8glz9-rhcos   CloneScheduled                         9m28s
example-b74s5-test-ntomachineconfig-replace-q2w6d-rhcos   CloneScheduled                         9m28s
example-b74s5-test-replaceupgrade-qgdtb-rhcos             CloneScheduled                         9m28s
kv-boot-image-cache-4qkq7                                 Succeeded        100.0%                9m29s
kv-boot-image-cache-dwn7l                                 Succeeded        100.0%                9m29s
kv-boot-image-cache-th6lc                                 Succeeded        100.0%                9m29s
kv-boot-image-cache-vscxw                                 Succeeded        100.0%                9m29s

The DVs stuck in CloneScheduled show an event that looks like this:

Warning  SmartCloneSourceInUse  6m29s (x20 over 10m)  datavolume-pvc-clone-controller  pod e2e-clusters-l9c8g-example-b74s5/importer-kv-boot-image-cache-vscxw using PersistentVolumeClaim kv-boot-image-cache-vscxw

In this case, the importer-kv-boot-image-cache-vscxw pod isn't actually using the PVC anymore; the reconciler simply takes up to 16 minutes (1000s) to retry the key and discover that the source is ready to go.

We can improve this by giving the controller a small requeue timeout when this is encountered.

Fixes smart clone hanging while waiting for initial source import to complete.

davidvossel · 2023-05-19T20:45:19Z

An alternative, more complex solution would be to watch the source DV that the import is occurring for and requeue all pending clones that reference the DV/PVC. I can contribute the simple fix; I don't have the bandwidth to contribute something more complex at the moment.
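For comparison, the watch-based alternative could look roughly like the sketch below. Everything here is hypothetical: `pendingClones`, `add`, and `sourceReady` are illustrative names, and a real controller would wire this into a watch on the source object and enqueue the returned keys on its work queue.

```go
package main

import (
	"fmt"
	"sort"
)

// pendingClones maps a source key ("namespace/name") to the keys of the clone
// DataVolumes waiting on it. Hypothetical structure for illustration only.
type pendingClones map[string]map[string]struct{}

// add records that `clone` is waiting for `source`.
func (p pendingClones) add(source, clone string) {
	if p[source] == nil {
		p[source] = make(map[string]struct{})
	}
	p[source][clone] = struct{}{}
}

// sourceReady returns the sorted keys of all clones waiting on `source` and
// forgets them; the caller would requeue each returned key.
func (p pendingClones) sourceReady(source string) []string {
	clones := make([]string, 0, len(p[source]))
	for c := range p[source] {
		clones = append(clones, c)
	}
	delete(p, source)
	sort.Strings(clones)
	return clones
}

func main() {
	p := pendingClones{}
	p.add("ns/kv-boot-image-cache", "ns/clone-a")
	p.add("ns/kv-boot-image-cache", "ns/clone-b")

	// Called when a watch reports that the source import has finished.
	for _, key := range p.sourceReady("ns/kv-boot-image-cache") {
		fmt.Println("requeue", key)
	}
}
```

This avoids polling entirely, but it needs the extra watch and bookkeeping, which is the added complexity mentioned above.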

mhenriks · 2023-05-19T21:24:19Z

pkg/controller/datavolume/pvc-clone-controller.go

@@ -62,6 +63,8 @@ const (
 	CsiClone
 )

+const sourceInUseRequeueSeconds = time.Duration(15 * time.Second)


I'd prefer a lower value here, like 2 or 3 seconds.

That's fine, but remember it's going to spin the reconcile loop continuously while waiting for the source to import.

So if the source takes 2 minutes to import and 20 PVCs are waiting on it, that would be something like 1200 reconciles at a 2 sec requeue.

How about 5 seconds?

I don't care too much because this code is going to be deleted very soon and replaced with calls to the clone populator, which currently has a very short requeue. I think 2 secs.

Nevertheless, each reconcile should happen quickly in this case; all the objects are cached.

Updated to use a 5-second requeue.

Signed-off-by: David Vossel <davidvossel@gmail.com>

davidvossel · 2023-05-22T17:26:07Z

/test pull-cdi-linter

awels · 2023-05-23T13:10:13Z

/test pull-containerized-data-importer-e2e-upg

awels · 2023-05-23T13:12:55Z

/lgtm

mhenriks · 2023-05-23T13:43:01Z

/approve

kubevirt-bot · 2023-05-23T13:43:14Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mhenriks

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [mhenriks]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

davidvossel · 2023-05-23T16:59:54Z

/retest-required

* Enable empty schedule in DataImportCron (#2711)
  Allow disabling DataImportCron schedule and support external trigger
  Signed-off-by: Ido Aharon <iaharon@redhat.com>
* expand upon #2721 (#2731)
  Need to replace requeue bool with requeue duration
  Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* Add clone from snapshot functionalities to clone-populator (#2724)
  * Add clone from snapshot functionalities to the clone populator
    Signed-off-by: Alvaro Romero <alromero@redhat.com>
  * Update clone populator unit tests to cover clone from snapshot capabilities
    Signed-off-by: Alvaro Romero <alromero@redhat.com>
  * Fix storage class assignation in temp-source claim for host-assisted clone from snapshot
    This commit also includes other minor and styling-related fixes
    Signed-off-by: Alvaro Romero <alromero@redhat.com>
  ---------
  Signed-off-by: Alvaro Romero <alromero@redhat.com>
* Prepare CDI testing for the upcoming non-CSI lane (#2730)
  * Update functional tests to skip incompatible default storage classes
    Signed-off-by: Alvaro Romero <alromero@redhat.com>
  * Enable the use of non-csi HPP in testing lanes
    This commit modifies several scripts to allow the usage of classic HPP as the default SC in tests. This allows us to test our non-populator flow with a non-csi provisioner.
    Signed-off-by: Alvaro Romero <alromero@redhat.com>
  ---------
  Signed-off-by: Alvaro Romero <alromero@redhat.com>
* Allow snapshots as format for DataImportCron created sources (#2700)
  * StorageProfile API for declaring format of resulting cron disk images
    Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
  * Integrate recommended format in dataimportcron controller
    Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
  * Take snapclass existence into consideration when populating cloneStrategy and sourceFormat
    Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
  ---------
  Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Remove leader election test (#2745)
  Now that we are using the standard k8s leases from the controller runtime library, there is no need to test our implementation as it is no longer in use. This will save some testing time and random failures.
  Signed-off-by: Alexander Wels <awels@redhat.com>
* Integration of Data volume using CDI populators (#2722)
  * move cleanup out of dv deletion
    It seemed off to call cleanup in the prepare function just because we don't call cleanup unless the dv is deleting. Instead we check in the clenup function itself if it should be done: in this 2 specific cases in case of deletion and in case the dv succeeded. The cleanup will be used in future commit also for population cleanup which we also want to happen not only on deletion.
    Signed-off-by: Shelly Kagan <skagan@redhat.com>
  * Use populator if csi storage class exists
    Add new datavolume phase PendingPopulation to indicate wffc when using populators, this new phase will be used in kubevirt in order to know that there is no need for dummy pod to pass wffc phase and that the population will occur once creating the vm.
    Signed-off-by: Shelly Kagan <skagan@redhat.com>
  * Update population targetPVC with pvc prime annotations
    The annotations will be used to update dv that uses the populators.
    Signed-off-by: Shelly Kagan <skagan@redhat.com>
  * Adjust UT with new behavior
    Signed-off-by: Shelly Kagan <skagan@redhat.com>
  * updates after review
    Signed-off-by: Shelly Kagan <skagan@redhat.com>
  * Fix import populator report progress
    The import pod should be taken from pvcprime
    Signed-off-by: Shelly Kagan <skagan@redhat.com>
  * Prevent requeue upload dv when failing to find progress report pod
    Signed-off-by: Shelly Kagan <skagan@redhat.com>
  * Remove size inflation in populators
    The populators are handling existing PVCs. The PVC already has a defined requested size, inflating the PVC' with fsoverhead will only be on the PVC' spec and will not reflect on the target PVC, this seems undesired. Instead if the populators is using by PVC that the datavolume controller created the inflation will happen there if needed.
    Signed-off-by: Shelly Kagan <skagan@redhat.com>
  * Adjust functional tests to handle dvs using populators
    Signed-off-by: Shelly Kagan <skagan@redhat.com>
  * Fix clone test
    Signed-off-by: Shelly Kagan <skagan@redhat.com>
  * add shouldUpdateProgress variable to know if need to update progress
    Signed-off-by: Shelly Kagan <skagan@redhat.com>
  * Change update of annotation from denied list to allowed list
    Instead if checking if the annotation on pvcPrime is not desired go over desired list and if the annotation exists add it.
    Signed-off-by: Shelly Kagan <skagan@redhat.com>
  * fix removing annotations from pv when rebinding
    Signed-off-by: Shelly Kagan <skagan@redhat.com>
  * More fixes and UT
    Signed-off-by: Shelly Kagan <skagan@redhat.com>
  * a bit more updates and UTs
    Signed-off-by: Shelly Kagan <skagan@redhat.com>
  ---------
  Signed-off-by: Shelly Kagan <skagan@redhat.com>
* Run bazelisk run //robots/cmd/uploader:uploader -- -workspace /home/prow/go/src/github.com/kubevirt/project-infra/../containerized-data-importer/WORKSPACE -dry-run=false (#2751)
  Signed-off-by: kubevirt-bot <kubevirtbot@redhat.com>
* Allow dynamic linked build for non bazel build (#2753)
  The current script always passes the static ldflag to the compiler which will result in a static binary. We would like to be able to build dynamic libraries instead. cdi-containerimage-server has to be static because we are copying it into the context of a container disk container which is most likely based on a scratch container and has no libraries for us to use.
  Signed-off-by: Alexander Wels <awels@redhat.com>
* Disable DV GC by default (#2754)
  * Disable DV GC by default
    DataVolume garbage collection is a nice feature, but unfortunately it violates fundamental principle of Kubernetes. CR should not be auto-deleted when it completes its role (Job with TTLSecondsAfterFinished is an exception), and once CR was created we can assume it is there until explicitly deleted. In addition, CR should keep idempotency, so the same CR manifest can be applied multiple times, as long as it is a valid update (e.g. DataVolume validation webhook does not allow updating the spec). When GC is enabled, some systems (e.g GitOps / ArgoCD) may require a workaround (DV annotation deleteAfterCompletion = "false") to prevent GC and function correctly.
    On the next kubevirt-bot Bump kubevirtci PR (with bump-cdi), it will fail on all kubevirtci lanes with tests referring DVs, as the tests IsDataVolumeGC() looks at CDIConfig Spec.DataVolumeTTLSeconds and assumes default is enabled. This should be fixed there.
    Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
  * Fix test waiting for PVC deletion with UID
    Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
  * Fix clone test assuming DV was GCed
    Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
  * Fix DIC controller DV/PVC deletion when snapshot is ready
    Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
  ---------
  Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------
Signed-off-by: Ido Aharon <iaharon@redhat.com>
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
Signed-off-by: Alvaro Romero <alromero@redhat.com>
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
Signed-off-by: Alexander Wels <awels@redhat.com>
Signed-off-by: Shelly Kagan <skagan@redhat.com>
Signed-off-by: kubevirt-bot <kubevirtbot@redhat.com>
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
Co-authored-by: Ido Aharon <iaharon@redhat.com>
Co-authored-by: Michael Henriksen <mhenriks@redhat.com>
Co-authored-by: alromeros <alromero@redhat.com>
Co-authored-by: akalenyu <akalenyu@redhat.com>
Co-authored-by: Shelly Kagan <skagan@redhat.com>
Co-authored-by: kubevirt-bot <kubevirtbot@redhat.com>
Co-authored-by: Arnon Gilboa <agilboa@redhat.com>

kubevirt-bot added the release-note and dco-signoff: yes labels May 19, 2023

kubevirt-bot requested review from awels and mhenriks May 19, 2023 20:45

kubevirt-bot added the size/XS label May 19, 2023

mhenriks reviewed May 19, 2023

Fixes issue with smart clone hanging while waiting for initial import

7153d5f

Signed-off-by: David Vossel <davidvossel@gmail.com>

davidvossel force-pushed the clone-fix branch from 07a732e to 7153d5f May 22, 2023 14:17

kubevirt-bot assigned awels May 23, 2023

kubevirt-bot added the lgtm label May 23, 2023

kubevirt-bot added the approved label May 23, 2023

kubevirt-bot merged commit b2a5aa4 into kubevirt:main May 23, 2023

mhenriks added a commit to mhenriks/containerized-data-importer that referenced this pull request May 28, 2023

expand upon kubevirt#2721

45c830a

Need to replace requeue bool with requeue duration Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

mhenriks mentioned this pull request May 28, 2023

expand upon #2721 #2731

Merged

mhenriks added a commit to mhenriks/containerized-data-importer that referenced this pull request May 30, 2023

expand upon kubevirt#2721

e28dc5b

Need to replace requeue bool with requeue duration Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

kubevirt-bot pushed a commit that referenced this pull request May 31, 2023

expand upon #2721 (#2731)

20d21d4

Need to replace requeue bool with requeue duration Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

awels pushed a commit to awels/containerized-data-importer that referenced this pull request Jun 21, 2023

expand upon kubevirt#2721 (kubevirt#2731)

116dcb4

Need to replace requeue bool with requeue duration Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
