OADP-6294: Mod-work for the OADP Troubleshooting user story #95005

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
@@ -9,89 +9,14 @@ include::_attributes/attributes-openshift-dedicated.adoc[]

toc::[]

You might encounter these common issues with `Backup` and `Restore` custom resources (CRs).
You might encounter the following common issues with `Backup` and `Restore` custom resources (CRs):

[id="backup-cannot-retrieve-volume_{context}"]
== Backup CR cannot retrieve volume
* Backup CR cannot retrieve volume
* Backup CR status remains in progress
* Backup CR status remains in PartiallyFailed

The `Backup` CR displays the following error message: `InvalidVolume.NotFound: The volume 'vol-xxxx' does not exist`.
include::modules/troubleshooting-backup-cr-cannot-retrieve-volume-issue.adoc[leveloffset=+1]

.Cause
include::modules/troubleshooting-backup-cr-status-remains-in-progress-issue.adoc[leveloffset=+1]

The persistent volume (PV) and the snapshot locations are in different regions.

.Solution

. Edit the value of the `spec.snapshotLocations.velero.config.region` key in the `DataProtectionApplication` manifest so that the snapshot location is in the same region as the PV.
. Create a new `Backup` CR.
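The following is a minimal sketch of the relevant fields in the `DataProtectionApplication` manifest; the provider and region values are illustrative assumptions, not values from this procedure:

[source,yaml]
----
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
# ...
spec:
  snapshotLocations:
  - velero:
      provider: aws  # assumed provider for illustration
      config:
        region: us-east-1  # must match the region of the PV
----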

[id="backup-cr-remains-in-progress_{context}"]
== Backup CR status remains in progress

The status of a `Backup` CR remains in the `InProgress` phase and does not complete.

.Cause

If a backup is interrupted, it cannot be resumed.

.Solution

. Retrieve the details of the `Backup` CR by running the following command:
+
[source,terminal]
----
$ oc -n {namespace} exec deployment/velero -c velero -- ./velero \
backup describe <backup>
----

. Delete the `Backup` CR by running the following command:
+
[source,terminal]
----
$ oc delete backups.velero.io <backup> -n openshift-adp
----
+
You do not need to clean up the backup location because an in-progress `Backup` CR has not uploaded files to object storage.

. Create a new `Backup` CR, for example as sketched after this procedure.

. View the Velero backup details by running the following command:
+
[source,terminal, subs="+quotes"]
----
$ velero backup describe _<backup-name>_ --details
----
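A minimal sketch of the new `Backup` CR from step 3, assuming a hypothetical backup name, namespace, and storage location:

[source,yaml]
----
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup_name>
  namespace: openshift-adp
spec:
  includedNamespaces:
  - <namespace_to_back_up>  # hypothetical namespace to back up
  storageLocation: <backup_storage_location_name>  # hypothetical BackupStorageLocation name
----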

[id="backup-cr-remains-partiallyfailed_{context}"]
== Backup CR status remains in PartiallyFailed

The status of a `Backup` CR that does not use Restic remains in the `PartiallyFailed` phase and does not complete. A snapshot of the affiliated PVC is not created.

.Cause

If the `VolumeSnapshotClass` that the backup uses is missing the required label, the CSI snapshot plugin fails to create a snapshot. As a result, the `Velero` pod logs an error similar to the following message:

[source,text]
----
time="2023-02-17T16:33:13Z" level=error msg="Error backing up item" backup=openshift-adp/user1-backup-check5 error="error executing custom action (groupResource=persistentvolumeclaims, namespace=busy1, name=pvc1-user1): rpc error: code = Unknown desc = failed to get volumesnapshotclass for storageclass ocs-storagecluster-ceph-rbd: failed to get volumesnapshotclass for provisioner openshift-storage.rbd.csi.ceph.com, ensure that the desired volumesnapshot class has the velero.io/csi-volumesnapshot-class label" logSource="/remote-source/velero/app/pkg/backup/backup.go:417" name=busybox-79799557b5-vprq
----

.Solution

. Delete the `Backup` CR by running the following command:
+
[source,terminal]
----
$ oc delete backups.velero.io <backup> -n openshift-adp
----

. If required, clean up the stored data on the `BackupStorageLocation` to free up space.

. Apply the label `velero.io/csi-volumesnapshot-class=true` to the `VolumeSnapshotClass` object by running the following command (a sketch of the labeled object follows this procedure):
+
[source,terminal]
----
$ oc label volumesnapshotclass/<snapclass_name> velero.io/csi-volumesnapshot-class=true
----

. Create a new `Backup` CR.
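For reference, a labeled `VolumeSnapshotClass` object might look like the following sketch; the driver is taken from the error message above, and the `deletionPolicy` value is an illustrative assumption:

[source,yaml]
----
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: <snapclass_name>
  labels:
    velero.io/csi-volumesnapshot-class: "true"  # label required by the Velero CSI plugin
driver: openshift-storage.rbd.csi.ceph.com  # driver from the error log above
deletionPolicy: Retain  # assumed value for illustration
----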
include::modules/troubleshooting-backup-cr-status-remains-in-partiallyfailed-issue.adoc[leveloffset=+1]
@@ -11,9 +11,8 @@ include::_attributes/attributes-openshift-dedicated.adoc[]

toc::[]

If a Velero or Restic pod crashes due to a lack of memory or CPU, you can set specific resource requests for either of those resources.
If a Velero or Restic pod crashes due to a lack of memory or CPU, you can set specific resource requests for either of those resources. The values for the resource request fields must follow the same format as Kubernetes resource requirements.

The values for the resource request fields must follow the same format as Kubernetes resource requirements.
If you do not specify `configuration.velero.podConfig.resourceAllocations` or `configuration.restic.podConfig.resourceAllocations`, see the following default `resources` specification for a Velero or Restic pod:

[source,yaml]
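A sketch of how explicit resource requests might be set in the `DataProtectionApplication` manifest; the CPU and memory values are illustrative assumptions:

[source,yaml]
----
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
# ...
spec:
  configuration:
    velero:
      podConfig:
        resourceAllocations:  # same format as Kubernetes resource requirements
          requests:
            cpu: 500m      # assumed value
            memory: 256Mi  # assumed value
----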
@@ -9,82 +9,14 @@ include::_attributes/attributes-openshift-dedicated.adoc[]

toc::[]

You might encounter these issues when you back up applications with Restic.
You might encounter the following issues when you back up applications with Restic:

[id="restic-permission-error-nfs-root-squash-enabled_{context}"]
== Restic permission error for NFS data volumes with root_squash enabled
* Restic permission error for NFS data volumes with `root_squash` enabled
* Restic Backup CR cannot be recreated after bucket is emptied
* Restic restore partially failing on OCP 4.14 due to changed PSA policy

The `Restic` pod log displays the following error message: `controller=pod-volume-backup error="fork/exec /usr/bin/restic: permission denied"`.
include::modules/restic-permission-error-for-nfs-data-volumes-with-root-squash-enabled.adoc[leveloffset=+1]

.Cause

If your NFS data volumes have `root_squash` enabled, `Restic` maps to `nfsnobody` and does not have permission to create backups.

.Solution

You can resolve this issue by creating a supplemental group for `Restic` and adding the group ID to the `DataProtectionApplication` manifest:

. Create a supplemental group for `Restic` on the NFS data volume.
. Set the `setgid` bit on the NFS directories so that group ownership is inherited (see the sketch after this procedure).
. Add the `spec.configuration.nodeAgent.supplementalGroups` parameter and the group ID to the `DataProtectionApplication` manifest, as shown in the following example:
+
[source,yaml]
----
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
# ...
spec:
  configuration:
    nodeAgent:
      enable: true
      uploaderType: restic
      supplementalGroups:
      - <group_id> <1>
# ...
----
<1> Specify the supplemental group ID.

. Wait for the `Restic` pods to restart so that the changes are applied.
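For steps 1 and 2, a sketch of the commands to run on the NFS server; the group name and export path are hypothetical:

[source,terminal]
----
$ sudo groupadd -g <group_id> restic-nfs   # create the supplemental group
$ sudo chgrp <group_id> /srv/nfs/export    # hypothetical export path
$ sudo chmod g+s /srv/nfs/export           # set the setgid bit so group ownership is inherited
----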

[id="restic-backup-cannot-be-recreated-after-s3-bucket-emptied_{context}"]
== Restic Backup CR cannot be recreated after bucket is emptied

If you create a Restic `Backup` CR for a namespace, empty the object storage bucket, and then recreate the `Backup` CR for the same namespace, the recreated `Backup` CR fails.

The `velero` pod log displays the following error message: `stderr=Fatal: unable to open config file: Stat: The specified key does not exist.\nIs there a repository at the following location?`.

.Cause

Velero does not recreate or update the Restic repository from the `ResticRepository` manifest if the Restic directories are deleted from object storage. See link:https://github.com/vmware-tanzu/velero/issues/4421[Velero issue 4421] for more information.

.Solution

* Remove the related Restic repository from the namespace by running the following command:
+
[source,terminal]
----
$ oc delete resticrepository <name_of_the_restic_repository> -n openshift-adp
----
+

In the following error log, `mysql-persistent` is the problematic Restic repository. The name of the repository appears in italics for clarity.
+
[source,text,options="nowrap",subs="+quotes,verbatim"]
----
time="2021-12-29T18:29:14Z" level=info msg="1 errors
encountered backup up item" backup=velero/backup65
logSource="pkg/backup/backup.go:431" name=mysql-7d99fc949-qbkds
time="2021-12-29T18:29:14Z" level=error msg="Error backing up item"
backup=velero/backup65 error="pod volume backup failed: error running
restic backup, stderr=Fatal: unable to open config file: Stat: The
specified key does not exist.\nIs there a repository at the following
location?\ns3:http://minio-minio.apps.mayap-oadp-
veleo-1234.qe.devcluster.openshift.com/mayapvelerooadp2/velero1/
restic/_mysql-persistent_\n: exit status 1" error.file="/remote-source/
src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:184"
error.function="github.com/vmware-tanzu/velero/
pkg/restic.(*backupper).BackupPodVolumes"
logSource="pkg/backup/backup.go:435" name=mysql-7d99fc949-qbkds
----
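To find the name of the problematic Restic repository before deleting it, you can list the `ResticRepository` resources in the namespace, as in the following sketch:

[source,terminal]
----
$ oc get resticrepositories.velero.io -n openshift-adp
----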
include::modules/restic-backup-cr-cannot-be-recreated-after-bucket-is-emptied.adoc[leveloffset=+1]

include::modules/oadp-restic-restore-failing-psa-policy.adoc[leveloffset=+1]
96 changes: 42 additions & 54 deletions modules/migration-debugging-velero-resources.adoc
@@ -1,99 +1,87 @@
// Module included in the following assemblies:
//
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
// * backup_and_restore/application_backup_and_restore/troubleshooting/velero-cli-tool.adoc
// * migrating_from_ocp_3_to_4/troubleshooting-3-4.adoc
// * migration_toolkit_for_containers/troubleshooting-mtc

[id="migration-debugging-velero-resources_{context}"]
= Debugging Velero resources with the Velero CLI tool

You can debug `Backup` and `Restore` custom resources (CRs) and retrieve logs with the Velero CLI tool.
You can debug `Backup` and `Restore` custom resources (CRs) and retrieve logs with the Velero CLI tool. The Velero CLI tool provides more detailed information than the OpenShift CLI tool.

The Velero CLI tool provides more detailed information than the OpenShift CLI tool.

[discrete]
[id="velero-command-syntax_{context}"]
== Syntax

Use the `oc exec` command to run a Velero CLI command:
.Procedure

* Use the `oc exec` command to run a Velero CLI command:
+
[source,terminal,subs="attributes+"]
----
$ oc -n {namespace} exec deployment/velero -c velero -- ./velero \
<backup_restore_cr> <command> <cr_name>
----

.Example
+
.Example for the `oc exec` command
[source,terminal,subs="attributes+"]
----
$ oc -n {namespace} exec deployment/velero -c velero -- ./velero \
backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
----

[discrete]
[id="velero-help-option_{context}"]
== Help option

Use the `velero --help` option to list all Velero CLI commands:

* List all Velero CLI commands by using the following `velero --help` option:
+
[source,terminal,subs="attributes+"]
----
$ oc -n {namespace} exec deployment/velero -c velero -- ./velero \
--help
----


[discrete]
[id="velero-describe-command_{context}"]
== Describe command

Use the `velero describe` command to retrieve a summary of warnings and errors associated with a `Backup` or `Restore` CR:

* Retrieve the logs of a `Backup` or `Restore` CR by using the following `velero logs` command:
+
[source,terminal,subs="attributes+"]
----
$ oc -n {namespace} exec deployment/velero -c velero -- ./velero \
<backup_restore_cr> describe <cr_name>
<backup_restore_cr> logs <cr_name>
----

.Example
+
.Example for the `velero logs` command
[source,terminal,subs="attributes+"]
----
$ oc -n {namespace} exec deployment/velero -c velero -- ./velero \
backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
----

The following types of restore errors and warnings are shown in the output of a `velero describe` request:

* `Velero`: A list of messages related to the operation of Velero itself, for example, messages related to connecting to the cloud, reading a backup file, and so on
* `Cluster`: A list of messages related to backing up or restoring cluster-scoped resources
* `Namespaces`: A list of messages related to backing up or restoring resources stored in namespaces

One or more errors in one of these categories results in a `Restore` operation receiving the status of `PartiallyFailed` and not `Completed`. Warnings do not lead to a change in the completion status.

[IMPORTANT]
====
* For resource-specific errors, that is, `Cluster` and `Namespaces` errors, the `restore describe --details` output includes a resource list that lists all resources that Velero succeeded in restoring. For any resource that has such an error, check to see if the resource is actually in the cluster.

* If there are `Velero` errors, but no resource-specific errors, in the output of a `describe` command, it is possible that the restore completed without any actual problems in restoring workloads, but carefully validate post-restore applications.
* Retrieve a summary of warnings and errors associated with a `Backup` or `Restore` CR by using the following `velero describe` command:
+
For example, if the output contains `PodVolumeRestore` or node agent-related errors, check the status of `PodVolumeRestores` and `DataDownloads`. If none of these are failed or still running, then volume data might have been fully restored.
====

[discrete]
[id="velero-logs-command_{context}"]
== Logs command

Use the `velero logs` command to retrieve the logs of a `Backup` or `Restore` CR:

[source,terminal,subs="attributes+"]
----
$ oc -n {namespace} exec deployment/velero -c velero -- ./velero \
<backup_restore_cr> logs <cr_name>
<backup_restore_cr> describe <cr_name>
----

.Example
+
.Example for the `velero describe` command
[source,terminal,subs="attributes+"]
----
$ oc -n {namespace} exec deployment/velero -c velero -- ./velero \
restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
----
+
The following types of restore errors and warnings are shown in the output of a `velero describe` request:
+
.`Velero`
A list of messages related to the operation of Velero itself, for example, messages related to connecting to the cloud, reading a backup file, and so on
+
.`Cluster`
A list of messages related to backing up or restoring cluster-scoped resources
+
.`Namespaces`
A list of messages related to backing up or restoring resources stored in namespaces

+
One or more errors in one of these categories results in a `Restore` operation receiving the status of `PartiallyFailed` and not `Completed`. Warnings do not lead to a change in the completion status.
+
Consider the following points for these restore errors:

* For resource-specific errors, that is, `Cluster` and `Namespaces` errors, the `restore describe --details` output includes a resource list that lists all resources that Velero succeeded in restoring. For any resource that has such an error, check if the resource is actually in the cluster.

* If there are `Velero` errors but no resource-specific errors in the output of a `describe` command, it is possible that the restore completed without any actual problems in restoring workloads. In this case, carefully validate post-restore applications.
Review comment: I would break this into at least two sentences. It will make the content easier to understand. Also, I was a tad confused by "but no resource-specific errors, in the output of a describe command"; the comma in between threw me off.

+
For example, if the output contains `PodVolumeRestore` or node agent-related errors, check the status of `PodVolumeRestores` and `DataDownloads`. If none of these are failed or still running, then volume data might have been fully restored.
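A sketch of commands for checking those statuses; the resource names follow the Velero custom resource definitions and are assumptions for illustration:

[source,terminal,subs="attributes+"]
----
$ oc -n {namespace} get podvolumerestores
$ oc -n {namespace} get datadownloads
----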
13 changes: 6 additions & 7 deletions modules/oadp-creating-alerting-rule.adoc
@@ -1,16 +1,16 @@
// Module included in the following assemblies:
//
// * backup_and_restore/application_backup_and_restore/troubleshooting.adoc
// * backup_and_restore/application_backup_and_restore/troubleshooting/oadp-monitoring.adoc

:_mod-docs-content-type: PROCEDURE
[id="creating-alerting-rules_{context}"]
= Creating an alerting rule

The {product-title} monitoring stack allows to receive Alerts configured using Alerting Rules. To create an Alerting rule for the OADP project, use one of the Metrics which are scraped with the user workload monitoring.
The {product-title} monitoring stack allows you to receive Alerts configured using Alerting Rules. To create an Alerting rule for the OADP project, use one of the Metrics scraped with the user workload monitoring.
Review comment: Maybe change "To create an Alerting rule for the OADP project, use one of the Metrics, which are scraped with the user workload monitoring." to "To create an Alerting rule for the OADP project, use one of the Metrics scraped with the user workload monitoring." Up to you!

Review comment (Contributor), suggested change:
-The {product-title} monitoring stack allows to receive Alerts configured using Alerting Rules. To create an Alerting rule for the OADP project, use one of the Metrics, which are scraped with the user workload monitoring.
+The {product-title} monitoring stack allows to receive Alerts configured using Alerting Rules. To create an Alerting rule for the {oadp-short} project, use one of the Metrics scraped with the user workload monitoring.

.Procedure

. Create a `PrometheusRule` YAML file with the sample `OADPBackupFailing` alert and save it as `4_create_oadp_alert_rule.yaml`.
. Create a `PrometheusRule` YAML file with the sample `OADPBackupFailing` alert and save it as `4_create_oadp_alert_rule.yaml`:
+
.Sample `OADPBackupFailing` alert
[source,yaml]
@@ -40,7 +40,7 @@ In this sample, the Alert displays under the following conditions:
+
* There is an increase of new failing backups during the last 2 hours that is greater than 0 and the state persists for at least 5 minutes.
* If the time since the first increase is less than 5 minutes, the Alert is in a `Pending` state, after which it turns into a `Firing` state.
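+
Based on these conditions, the following is a hedged sketch of what such a `PrometheusRule` might contain; the metric name, job label, and severity are assumptions, not confirmed values from the truncated sample above:
+
[source,yaml]
----
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: sample-oadp-alert
  namespace: openshift-adp
spec:
  groups:
  - name: sample-oadp-alert
    rules:
    - alert: OADPBackupFailing
      # assumed metric: fires when failing backups increased in the last 2 hours
      expr: increase(velero_backup_failure_total{job="openshift-adp-velero-metrics-svc"}[2h]) > 0
      for: 5m
      labels:
        severity: warning  # assumed value
      annotations:
        description: "OADP backups are failing"
----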
+

. Apply the `4_create_oadp_alert_rule.yaml` file, which creates the `PrometheusRule` object in the `openshift-adp` namespace:
+
[source,terminal]
@@ -55,12 +55,11 @@ prometheusrule.monitoring.coreos.com/sample-oadp-alert created
----

.Verification

* After the Alert is triggered, you can view it in the following ways:
** In the *Developer* perspective, select the *Observe* menu.
** In the *Administrator* perspective under the *Observe* -> *Alerting* menu, select *User* in the *Filter* box. Otherwise, by default only the *Platform* Alerts are displayed.
+
.OADP backup failing alert

image::oadp-backup-failing-alert.png[OADP backup failing alert]


image::oadp-backup-failing-alert.png[OADP backup failing alert]