
Check for PVC label selector conflicts against the secondary cluster #1855

Open · GowthamShanmugam wants to merge 1 commit into main from secondary_pvc_conflict_check
Conversation

@GowthamShanmugam commented Feb 20, 2025

Applying PVC Label Conflict Check on the Secondary Cluster

This PR introduces a mechanism in vrg_volrep.go and vrg_volsync.go that applies the PVCLabelSelector from the secondary VRG and checks whether any PVCs are selected (a sketch of the check follows below).

Expected behavior:

  • A secondary VRG should not match any PVCs using the PVCLabelSelector.
  • If a PVC is selected on the secondary cluster, it indicates a conflict because the label selector should not apply to unintended PVCs after failover/relocation.
  • In case of a conflict, reconciliation is requeued and the VRG logs the issue.
  • For VolSync, the behavior is slightly different: if the label selector matches any PVC that is not part of RDSpec, it is considered a conflict.

For more: #1861
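A minimal sketch of what the volrep-side check could look like, assuming the k8s.io/apimachinery selector helpers; the package, function name, and arguments here are illustrative, not the PR's actual code:

```go
package volrepcheck // illustrative package name

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// checkSecondaryPVCConflict is a hypothetical helper: on a secondary VRG,
// the PVCLabelSelector must select nothing, so any match is a conflict.
func checkSecondaryPVCConflict(pvcSelector metav1.LabelSelector, pvcs []corev1.PersistentVolumeClaim) error {
	selector, err := metav1.LabelSelectorAsSelector(&pvcSelector)
	if err != nil {
		return fmt.Errorf("invalid PVC label selector: %w", err)
	}

	for i := range pvcs {
		if selector.Matches(labels.Set(pvcs[i].Labels)) {
			// After failover/relocation the selector should not apply to
			// any PVC on this (now secondary) cluster.
			return fmt.Errorf("PVC %s/%s matches the VRG label selector on the secondary cluster",
				pvcs[i].Namespace, pvcs[i].Name)
		}
	}

	return nil
}
```

For the VolSync variant, the same loop would first skip PVCs that appear in Spec.VolSync.RDSpec and flag only the remaining matches.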

@GowthamShanmugam changed the title from "Check for PVC label selector conflicts against the secondary cluster." to "Check for PVC label selector conflicts against the secondary cluster. - WIP(Need to test)" on Feb 20, 2025
@GowthamShanmugam force-pushed the secondary_pvc_conflict_check branch from d8f3da4 to 92a2325 on February 20, 2025 20:10
@nirs (Member) commented Feb 21, 2025

How do you plan to do the conflict check?

Labels can be added and removed at any time. Let's say we run a conflict check at the time a DRPC is added, and there is no conflict.

Now the user adds a label, or adds a new PVC matching another DRPC's label. The next time we look at the PVCs we will find the new PVC and protect it. How does the conflict check help?

The example you gave, using an Exists operator for the label, is a very bad way to select PVCs. The only way to protect against such a bad label selector is to reject Exists selectors and allow only In selectors.

@GowthamShanmugam (Author) commented:

> How do you plan to do the conflict check?
>
> Labels can be added and removed at any time. Let's say we run a conflict check at the time a DRPC is added, and there is no conflict.

The conflict check is implemented within the VRG reconciliation process, which already watches PVCs for the addition of the VRG owner label. Therefore, updating labels on PVCs does not pose an issue.

> Now the user adds a label, or adds a new PVC matching another DRPC's label. The next time we look at the PVCs we will find the new PVC and protect it. How does the conflict check help?

I think you are describing primary cluster behavior.

The conflict check applies specifically to the secondary cluster. If any PVC in the secondary cluster matches the primary VRG’s label selector, the check will trigger an error. It does not consider whether the PVC is owned by another VRG or remains unowned.

> The example you gave, using an Exists operator for the label, is a very bad way to select PVCs. The only way to protect against such a bad label selector is to reject Exists selectors and allow only In selectors.

The above example provides a straightforward, high-level overview to highlight the importance of performing a PVC conflict check in the secondary cluster. Since a PVC can have multiple labels, conflicts can arise in various ways.
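As a concrete illustration (the labels and selectors here are invented), a single PVC carrying multiple labels can satisfy two unrelated In-style selectors at once, so rejecting the Exists operator alone does not eliminate conflicts:

```go
package conflictexample // illustration only

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// bothMatch shows one PVC label set satisfying two different DRPC
// selectors, neither of which uses the Exists operator.
func bothMatch() (bool, bool) {
	pvcLabels := labels.Set{"app": "busybox", "tier": "db"} // hypothetical PVC labels

	selA, _ := metav1.LabelSelectorAsSelector(&metav1.LabelSelector{
		MatchLabels: map[string]string{"app": "busybox"}, // DRPC A's selector
	})
	selB, _ := metav1.LabelSelectorAsSelector(&metav1.LabelSelector{
		MatchLabels: map[string]string{"tier": "db"}, // DRPC B's selector
	})

	return selA.Matches(pvcLabels), selB.Matches(pvcLabels) // both true: a conflict
}
```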

@nirs (Member) commented Feb 21, 2025

> The conflict check applies specifically to the secondary cluster. If any PVC in the secondary cluster matches the primary VRG’s label selector, the check will trigger an error. It does not consider whether the PVC is owned by another VRG or remains unowned.

The flow is not clear. Can you describe the state in both clusters and explain when the conflict check happens, and what the expected result of the conflict check is? Does it only log an error, or does it prevent failover/relocate?

The behavior must be specified and discussed before we work on a solution. An issue is the right place to describe the problem and specify the solution.

@GowthamShanmugam (Author) commented Feb 21, 2025

> > The conflict check applies specifically to the secondary cluster. If any PVC in the secondary cluster matches the primary VRG’s label selector, the check will trigger an error. It does not consider whether the PVC is owned by another VRG or remains unowned.
>
> The flow is not clear. Can you describe the state in both clusters and explain when the conflict check happens, and what the expected result of the conflict check is? Does it only log an error, or does it prevent failover/relocate?
>
> The behavior must be specified and discussed before we work on a solution. An issue is the right place to describe the problem and specify the solution.

This PR is part of the independent VM feature, and I have already discussed the issue with my sub-team. I raised the PR to facilitate discussions with other sub-teams, as I felt continuing the discussion under an issue would delay implementation.

I still need help deciding whether to simply log the issue, return an error in the secondary VRG reconciliation, or update a status condition.

@GowthamShanmugam force-pushed the secondary_pvc_conflict_check branch from 92a2325 to 05da967 on February 21, 2025 11:34
@nirs (Member) commented Feb 21, 2025

> This PR is part of the independent VM feature, and I have already discussed the issue with my sub-team. I raised the PR to facilitate discussions with other sub-teams, as I felt continuing the discussion under an issue would delay implementation.

There is no way to discuss this PR with other people if the problem and the suggested behavior are not specified in public.

@GowthamShanmugam (Author) commented:

> > This PR is part of the independent VM feature, and I have already discussed the issue with my sub-team. I raised the PR to facilitate discussions with other sub-teams, as I felt continuing the discussion under an issue would delay implementation.
>
> There is no way to discuss this PR with other people if the problem and the suggested behavior are not specified in public.

I still don’t understand why presenting the problem along with a clear example isn't sufficient for the PR review. While we can continue refining the solution by creating a well-documented issue, drafting extensive documentation, and holding multiple meetings, this approach will only prolong the process. The purpose of the PR is to initiate discussions and collaboratively work toward a solution rather than delaying implementation with excessive formalities.

@GowthamShanmugam force-pushed the secondary_pvc_conflict_check branch 4 times, most recently from f6057dd to f11a4e4 on February 28, 2025 08:31
@GowthamShanmugam (Author) commented Feb 28, 2025

VolSync Secondary:

```yaml
status:
  conditions:
    - lastTransitionTime: '2025-02-28T11:05:28Z'
      message: 'Volsync based PVC protection does not report DataReady condition as Secondary. PVC protection as secondary is complete, or no PVCs needed protection using VolumeReplication scheme'
      observedGeneration: 8
      reason: Unused
      status: 'True'
      type: DataReady
    - lastTransitionTime: '2025-02-28T11:05:28Z'
      message: A PVC that is not a replication destination should not match the label selector.
      observedGeneration: 8
      reason: PVCConflict
      status: 'False'
      type: DataProtected
    - lastTransitionTime: '2025-02-28T11:05:28Z'
      message: A PVC that is not a replication destination should not match the label selector.
      observedGeneration: 8
      reason: PVCConflict
      status: 'False'
      type: ClusterDataProtected
  kubeObjectProtection: {}
  lastUpdateTime: '2025-02-28T11:05:28Z'
  observedGeneration: 8
  state: Secondary
```

VolRep Secondary:

```yaml
status:
  conditions:
    - lastTransitionTime: '2025-02-28T08:02:43Z'
      message: PVCs in the VolumeReplicationGroup group are replicating
      observedGeneration: 3
      reason: Replicating
      status: 'True'
      type: DataReady
    - lastTransitionTime: '2025-02-28T10:18:20Z'
      message: No PVC on the secondary should match the label selector
      observedGeneration: 3
      reason: PVCConflict
      status: 'False'
      type: DataProtected
    - lastTransitionTime: '2025-02-28T06:02:47Z'
      message: Restored 0 volsync PVs/PVCs and 2 volrep PVs/PVCs
      observedGeneration: 2
      reason: Restored
      status: 'True'
      type: ClusterDataReady
    - lastTransitionTime: '2025-02-28T08:00:51Z'
      message: No PVCs are protected using Volsync scheme. Cluster data is not protected as Secondary
      observedGeneration: 3
      reason: Unused
      status: 'True'
      type: ClusterDataProtected
    - lastTransitionTime: '2025-02-28T06:02:47Z'
      message: Kube objects restored
      observedGeneration: 2
      reason: KubeObjectsRestored
      status: 'True'
      type: KubeObjectsReady
  kubeObjectProtection: {}
  lastUpdateTime: '2025-02-28T10:18:20Z'
  observedGeneration: 3
  protectedPVCs:
    - resources:
        requests:
          storage: 1Gi
      replicationID:
        id: 0c7a26e15564eb07d56cda451e0f42e5
        modes:
          - Failover
      storageClassName: ocs-storagecluster-ceph-rbd
      csiProvisioner: openshift-storage.rbd.csi.ceph.com
      name: busybox-pvc
      accessModes:
        - ReadWriteOnce
      storageID:
        id: 92472e3f8bd240a3e24e6faed855071d
      conditions:
        - lastTransitionTime: '2025-02-28T08:02:43Z'
          message: VolumeReplication resource for the pvc is syncing as Secondary
          observedGeneration: 3
          reason: Replicating
          status: 'True'
          type: DataReady
        - lastTransitionTime: '2025-02-28T06:02:59Z'
          message: PV cluster data already protected for PVC busybox-pvc
          observedGeneration: 2
          reason: Uploaded
          status: 'True'
          type: ClusterDataProtected
        - lastTransitionTime: '2025-02-28T08:00:57Z'
          message: VolumeReplication resource for the pvc is syncing as Secondary
          observedGeneration: 3
          reason: Replicating
          status: 'False'
          type: DataProtected
      namespace: busybox-tim1
      volumeMode: Filesystem
      labels:
        app: sapptim1
        app.kubernetes.io/part-of: sapptim1
        appname: busybox
        apps.open-cluster-management.io/reconcile-rate: medium
  state: Secondary
```

@GowthamShanmugam changed the title from "Check for PVC label selector conflicts against the secondary cluster. - WIP(Need to test)" to "Check for PVC label selector conflicts against the secondary cluster" on Feb 28, 2025
```go
for _, pvc := range v.volSyncPVCs {
	matchFound := false

	for _, rdSpec := range v.instance.Spec.VolSync.RDSpec {
```
Member commented on this diff:
Need @BenamarMk to check if this is going to be a correct check all the time. There is a certain duration when the VRG is secondary and doesn't have the RDSpec filled in. It might lead to spurious errors.
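If RDSpec can legitimately be empty for a while, one option is to skip the check during that window. A sketch built on the field names visible in the snippet above; the guard, the ProtectedPVC field access, and the error wording are assumptions, not the PR's code:

```go
// Assumption: skip the conflict check until RDSpec is populated, so the
// window where a secondary VRG has no RDSpec cannot raise spurious errors.
if len(v.instance.Spec.VolSync.RDSpec) == 0 {
	return nil // nothing to compare against yet
}

for _, pvc := range v.volSyncPVCs {
	matchFound := false

	for _, rdSpec := range v.instance.Spec.VolSync.RDSpec {
		if rdSpec.ProtectedPVC.Name == pvc.Name && rdSpec.ProtectedPVC.Namespace == pvc.Namespace {
			matchFound = true

			break
		}
	}

	if !matchFound {
		// A selector match outside RDSpec is the VolSync conflict case.
		return fmt.Errorf("PVC %s/%s matches the label selector but is not a replication destination",
			pvc.Namespace, pvc.Name)
	}
}
```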

```diff
@@ -2730,3 +2719,39 @@ func PruneAnnotations(annotations map[string]string) map[string]string {

 	return result
 }
```

Member commented on this diff:

There is no conflict checking function for volrep?

Signed-off-by: Gowtham Shanmugasundaram <gshanmug@redhat.com>
@GowthamShanmugam force-pushed the secondary_pvc_conflict_check branch from f11a4e4 to 72e109b on March 5, 2025 19:05