Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd: add immediate topology flag #4790

Merged
merged 2 commits into from
Aug 21, 2024

Conversation

iPraveenParihar
Copy link
Contributor

Describe what this PR does

In csi-external-provisioner version 5.0.1, topology-aware provisioning is enabled by default. As a result, the provisioner now expects topologyKeys to be present in the CSINode object, which must be provided by the user via the --domainlabels flag in the RBD nodeplugin.

Issue: Users upgrading to version 3.12.0 who were not previously using topology-aware provisioning may encounter issues when provisioning RBD PVCs, as the --domainlabels flag might not be set.

Fix: To address this, add --immediate-topology=false to disable topology-aware provisioning. Users requiring topology-aware provisioning should provide the volumeBindingMode as WaitForFirstConsumer and TopologyConstrainedPools in the StorageClass and configure the --domainlabels flag in the RBD nodeplugin.

Fixes: #4777

Future concerns

List items that are not part of the PR and do not impact it's
functionality, but are work items that can be taken up subsequently.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

In csi-external-provisioner: v5.0.1, topology-aware
provisioning is enabled by default. As a result provisioner
now expects toologyKeys to be present in CSINode object which
must be passed by user via `--domainlabels` flag in RBD nodeplugin.

Issue: Users upgrading to v3.12.0 who were not previously using
topology-aware provisioning may encounter issues when provisionining
RBD PVCs, as the `--domainlabels` flag might not be set.

Fix: To address this, add `--immediate-topology=false` to disable
topology-aware provisioning. User requiring topology-aware
provisioning should provided the volumeBindingMode as
`WaitForFirstConsumer` and `TopologyConstrainedPools` as required in
the StorageClass and configure `--domainlabels` flag in RBD nodeplugin.

Signed-off-by: Praveen M <m.praveen@ibm.com>
@iPraveenParihar iPraveenParihar self-assigned this Aug 21, 2024
@mergify mergify bot added the component/rbd Issues related to RBD label Aug 21, 2024
@iPraveenParihar
Copy link
Contributor Author

/test ci/centos/mini-e2e-helm/k8s-1.30

@iPraveenParihar iPraveenParihar marked this pull request as ready for review August 21, 2024 08:40
Signed-off-by: Praveen M <m.praveen@ibm.com>
@iPraveenParihar iPraveenParihar force-pushed the rbd/immediate-topology-flag branch from d629068 to 2689977 Compare August 21, 2024 09:03
@iPraveenParihar iPraveenParihar added the backport-to-release-v3.12 Label to backport from devel to release-v3.12 branch label Aug 21, 2024
@iPraveenParihar
Copy link
Contributor Author

@Mergifyio queue

Copy link
Contributor

mergify bot commented Aug 21, 2024

queue

🛑 The pull request has been removed from the queue default

The merge conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

@mergify mergify bot added the ok-to-test Label to trigger E2E tests label Aug 21, 2024
@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-cephfs

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-rbd

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.31

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.31

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.31

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.30

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.29

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.30

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.29

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.30

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.29

@ceph-csi-bot ceph-csi-bot removed the ok-to-test Label to trigger E2E tests label Aug 21, 2024
@Madhu-1
Copy link
Collaborator

Madhu-1 commented Aug 21, 2024

/test ci/centos/mini-e2e-helm/k8s-1.30

@Madhu-1
Copy link
Collaborator

Madhu-1 commented Aug 21, 2024

/test ci/centos/mini-e2e/k8s-1.30

@Madhu-1
Copy link
Collaborator

Madhu-1 commented Aug 21, 2024

/test ci/centos/upgrade-tests-rbd

@mergify mergify bot merged commit 9181d20 into ceph:devel Aug 21, 2024
41 checks passed
@dragoangel
Copy link
Contributor

dragoangel commented Aug 22, 2024

Looks like --immediate-topology=false not solve things like --feature-gates=Topology=false do, because of errors in csi-provisioner container:

W0822 16:26:29.416084       1 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.Node: nodes is forbidden: User "system:serviceaccount:ceph-csi-rbd:ceph-csi-rbd-provisioner" cannot list resource "nodes" in API group "" at the cluster scope
E0822 16:26:29.416289       1 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.Node: failed to list *v1.Node: nodes is forbidden: User "system:serviceaccount:ceph-csi-rbd:ceph-csi-rbd-provisioner" cannot list resource "nodes" in API group "" at the cluster scope

It still tries to list node resources, even if they would not be used by csi-provisioner.

If I add to ClusterRole ceph-csi-rbd-provisioner:

+     resources: ["nodes"]
+     verbs: ["get", "list","watch"]
+   - apiGroups: ["storage.k8s.io"]
+     resources: ["csinodes"]
+     verbs: ["get", "list", "watch"]
+   - apiGroups: [""]

which added by default when you set topology.domainLabels - then everything is working. If I would set topology.domainLabels to properly fix rbac - well, nodeplugin will handg with:

missing domain labels [topology.kubernetes.io/region topology.kubernetes.io/zone] on node "node-X"

@Madhu-1 @iPraveenParihar Looks like we need adjust helm chart not always add this permissions for ClusterRole ceph-csi-rbd-provisioner.

@iPraveenParihar
Copy link
Contributor Author

@dragoangel, Thanks for reporting.

seems like while I was testing helm install script, older clusterroles were not cleaned up properly 😞 and error didn't show up.

@dragoangel
Copy link
Contributor

@dragoangel, Thanks for reporting.

seems like while I was testing helm install script, older clusterroles were not cleaned up properly 😞 and error didn't show up.

Hi @iPraveenParihar assume you had rbac create false, otherwise helm would throw error like you are trying to own resources not owned by this helm chart?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-release-v3.12 Label to backport from devel to release-v3.12 branch component/rbd Issues related to RBD
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ceph-csi 3.12.0 not able to provision RBD PVC.
5 participants