[RBAC] Helm Chart 3.10.0: csi-rbdplugin container enters CrashLoopBackoff with failed to get node error #4306
Closed
Description
opened on Dec 6, 2023
Describe the bug
Hello,
After upgrading the Helm chart from 3.9.0 to 3.10.0, the csi-rbdplugin container
crashes in a loop with a permissions error when fetching the nodes resource.
F1206 16:31:23.921143 1 driver.go:131] failed to get node "xxxxxxxx" information: nodes "xxxxxxxx" is forbidden: User "system:serviceaccount:ceph-csi-rbd:cph-cs-rbd-ceph-csi-rbd-provisioner" cannot get resource "nodes" in API group "" at the cluster scope
This is probably related to the changes from #4165: the RBAC rules for the service account do not appear to have been updated to match the new requirement to fetch node labels.
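For illustration, the permission named in the error could be granted with a rule like the one below. This is only a sketch under assumptions: the ClusterRole and ClusterRoleBinding names are hypothetical and not taken from the chart, the service account name and namespace are copied from the error message, and only the `get` verb is included because that is the only one the error mentions. The chart's own provisioner ClusterRole may be the better place for such a rule.

```yaml
# Hypothetical sketch: object names are illustrative, not the chart's own.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: example-rbd-provisioner-node-reader  # hypothetical name
rules:
  # The core API group is the empty string, matching 'API group ""' in the error.
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: example-rbd-provisioner-node-reader  # hypothetical name
subjects:
  - kind: ServiceAccount
    name: cph-cs-rbd-ceph-csi-rbd-provisioner  # taken from the error message
    namespace: ceph-csi-rbd                    # taken from the error message
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: example-rbd-provisioner-node-reader
```

Nodes are cluster-scoped, so a namespaced Role would not satisfy this check; that is why the sketch uses a ClusterRole and ClusterRoleBinding.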
Environment details
- Image/version of Ceph CSI driver : 3.10.0
- Helm chart version : 3.10.0
- Kernel version : 5.15.0-89-generic
- Mounter used for mounting PVC (for cephFS it's fuse or kernel; for rbd it's krbd or rbd-nbd) :
- Kubernetes cluster version : 1.27.3
- Ceph cluster version :
Steps to reproduce
Steps to reproduce the behavior:
- Deploy helm chart ceph-csi/ceph-csi-rbd with version 3.10.0.
- Make sure rbac.create: true is set in values.yaml
Actual results
Pod csi-rbd-provisioner is in CrashLoopBackOff state due to a failure in the csi-rbdplugin container.
Expected behavior
The container and pod should not crash
Logs
csi-rbdplugin:
I1206 16:31:23.907599 1 cephcsi.go:191] Driver version: v3.10.0 and Git version: 24ae2a7a062b3e58746bb9cc6d5737e37a7e771c
I1206 16:31:23.907720 1 cephcsi.go:223] Starting driver type: rbd with name: rbd.csi.ceph.com
I1206 16:31:23.907750 1 driver.go:94] Enabling controller service capability: CREATE_DELETE_VOLUME
I1206 16:31:23.907755 1 driver.go:94] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I1206 16:31:23.907761 1 driver.go:94] Enabling controller service capability: CLONE_VOLUME
I1206 16:31:23.907765 1 driver.go:94] Enabling controller service capability: EXPAND_VOLUME
I1206 16:31:23.907770 1 driver.go:107] Enabling volume access mode: SINGLE_NODE_WRITER
I1206 16:31:23.907774 1 driver.go:107] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I1206 16:31:23.907778 1 driver.go:107] Enabling volume access mode: SINGLE_NODE_SINGLE_WRITER
I1206 16:31:23.907918 1 driver.go:107] Enabling volume access mode: SINGLE_NODE_MULTI_WRITER
F1206 16:31:23.921143 1 driver.go:131] failed to get node "xxxxxxx" information: nodes "xxxxxxx" is forbidden: User "system:serviceaccount:ceph-csi-rbd:cph-cs-rbd-ceph-csi-rbd-provisioner" cannot get resource "nodes" in API group "" at the cluster scope
Additional context
I don't think it's the same issue as #4298. Setting readAffinity.enabled to true or false doesn't change the issue.