[RBAC] Helm Chart 3.10.0: csi-rbdplugin container enters CrashLoopBackoff with failed to get node error #4306

Closed
@remisauvat

Description

Describe the bug

Hello,
After upgrading the Helm chart from 3.9.0 to 3.10.0, the csi-rbdplugin container crash-loops with a permissions error when it tries to fetch the node resource.

F1206 16:31:23.921143       1 driver.go:131] failed to get node "xxxxxxxx" information: nodes "xxxxxxxx" is forbidden: User "system:serviceaccount:ceph-csi-rbd:cph-cs-rbd-ceph-csi-rbd-provisioner" cannot get resource "nodes" in API group "" at the cluster scope

This is probably related to the changes in #4165: the RBAC rules for the service account do not grant the permission that is now needed to fetch the node labels.
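For illustration, the missing permission could be granted with something like the manifest below. This is only a minimal sketch, not the chart's actual fix: the `rbd-provisioner-node-reader` name is hypothetical, and the service account name and namespace are taken from the error message above, so they will differ for other release names.

```yaml
# Sketch only: grants "get" on cluster-scoped nodes to the provisioner
# service account named in the error message.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rbd-provisioner-node-reader  # hypothetical name
rules:
  - apiGroups: [""]        # core API group, as in the error message
    resources: ["nodes"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rbd-provisioner-node-reader  # hypothetical name
subjects:
  - kind: ServiceAccount
    name: cph-cs-rbd-ceph-csi-rbd-provisioner  # from the log line
    namespace: ceph-csi-rbd                    # from the log line
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rbd-provisioner-node-reader
```

A separate ClusterRole keeps the workaround independent of the roles managed by the chart, so it can be deleted once the chart itself grants the permission.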

Environment details

  • Image/version of Ceph CSI driver : 3.10.0
  • Helm chart version : 3.10.0
  • Kernel version : 5.15.0-89-generic
  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's
    krbd or rbd-nbd) :
  • Kubernetes cluster version : 1.27.3
  • Ceph cluster version :

Steps to reproduce

Steps to reproduce the behavior:

  1. Deploy the Helm chart ceph-csi/ceph-csi-rbd, version 3.10.0.
  2. Make sure rbac.create: true is set in values.yaml.
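The setting from step 2 would look roughly like this in values.yaml. It is a minimal fragment showing only the key named above; all other values are assumed to be left at the chart defaults.

```yaml
# values.yaml (fragment): let the chart create its RBAC resources
rbac:
  create: true
```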

Actual results

Pod csi-rbd-provisioner is in CrashLoopBackOff state due to a failure in the csi-rbdplugin container.

Expected behavior

The container and the pod should not crash.

Logs

csi-rbdplugin:

I1206 16:31:23.907599       1 cephcsi.go:191] Driver version: v3.10.0 and Git version: 24ae2a7a062b3e58746bb9cc6d5737e37a7e771c
I1206 16:31:23.907720       1 cephcsi.go:223] Starting driver type: rbd with name: rbd.csi.ceph.com
I1206 16:31:23.907750       1 driver.go:94] Enabling controller service capability: CREATE_DELETE_VOLUME
I1206 16:31:23.907755       1 driver.go:94] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I1206 16:31:23.907761       1 driver.go:94] Enabling controller service capability: CLONE_VOLUME
I1206 16:31:23.907765       1 driver.go:94] Enabling controller service capability: EXPAND_VOLUME
I1206 16:31:23.907770       1 driver.go:107] Enabling volume access mode: SINGLE_NODE_WRITER
I1206 16:31:23.907774       1 driver.go:107] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I1206 16:31:23.907778       1 driver.go:107] Enabling volume access mode: SINGLE_NODE_SINGLE_WRITER
I1206 16:31:23.907918       1 driver.go:107] Enabling volume access mode: SINGLE_NODE_MULTI_WRITER
F1206 16:31:23.921143       1 driver.go:131] failed to get node "xxxxxxx" information: nodes "xxxxxxx" is forbidden: User "system:serviceaccount:ceph-csi-rbd:cph-cs-rbd-ceph-csi-rbd-provisioner" cannot get resource "nodes" in API group "" at the cluster scope

Additional context

I don't think it's the same issue as #4298. Setting readAffinity.enabled to true or false makes no difference: the container crashes either way.
