
Automatically create rbac permissions flag for Prometheus receiver #3078

Open
paebersold-tyro opened this issue Jun 27, 2024 · 10 comments
Labels
area:collector (Issues for deploying collector), area:rbac (Issues relating to RBAC), enhancement (New feature or request)

Comments

@paebersold-tyro

Component(s)

collector

What happened?

Description

I am running the opentelemetry-operator with the --create-rbac-permissions flag set. When a new OpenTelemetryCollector resource is created (e.g. mode: daemonset), new pods and a new serviceaccount are created as well. However, no new clusterroles or clusterrolebindings are created, which results in Prometheus scrape errors due to missing permissions, for example:

E0627 04:07:32.435836       1 reflector.go:147] k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:observability:collector-with-ta-collector" cannot list resource "pods" in API group "" in the namespace "app-platform-monitoring"

No logs are generated on the operator-manager pod.

The clusterrole that the operator manager is using has access to create clusterroles/clusterrolebindings (I am deploying via the helm chart opentelemetry-operator version 0.62.0, https://open-telemetry.github.io/opentelemetry-helm-charts).

Based on other issues raised previously, it seems this flag was optional but may no longer be required, with the permissions now being granted automatically based on the operator's existing access. I would like clarification on this aspect too, please.

Steps to Reproduce

Run the opentelemetry-operator with the create-rbac-permissions flag.

Expected Result

Clusterroles/bindings would be created when the new collector pods are created.

Actual Result

No new roles/bindings are created.

Kubernetes Version

1.29

Operator version

0.102.0

Collector version

0.102.0

Environment information

Serviceaccount used by manager

% kubectl -n observability get pods otel-operator-opentelemetry-operator-dfb985c65-ngh9n -o yaml | grep serviceAccount   
  serviceAccount: opentelemetry-operator

Clusterrolebinding

% kubectl get clusterrolebinding -o wide | grep opentelemetry-operator
otel-operator-opentelemetry-operator-manager               ClusterRole/otel-operator-opentelemetry-operator-manager             6d 

Clusterrole for the operator manager (generated via the helm chart):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/instance: otel-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.102.0
    backstage.io/kubernetes-id: eyre-otel-operator
    helm.sh/chart: opentelemetry-operator-0.62.0
    tyro.cloud/source: eyre-otel-operator
    tyro.cloud/system: observability-platform
    tyroTaggingVersion: 3.0.0
    tyroTeam: observability
  name: otel-operator-opentelemetry-operator-manager
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - persistentvolumeclaims
      - persistentvolumes
      - pods
      - serviceaccounts
      - services
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
  - apiGroups:
      - ""
    resources:
      - namespaces
    verbs:
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - daemonsets
      - deployments
      - statefulsets
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - apps
      - extensions
    resources:
      - replicasets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - autoscaling
    resources:
      - horizontalpodautoscalers
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - rbac.authorization.k8s.io
    resources:
      - clusterroles
      - clusterrolebindings
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes
      - namespaces
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - batch
    resources:
      - jobs
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - config.openshift.io
    resources:
      - infrastructures
      - infrastructures/status
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs:
      - create
      - get
      - list
      - update
  - apiGroups:
      - monitoring.coreos.com
    resources:
      - podmonitors
      - servicemonitors
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - opentelemetry.io
    resources:
      - instrumentations
    verbs:
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - opentelemetry.io
    resources:
      - opampbridges
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - opentelemetry.io
    resources:
      - opampbridges/finalizers
    verbs:
      - update
  - apiGroups:
      - opentelemetry.io
    resources:
      - opampbridges/status
    verbs:
      - get
      - patch
      - update
  - apiGroups:
      - opentelemetry.io
    resources:
      - opentelemetrycollectors
    verbs:
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - opentelemetry.io
    resources:
      - opentelemetrycollectors/finalizers
    verbs:
      - get
      - patch
      - update
  - apiGroups:
      - opentelemetry.io
    resources:
      - opentelemetrycollectors/status
    verbs:
      - get
      - patch
      - update
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - route.openshift.io
    resources:
      - routes
      - routes/custom-host
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch

Log output

No response

Additional context

Pods created via the manager:

collector-with-ta-collector-dn9q6                      1/1     Running     0          18m
collector-with-ta-collector-f8fm2                      1/1     Running     0          18m
collector-with-ta-collector-gh5dx                      1/1     Running     0          18m

Associated service account

NAME                              SECRETS   AGE
collector-with-ta-collector       0         18m

No clusterroles or clusterrolebindings associated:

% kubectl get clusterrolebinding -o wide | grep collector-with-ta-collector 
% 
% date                        
Thu 27 Jun 2024 14:28:12 AEST
% kubectl get clusterrole | grep 2024-06-27
%
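
The missing permission can also be confirmed directly via impersonation. A sketch (the serviceaccount name is taken from the scrape error above; given the forbidden error, I would expect the output to be "no"):

% kubectl auth can-i list pods --namespace app-platform-monitoring --as system:serviceaccount:observability:collector-with-ta-collector
no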
@paebersold-tyro added the bug (Something isn't working) and needs triage labels Jun 27, 2024
@jaronoff97
Contributor

@iblancasa anything jumping out as problematic here?

@pavolloffay added the area:collector (Issues for deploying collector) label Jun 28, 2024
@pavolloffay
Member

IIRC the --create-rbac-permissions flag does not create RBAC for the TA/Prometheus receiver.

@pavolloffay
Member

However, it would be great to support it.

@paebersold-tyro
Author

FYI, for clarity: my test setup did not use the target allocator (I'm aware the current helm charts require you to set up the target allocator RBAC resources manually). Apologies for the confusion in the naming. My sample config is below.

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: collector-with-ta
spec:
  mode: daemonset
  targetAllocator:
    enabled: false
  config:
    processors:
      batch: {}
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: test-pushgateway
            scrape_interval: 30s
            scrape_timeout: 10s
            honor_labels: true
            scheme: http
            kubernetes_sd_configs:
            - role: pod
              namespaces:
                names:
                - app-platform-monitoring
            relabel_configs:
            # and pod is running
            - source_labels: [__meta_kubernetes_pod_phase]
              regex: Running
              action: keep
            # and pod is ready
            - source_labels: [__meta_kubernetes_pod_ready]
              regex: true
              action: keep
            # and only metrics endpoints
            - source_labels: [__meta_kubernetes_pod_container_port_name]
              action: keep
              regex: metrics
    exporters:
      debug: {}
    service:
      telemetry:
        logs:
          level: debug
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: []
          exporters: [debug]

@fyuan1316
Contributor

@paebersold-tyro
I am not sure whether to describe this as a bug or a feature request, but I can definitely reproduce the issue. The root cause seems to be that the creation and management of RBAC is handled per component, on a case-by-case basis, so support will need to be added gradually.

@pavolloffay
Member

Correct, this should be an enhancement proposal to automate RBAC for the Prometheus receiver.

@pavolloffay changed the title from "Using the create-rbac-permissions flag but clusterrole/clusterrolebindings are not being created for generated collector pods" to "Support create-rbac-permissions flag for Prometheus receiver" Jul 3, 2024
@pavolloffay
Member

I have updated the title, please edit it if it does not match what is being asked here.

@pavolloffay added the enhancement (New feature or request) label and removed the bug (Something isn't working) and needs triage labels Jul 3, 2024
@paebersold-tyro
Author

Thanks for the clarification on the issue, and I'm fine with the title update. Ideally it would also be great to have a note on exactly what create-rbac-permissions gives you out of the box.

@iblancasa
Contributor

Actually, the title should be changed because the flag does nothing now.
https://github.com/open-telemetry/opentelemetry-operator/blob/main/main.go#L149

Now, we check whether the operator has permissions to create RBAC resources and, if it does, the operator creates them.
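
Whether the operator's serviceaccount actually holds those permissions can be checked with impersonation, e.g. (a sketch using the serviceaccount shown earlier in this issue; given the manager ClusterRole posted above, both should print "yes"):

% kubectl auth can-i create clusterroles --as system:serviceaccount:observability:opentelemetry-operator
yes
% kubectl auth can-i create clusterrolebindings --as system:serviceaccount:observability:opentelemetry-operator
yes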

@pavolloffay changed the title from "Support create-rbac-permissions flag for Prometheus receiver" to "Automatically create rbac permissions flag for Prometheus receiver" Jul 8, 2024
@grandwizard28

+1 on this if we have a solution

@jaronoff97 added the area:rbac (Issues relating to RBAC) label Aug 1, 2024