Upstep v0.15.0 breaks kubernetes deployments due to user nobody switch #703

Closed
boeboe opened this issue Oct 16, 2017 · 7 comments

@boeboe

boeboe commented Oct 16, 2017

On request of @mdlayher ... a tracking issue.

Background: we have been running node_exporter in our Kubernetes cluster without problems up to and including 0.14.0 (where it still ran as root, as defined in the Dockerfile). For full traceability, here is the DaemonSet YAML config that worked until now.

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: prometheus-node-exporter
  labels:
    name: prometheus-node-exporter
spec:
  template:
    metadata:
      labels:
        name: prometheus-node-exporter
      annotations:
         prometheus.io/scrape: "true"
         prometheus.io/port: "9100"
    spec:
      serviceAccountName: cluster-reader
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
        - ports:
            - containerPort: 9100
              name: node-exporter
              protocol: TCP
          resources:
            requests:
              cpu: 10m
              memory: 32Mi
            limits:
              cpu: 100m
              memory: 64Mi
          securityContext:
              privileged: true
          image: registry.bfed-pro.intapp.eu/prom/node-exporter:v0.14.0
          args:
            - -collector.procfs
            - /host/proc
            - -collector.sysfs
            - /host/sys
            - -collector.filesystem.ignored-mount-points
            - '"^/(sys|proc|dev|host|etc)($|/)"'
          name: prometheus-node-exporter
          volumeMounts:
            - name: dev
              mountPath: /host/dev
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: rootfs
              mountPath: /rootfs
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /

Once we upgraded to 0.15.0 (taking into account that collector options are now prefixed with --), our deployment logged the following errors.

time="2017-10-14T13:16:57Z" level=error msg="Error on statfs() system call for \"/rootfs/var/lib/kubelet/pods/5f702f30-9c31-11e7-a39d-005056866643/volumes/kubernetes.io~secret/default-token-141nj\": permission denied" source="filesystem_linux.go:57"
time="2017-10-14T13:16:57Z" level=error msg="Error on statfs() system call for \"/rootfs/var/lib/docker-latest/containers/13436acac0fed00bc9cdaf648b83305320a64c9a97366825b5b16c695f82af5c/shm\": permission denied" source="filesystem_linux.go:57"

We tried the filesystem collector's ignored-mount-points flag, as in...

  args:
    - --path.procfs
    - /host/proc
    - --path.sysfs
    - /host/sys
    - --collector.filesystem.ignored-mount-points
    - '"^/(sys|proc|dev|host|etc|rootfs\/var\/lib\/kubelet\/pods|rootfs\/var\/lib\/docker-latest\/containers|)($|/)"'
  volumeMounts:

... but this resulted in the same errors. Eventually we overrode the nobody user with root again, as in the full config below (runAsUser being the key fix).

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: prometheus-node-exporter
  labels:
    name: prometheus-node-exporter
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 50%
  template:
    metadata:
      labels:
        name: prometheus-node-exporter
      annotations:
         prometheus.io/scrape: "true"
         prometheus.io/port: "9100"
    spec:
      serviceAccountName: cluster-reader
      hostPID: true
      hostIPC: true
      hostNetwork: true
      securityContext:
        runAsUser: 0
      containers:
        - name: prometheus-node-exporter
          image: registry.bfed-pro.intapp.eu/prom/node-exporter:v0.15.0
          ports:
            - containerPort: 9100
              name: node-exporter
              protocol: TCP
          resources:
            requests:
              cpu: 10m
              memory: 32Mi
            limits:
              cpu: 100m
              memory: 64Mi
          securityContext:
            privileged: true
          args:
            - --path.procfs
            - /host/proc
            - --path.sysfs
            - /host/sys
            - --collector.filesystem.ignored-mount-points
            - '"^/(sys|proc|dev|host|etc|rootfs)($|/)"'
          volumeMounts:
            - name: dev
              mountPath: /host/dev
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: rootfs
              mountPath: /rootfs
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /

The problem is that the paths on the node (/rootfs/var/lib/kubelet and /rootfs/var/lib/docker-latest) are not readable by user nobody, and we don't think it's a good idea to make them readable by all other users (the third digit of the Linux permission mode).
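
To make the permission point concrete, here is a minimal Python sketch. It is not part of node_exporter, and the helper name others_can_search and the sample modes are illustrative assumptions. statfs() typically returns "permission denied" when a directory on the way to the mount point does not grant search (execute) permission to the caller. For a user such as nobody, assuming it is neither the owner nor in the owning group, only the third octal digit applies.

```python
import stat


def others_can_search(mode: int) -> bool:
    """Return True if the 'other' class (third octal digit) may traverse a directory."""
    return bool(mode & stat.S_IXOTH)


# Sample directory modes, chosen for illustration only.
for mode in (0o700, 0o750, 0o755):
    print(oct(mode), others_can_search(mode))
```

A directory with mode 700 or 750 blocks the nobody user, while 755 would let it through, which is exactly the relaxation we would rather avoid on these paths.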

REF to original discussion: #599

@discordianfish
Member

So you can get around this by ignoring these mountpoints. There is just something wrong with your regex. I've got it working here with this regex:

      - image:  quay.io/prometheus/node-exporter:v0.15.0
        args:
          - --collector.filesystem.ignored-mount-points
          - '^\/rootfs\/(var\/lib|run\/docker|home\/kubernetes)\/.*'
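
One likely reason the pattern from the original report never matched: the YAML value is written as '"^/(...)($|/)"', so the single quotes are YAML syntax and the double quotes become part of the regex itself. A regex that must first match a literal " and then the start-of-text anchor ^ cannot match any mount point. The sketch below illustrates this with Python's re module as a stand-in for Go's regexp package. It assumes the two agree on these simple patterns and that the collector does an unanchored search; the sample path is hypothetical.

```python
import re

# Pattern as the container receives it from the reported YAML:
# the embedded double quotes are part of the regex.
quoted = re.compile('"^/(sys|proc|dev|host|etc|rootfs)($|/)"')

# The same pattern without the embedded double quotes.
unquoted = re.compile('^/(sys|proc|dev|host|etc|rootfs)($|/)')

# Hypothetical mount point, shaped like the ones in the error log.
path = "/rootfs/var/lib/kubelet/pods/example/volumes"

quoted_matches = quoted.search(path) is not None
unquoted_matches = unquoted.search(path) is not None
print(quoted_matches, unquoted_matches)
```

If this is what happened, dropping the inner double quotes from the YAML value should be enough for the pattern to take effect.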

We might want to consider ignoring unreadable mountpoints but not sure about this. @SuperQ Thoughts?

@fvigotti

fvigotti commented Nov 8, 2017

@discordianfish your regexp did not work completely for me either.
If anyone is interested, this is my version:

          - "--collector.filesystem.ignored-mount-points"
          - '^(\/rootfs\/var\/lib\/|\/rootfs\/run\/docker\/|\/var\/run\/docker\/netns\/|\/(host|root)\/sys\/kernel\/debug\/).*'
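
Before rolling out a pattern like this, it can help to check it against the mount points from your own error log. The sketch below does that in Python, used here as a stand-in for Go's regexp package on the assumption that the two agree on this pattern. The first two paths come from the log in the original report; the third uses a different mount prefix (/root-disk), as seen in a later comment in this thread.

```python
import re

# Pattern copied from the comment above.
pattern = re.compile(
    r'^(\/rootfs\/var\/lib\/|\/rootfs\/run\/docker\/|\/var\/run\/docker\/netns\/|\/(host|root)\/sys\/kernel\/debug\/).*'
)

KUBELET = (
    "/rootfs/var/lib/kubelet/pods/5f702f30-9c31-11e7-a39d-005056866643"
    "/volumes/kubernetes.io~secret/default-token-141nj"
)
DOCKER_SHM = (
    "/rootfs/var/lib/docker-latest/containers/"
    "13436acac0fed00bc9cdaf648b83305320a64c9a97366825b5b16c695f82af5c/shm"
)
NETNS = "/root-disk/run/docker/netns/default"

# Map each mount point to whether the pattern would ignore it.
ignored = {mp: pattern.search(mp) is not None for mp in (KUBELET, DOCKER_SHM, NETNS)}
for mount_point, is_ignored in ignored.items():
    print(is_ignored, mount_point)
```

The two /rootfs paths are ignored, but the /root-disk path is not, so the prefix in the pattern has to match the mountPath used for the host root filesystem in your own DaemonSet.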

@nelsonfassis

I'm having the same issue, and it doesn't make much sense: if I exec into that container I can read my volumes and see them with df and du, but I still get those permission denied errors in my container logs, and no storage metrics reach Prometheus.

level=error msg="Error on statfs() system call for "/root-disk/run/docker/netns/default": permission denied" source="filesystem_linux.go:57"
level=error msg="Error on statfs() system call for "/root-disk/run/docker/netns/39aefbf3629c": permission denied"

I can see that I can't access root-disk/run/docker from it; it's 700 on my host...

Any update on this problem?

@discordianfish
Member

@nelsonfassis Are you sure you don't see this with df?
df -h 2>&1|grep netns running inside my node-exporter container also shows a permission denied.
Just ignore the mountpoint like @fvigotti described.

So nothing is wrong with the node-exporter. Wondering if there is something we can do to improve the experience. We could just silently skip unreadable mountpoints or skip cgroup mountpoints in the collector. @SuperQ thoughts?
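
As a rough illustration of the "silently skip unreadable mountpoints" idea, here is a minimal Python sketch. It is not node_exporter's implementation (the collector is written in Go); the function name filesystem_stats and the sample paths are assumptions made for the example. It records mount points that cannot be queried instead of treating them as errors.

```python
import os


def filesystem_stats(mount_points):
    """Return (stats, skipped): sizes per readable mount point, plus the ones skipped."""
    stats, skipped = {}, []
    for mount_point in mount_points:
        try:
            st = os.statvfs(mount_point)
        except OSError as err:
            # Covers permission denied as well as missing paths.
            skipped.append((mount_point, err.strerror))
            continue
        size_bytes = st.f_blocks * st.f_frsize
        free_bytes = st.f_bfree * st.f_frsize
        stats[mount_point] = (size_bytes, free_bytes)
    return stats, skipped


# "/" should be readable; the second path is hypothetical and does not exist.
stats, skipped = filesystem_stats(["/", "/hypothetical/does-not-exist"])
print(sorted(stats), skipped)
```

Keeping the skipped list around, rather than dropping it, leaves room to log those mount points at a lower level instead of hiding them completely.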

@SuperQ
Member

SuperQ commented Dec 21, 2017

We could reduce the message to Debug level. We may also want to add cgroup to the ignore list if that's a problem.

@nelsonfassis

@discordianfish My mistake. I didn't skip any mount points. I would suggest not reading anything but / by default, with an option to add the mounts you want to check. That would be good for my use case at least.

Now that I skipped those folders, everything seems to be working fine for me with node-exporter. Struggling with alertmanager now.

@discordianfish
Member

Going to close this. Let's discuss in #66 the general "UX" of running the node-exporter in container environments.

openstack-mirroring pushed a commit to openstack/openstack-helm-infra that referenced this issue Jun 2, 2020
This change adds the ability to configure the

--collector.filesystem.ignored-mount-points

parameter, which is useful in events where a subdirectory
cannot be statfs'd by a non-root user.

Change-Id: Ie2be8c496aa676e9a3fee5434e0c194615f9cdab
See: prometheus/node_exporter#703
openstack-mirroring pushed a commit to openstack/openstack that referenced this issue Jun 2, 2020
* Update openstack-helm-infra from branch 'master'
  - Merge "Node Exporter: Allow Ignored Mountpoints"
  - Node Exporter: Allow Ignored Mountpoints
    
    This change adds the ability to configure the
    
    --collector.filesystem.ignored-mount-points
    
    parameter, which is useful in events where a subdirectory
    cannot be statfs'd by a non-root user.
    
    Change-Id: Ie2be8c496aa676e9a3fee5434e0c194615f9cdab
    See: prometheus/node_exporter#703