Multi-Node is not producing exact replica nodes (getting file write permission errors on extra node) #9305

Closed
harryttd opened this issue Sep 22, 2020 · 14 comments
Labels
  • co/multinode: Issues related to multinode clusters
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@harryttd

harryttd commented Sep 22, 2020

Steps to reproduce the issue:

  1. minikube start --nodes 2
  2. kubectl apply -f /path/to/yaml (yaml pasted below)
  3. Confirm pods are on separate nodes: k get pods -n myapp -o jsonpath='{.items[*].spec.nodeName}'
  4. Check the logs of the pod on node minikube (example commands follow the list). It should say Created dir /var/foo/bar.
  5. Check the logs of the pod on node minikube-m02. It should say mkdir: can't create directory '/var/foo/bar': Permission denied.
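
For steps 4-5, a sketch of the commands; the pod names are placeholders for whatever step 3 reports on each node:

kubectl get pods -n myapp -o wide               # shows which node each pod landed on
kubectl logs -n myapp <pod-on-minikube>         # expected: Created dir /var/foo/bar
kubectl logs -n myapp <pod-on-minikube-m02>     # expected: mkdir: ... Permission denied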

Full output of minikube start command used, if not already included:

❯ minikube start --nodes 2
😄  minikube v1.13.1 on Darwin 10.15.6
✨  Automatically selected the hyperkit driver
👍  Starting control plane node minikube in cluster minikube
🔥  Creating hyperkit VM (CPUs=2, Memory=6000MB, Disk=20000MB) ...
🐳  Preparing Kubernetes v1.19.2 on Docker 19.03.12 ...
🔎  Verifying Kubernetes components...
🌟  Enabled addons: default-storageclass, storage-provisioner

❗  Multi-node clusters are currently experimental and might exhibit unintended behavior.
📘  To track progress on multi-node clusters, see https://github.com/kubernetes/minikube/issues/7538.

👍  Starting node minikube-m02 in cluster minikube
🔥  Creating hyperkit VM (CPUs=2, Memory=6000MB, Disk=20000MB) ...
🌐  Found network options:
    ▪ NO_PROXY=192.168.64.31
🐳  Preparing Kubernetes v1.19.2 on Docker 19.03.12 ...
    ▪ env NO_PROXY=192.168.64.31
🔎  Verifying Kubernetes components...
🏄  Done! kubectl is now configured to use "minikube" by default

YAML for reproducing the issue:

My use case's image sets the uid of the container to 100, so I am mimicking that here for reproduction. Running the container as root does not result in any issues.

apiVersion: v1
kind: Namespace
metadata:
  name: myapp
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-pvc
  namespace: myapp
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  selector:
    matchLabels:
      storage-type: var-files
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: myapp
spec:
  replicas: 4
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: busybox
          command:
            - /bin/sh
          args:
            - -c
            - |
              mkdir -p /var/foo/bar && echo Created dir /var/foo/bar
              # Keep container running
              until false; do sleep 2; done;
          securityContext:
            # My use case's image sets the uid to 100
            runAsUser: 100
          volumeMounts:
            - mountPath: /var/foo
              name: var-volume
      volumes:
        - name: var-volume
          persistentVolumeClaim:
            claimName: myapp-pvc
@sharifelgamal added the co/multinode, kind/bug, and priority/important-soon labels on Sep 22, 2020
@tstromberg
Contributor

Very strange - I wonder if this may be due to both nodes sharing a common volume? It doesn't seem like we should be treating the partition any differently.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Dec 27, 2020
@skyhisi

skyhisi commented Jan 20, 2021

I think I'm seeing the same issue: it looks like on the master node the directories created in /tmp/hostpath-provisioner/default have permissions 777 (rwxrwxrwx), whereas on the secondary node the permissions are 755 (rwxr-xr-x).
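
A quick way to compare the two nodes (a sketch, assuming the default profile and node names, and that this minikube version supports running a command over ssh with the -n/--node flag):

minikube ssh -n minikube -- ls -l /tmp/hostpath-provisioner/default        # control plane: expect drwxrwxrwx
minikube ssh -n minikube-m02 -- ls -l /tmp/hostpath-provisioner/default    # worker: reportedly drwxr-xr-x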

@prezha
Contributor

prezha commented Feb 18, 2021

@harryttd, @skyhisi I cannot reproduce the issue - it was probably fixed in a later minikube version (tested on the current v1.17.1)

@sharifelgamal
Collaborator

I'll close this for now, please reopen if you're still seeing an issue with latest minikube.

@slavonicsniper

minikube v1.17.1 on Fedora 32
driver: docker

I could still reproduce this on the latest minikube when deploying artifactory-jcr.

The deployment fails with mkdir: cannot create directory ‘/bitnami/postgresql/data’: Permission denied when running minikube with multiple nodes. As you can see below, the permissions are 755 on the nodes other than the master minikube node.

k get pvc
NAME                                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
artifactory-volume-artifactory-jcr-artifactory-0   Bound    pvc-9024c91e-0477-41fc-bcfb-58327c3ff040   20Gi       RWO            standard       5m29s
data-artifactory-jcr-postgresql-0                  Bound    pvc-a428d80a-d3dd-4d08-aeb7-1c98bf86cb45   5Gi        RWO            standard       5m29s

docker@minikube-m03:~$ ls -ld /tmp/hostpath-provisioner/artifactory-jcr/data-artifactory-jcr-postgresql-0
drwxr-xr-x. 3 root root 4096 Feb 19 10:46 /tmp/hostpath-provisioner/artifactory-jcr/data-artifactory-jcr-postgresql-0

docker@minikube-m02:~$ ls -ld /tmp/hostpath-provisioner/artifactory-jcr/artifactory-volume-artifactory-jcr-artifactory-0/
drwxr-xr-x. 2 root root 4096 Feb 19 10:46 /tmp/hostpath-provisioner/artifactory-jcr/artifactory-volume-artifactory-jcr-artifactory-0/

I tested with a single-node minikube cluster; the permissions (777) were set correctly and it worked fine.

docker@minikube:~$ ls -l /tmp/hostpath-provisioner/artifactory-jcr/                                                 
total 8
drwxrwxrwx. 2 root root 4096 Feb 19 10:46 artifactory-volume-artifactory-jcr-artifactory-0
drwxrwxrwx. 2 root root 4096 Feb 19 10:46 data-artifactory-jcr-postgresql-0

@prezha
Contributor

prezha commented Feb 19, 2021

@slavonicsniper
interesting...
it did work when I tried it, but I cannot say now which node(s) the pods got assigned to - I'll have a look

@harryttd
Author

harryttd commented Feb 19, 2021

❯ minikube version
minikube version: v1.17.1
commit: 043bdca07e54ab6e4fc0457e3064048f34133d7e

This is definitely still an issue. I just ran my original example again and got the same error on node minikube-m02: mkdir: can't create directory '/var/foo/bar': Permission denied

In terms of asserting which pod is on which node:

 k get pods -n myapp -o wide | awk '{print $1 "  " $7}'

Before running this, I would bump the number of replicas up to around 4 so that it is more likely that pods end up on each node. I've edited my original comment accordingly.

@prezha
Contributor

prezha commented Feb 23, 2021

@slavonicsniper, @harryttd thanks for your reply

I looked a bit more into it and, in summary: you are right, and I'd say that it doesn't work in a multi-node environment - "by design"

minikube's default storage provisioner uses the hostPath volume type, which means it uses a local folder (/tmp/hostpath-provisioner/<namespace>/<pvc_name>) on its node to back the PV (created by the PVC)

since this storage-provisioner plugin is installed by default (and used by default, unless you specify another one in your PVC under spec.storageClassName), it is tied to the first/master node (its local folder) even in a multi-node setup, and one of the things it does is modify the perms of the mounted folder (to 0777) so that containers run by non-root users may write to it
that is the reason you see different behaviour on non-master nodes

now, you could run a simple init container in your app pods that will fix the mount folder perms for you (a sketch follows below), but that would probably still not be a solution for sharing data between pods on multiple nodes (different nodes => different mount folders => different data; ie, no data replication):

only the pods that are currently on the node (master - by default) serving the hostPath volume will be able to access the data stored on that node's folder - ie, pods on other nodes will not

as such, this type of volume is not meant to be used for shared loads (across pods on different nodes in a cluster), and other solutions should be used there - ref: https://kubernetes.io/docs/concepts/storage/volumes/#hostpath
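
A minimal sketch of the init-container workaround mentioned above, reusing the busybox Deployment from the issue description; the fix-perms name is illustrative, and the init container runs as root so that it can change the mode of the hostPath-backed directory. It would go under the Deployment's pod spec, alongside containers:

      initContainers:
        - name: fix-perms                          # illustrative name
          image: busybox
          command: ["sh", "-c", "chmod 0777 /var/foo"]
          securityContext:
            runAsUser: 0                           # root, so the chmod on the mounted folder succeeds
          volumeMounts:
            - mountPath: /var/foo
              name: var-volume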

i hope this helps

p.s. a couple of side notes regarding the PVC yaml given in the issue description (a corrected sketch follows the list):

  • hostPath can only use ReadWriteOnce under spec.accessModes
  • selector.matchLabels.storage-type may conflict with storageClass
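
For reference, a PVC sketch reflecting those two notes (ReadWriteOnce access mode, no selector), keeping the size from the original:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-pvc
  namespace: myapp
spec:
  accessModes:
    - ReadWriteOnce   # hostPath-backed PVs support only ReadWriteOnce
  resources:
    requests:
      storage: 1Gi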

@pythonwood

still a problem in minikube version: v1.19.0

minikube start -n 3

Running the example from https://kubernetes.io/zh/docs/tasks/run-application/run-single-instance-stateful-application/ gives:

Warning Failed 10s (x3 over 12s) kubelet Error: stat /tmp/hostpath-provisioner/default/data-mysql-0: no such file or directory

@js-kyle

js-kyle commented Jul 2, 2021

This is still an issue on minikube 1.20.0

@fredgate

fredgate commented Sep 20, 2021

Actually, multi-node on minikube does not work because each node has a different hostpath-provisioner directory.
A pod could start running on the first node, for example, create and update files, and then be scheduled on another node where those files no longer exist.

Each node mounts its own data volume (for example /var/lib/docker/volumes/{node_name}/_data/hostpath-provisioner with the Docker driver), each with its own hostpath-provisioner subdirectory.
There should not be one hostpath-provisioner directory per node data directory; instead there should be a single hostpath-provisioner directory mounted in all nodes. So each node should mount two volumes: its dedicated data directory (without the hostpath-provisioner subdirectory) and the shared hostpath-provisioner directory.
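
A way to see the per-node split from the host (a sketch for the docker driver, assuming the per-node volume names minikube and minikube-m02 implied by the path above; the directory may simply be missing on worker nodes):

docker volume ls --filter name=minikube          # one data volume per node
docker run --rm -v minikube:/n1 -v minikube-m02:/n2 busybox \
  ls -ld /n1/hostpath-provisioner /n2/hostpath-provisioner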

I think that this issue can be reopened.

@abubakarm94

This is still an issue on minikube v1.23.2

@abries

abries commented Jan 24, 2023

This issue still exists in minikube 1.28.0 (running on rootful podman in an Ubuntu 20.04 WSL).
One thing I noticed is that on the master node, the directory /tmp/hostpath-provisioner/$NAMESPACE/$PVC_NAME has mode 0777, whereas on the other node(s?) it is 0755. The docker user therefore cannot write to the directory there, which is also what applications complain about in their startup logs right before crashing. If you change the directory mode on the worker node to 0777 manually (e.g. via minikube ssh $CLUSTERNAME-m02, after switching to root with sudo su), the issue disappears.
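
The manual workaround as a single command (a sketch, assuming the default profile, the m02 worker node, the $NAMESPACE/$PVC_NAME placeholders used above, and a minikube version whose ssh accepts the -n node flag plus a trailing command):

minikube ssh -n minikube-m02 -- sudo chmod 0777 /tmp/hostpath-provisioner/$NAMESPACE/$PVC_NAME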
