
Set the --force-systemd true or false automatically (by detecting the cgroups) #8348

Open
priyawadhwa opened this issue Jun 1, 2020 · 25 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-solution-message Issues where offering a solution for an error would be helpful priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@priyawadhwa

priyawadhwa commented Jun 1, 2020

Look into whether we should be setting --force-systemd=true by default, and whether this results in any performance improvement.

The documentation says we need to use the same cgroup driver as the host system:
if your system uses systemd, you should use systemd.
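The auto-detection the issue title asks for could start from the host's init process. Below is a minimal Go sketch (hypothetical helper, not minikube's actual code) applying the rule above: if PID 1 is systemd, prefer the systemd cgroup driver, otherwise fall back to cgroupfs.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// driverForInit maps the name of PID 1 (as read from /proc/1/comm)
// to the matching cgroup driver, per the rule above: if the system
// runs systemd, Docker should use systemd too.
func driverForInit(initComm string) string {
	if strings.TrimSpace(initComm) == "systemd" {
		return "systemd"
	}
	return "cgroupfs"
}

func main() {
	comm, err := os.ReadFile("/proc/1/comm")
	if err != nil {
		// conservative fallback when /proc is unavailable (non-Linux)
		fmt.Println("cgroupfs")
		return
	}
	fmt.Println(driverForInit(string(comm)))
}
```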

@priyawadhwa priyawadhwa added kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Jun 1, 2020
@priyawadhwa priyawadhwa added this to the v.1.12.0-candidate milestone Jun 1, 2020
@priyawadhwa priyawadhwa self-assigned this Jun 1, 2020
@priyawadhwa priyawadhwa added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Jun 1, 2020
@medyagh medyagh modified the milestones: v.1.12.0-previous candidate (dumpster fire), v1.12.0-candidate Jun 1, 2020
@paddy-hack

How will this affect those Linux distributions that do not support/require systemd?

I for one, changed distros just to get rid of systemd, currently using Devuan with the docker-ce packages for the corresponding Debian distribution (buster).

This seems to work fine as long as I don't try to use Docker to run anything else after a minikube start. If I do, I see something like the following

$ docker run --rm -it alpine:3.12 /bin/sh
docker: Error response from daemon: cgroups: cannot find cgroup mount destination: unknown.

Before running minikube start the above docker invocation works just fine.

Actually, I cannot even stop and start minikube again 😒
Current work-around is a reboot.

For the record, I have cgroupfs-mount installed.

@priyawadhwa
Author

Hey @paddy-hack, that's an interesting setup, and one that would be important to explore before we set --force-systemd=true by default.

Just to clarify, this sets docker within the minikube VM to use systemd as cgroup manager (we already have systemd running in minikube).

Does running:

minikube start --force-systemd

work on your machine? And could you provide the output of docker info?
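For context, what the flag changes under the hood is dockerd's cgroup driver, which is normally configured via the documented `native.cgroupdriver` exec-opt in `/etc/docker/daemon.json`. A sketch of generating that fragment (hypothetical helper names; minikube's real provisioner works differently):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// execOpt builds the documented dockerd exec-opt that selects a
// cgroup driver ("systemd" or "cgroupfs").
func execOpt(driver string) string {
	return "native.cgroupdriver=" + driver
}

// daemonJSON renders the /etc/docker/daemon.json fragment that
// switches dockerd's cgroup driver; this illustrates the effect of
// --force-systemd rather than minikube's actual provisioning code.
func daemonJSON(driver string) string {
	cfg := map[string][]string{"exec-opts": {execOpt(driver)}}
	b, _ := json.MarshalIndent(cfg, "", "  ")
	return string(b)
}

func main() {
	fmt.Println(daemonJSON("systemd"))
}
```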

@medyagh
Member

medyagh commented Jun 4, 2020

@paddy-hack I agree with @priyawadhwa that this would only affect the systemd inside minikube, but that is still a good point: we need to ensure minikube is capable of running with that cgroup setup as well.

Is there a way you can try it and see? If it doesn't work for you, we can handle it in minikube.

@paddy-hack

Replying to #6954, I had already gone through a

minikube start
minikube status
minikube stop
minikube start

but after that I got

paddy-hack@boson:~$ minikube start --force-systemd
😄  minikube v1.11.0 on Debian 10.0
✨  Using the docker driver based on existing profile
👍  Starting control plane node minikube in cluster minikube
🔄  Restarting existing docker container for "minikube" ...
🤦  StartHost failed, but will try again: driver start: start: docker start minikube: exit status 1
stdout:

stderr:
Error response from daemon: OCI runtime create failed: container with id exists: 53ac2f88bff8b8ea2db5cd4e9a3133ea9637cc8bd2e59c550008fba242ed74a7: unknown
Error: failed to start containers: minikube

🔄  Restarting existing docker container for "minikube" ...
😿  Failed to start docker container. "minikube start" may fix it: driver start: start: docker start minikube: exit status 1
stdout:

stderr:
Error response from daemon: OCI runtime create failed: container with id exists: 53ac2f88bff8b8ea2db5cd4e9a3133ea9637cc8bd2e59c550008fba242ed74a7: unknown
Error: failed to start containers: minikube


💣  error provisioning host: Failed to start host: driver start: start: docker start minikube: exit status 1
stdout:

stderr:
Error response from daemon: OCI runtime create failed: container with id exists: 53ac2f88bff8b8ea2db5cd4e9a3133ea9637cc8bd2e59c550008fba242ed74a7: unknown
Error: failed to start containers: minikube


😿  minikube is exiting due to an error. If the above message is not useful, open an issue:
👉  https://github.com/kubernetes/minikube/issues/new/choose

Restarting the docker service does not change this. I'll see what I get after a reboot to get things back to working order 🤢

Here's the docker info output.

paddy-hack@boson:~$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 39
 Server Version: 19.03.11
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.19.0-9-amd64
 Operating System: Devuan GNU/Linux 3 (beowulf)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 3.608GiB
 Name: boson
 ID: FFPZ:6IG2:WOZN:WC5L:ZZWQ:4VUO:BNKJ:UX6G:SYNW:ASKJ:GBCJ:VF5K
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

@paddy-hack

Rebooted and tried again

paddy-hack@boson:~$ minikube start --force-systemd
😄  minikube v1.11.0 on Debian 10.0
✨  Using the docker driver based on existing profile
👍  Starting control plane node minikube in cluster minikube
🔄  Restarting existing docker container for "minikube" ...
🐳  Preparing Kubernetes v1.18.3 on Docker 19.03.2 ...
    ▪ kubeadm.pod-network-cidr=10.244.0.0/16
🔎  Verifying Kubernetes components...
🌟  Enabled addons: default-storageclass, storage-provisioner
🏄  Done! kubectl is now configured to use "minikube"
💡  For best results, install kubectl: https://kubernetes.io/docs/tasks/tools/install-kubectl/
paddy-hack@boson:~$ minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured

paddy-hack@boson:~$ minikube stop
✋  Stopping "minikube" in docker ...
🛑  Powering off "minikube" via SSH ...
🛑  Node "minikube" stopped.
paddy-hack@boson:~$ minikube start --force-systemd
😄  minikube v1.11.0 on Debian 10.0
✨  Using the docker driver based on existing profile
👍  Starting control plane node minikube in cluster minikube
🔄  Restarting existing docker container for "minikube" ...
🤦  StartHost failed, but will try again: driver start: start: docker start minikube: exit status 1
stdout:

stderr:
Error response from daemon: cgroups: cannot find cgroup mount destination: unknown
Error: failed to start containers: minikube

🔄  Restarting existing docker container for "minikube" ...
😿  Failed to start docker container. "minikube start" may fix it: driver start: start: docker start minikube: exit status 1
stdout:

stderr:
Error response from daemon: OCI runtime create failed: container with id exists: 53ac2f88bff8b8ea2db5cd4e9a3133ea9637cc8bd2e59c550008fba242ed74a7: unknown
Error: failed to start containers: minikube


💣  error provisioning host: Failed to start host: driver start: start: docker start minikube: exit status 1
stdout:

stderr:
Error response from daemon: OCI runtime create failed: container with id exists: 53ac2f88bff8b8ea2db5cd4e9a3133ea9637cc8bd2e59c550008fba242ed74a7: unknown
Error: failed to start containers: minikube


😿  minikube is exiting due to an error. If the above message is not useful, open an issue:
👉  https://github.com/kubernetes/minikube/issues/new/choose

@paddy-hack

paddy-hack commented Jun 5, 2020

But getting back to this reliance on systemd, I ditched Debian (after two decades) and moved to Devuan to get rid of systemd. Seeing that minikube uses systemd, even considers forcing it upon me, makes me rethink whether I should be using minikube in the first place 🤔

@priyawadhwa
Author

Hey @paddy-hack -- just to clarify, minikube does use systemd but only within the running VM or container (you don't need systemd on your machine).

The --force-systemd flag is used to make docker within the VM use systemd as the cgroup manager, as opposed to cgroupfs (systemd is running with or without that flag, that's how k8s comes up). Enabling the flag actually makes the kubernetes cluster more stable, as described in the k8s documentation here which is why we are considering setting it as the default.

In terms of the error you're getting from docker, it's a known docker issue on Linux:

docker/for-linux#219

with a temporary solution mentioned in this comment:

docker/for-linux#219 (comment)

@afbjorklund
Collaborator

The --force-systemd flag is used to make docker within the VM use systemd as the cgroup manager, as opposed to cgroupfs (systemd is running with or without that flag, that's how k8s comes up). Enabling the flag actually makes the kubernetes cluster more stable, as described in the k8s documentation here which is why we are considering setting it as the default.

The key thing here is to use the same cgroup driver. The minikube VM is using systemd, so then it makes sense to have Docker use systemd. If the host OS is using cgroupfs (not systemd), then it makes sense to have Docker use cgroupfs. The minikube settings are supposed to pick up whichever is in use, and forward this preference to the kubelet (since 0e83dd4). So either should be fine...
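Picking up whichever driver is in use could be as simple as reading the "Cgroup Driver" line that `docker info` reports (visible in the output pasted earlier in this thread); `docker info --format '{{.CgroupDriver}}'` prints it directly. A hypothetical parsing sketch, not minikube's actual code:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cgroupDriver extracts the "Cgroup Driver:" value from the text
// output of `docker info`, e.g. the report pasted earlier in this
// thread. Returns "" if the line is absent.
func cgroupDriver(info string) string {
	sc := bufio.NewScanner(strings.NewReader(info))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "Cgroup Driver:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "Cgroup Driver:"))
		}
	}
	return ""
}

func main() {
	sample := "Server:\n Cgroup Driver: cgroupfs\n"
	fmt.Println(cgroupDriver(sample))
}
```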

@afbjorklund
Collaborator

Also, currently systemd-in-systemd is broken in podman so it has no choice but to run cgroupfs...

// pkg/drivers/kic/oci/oci.go
// to run nested container from privileged container in podman https://bugzilla.redhat.com/show_bug.cgi?id=1687713
// only add when running locally (linux), when running remotely it needs to be configured on server in libpod.conf
if ociBin == Podman && runtime.GOOS == "linux" {
        args = append(args, "--cgroup-manager", "cgroupfs")
}

@afbjorklund
Collaborator

afbjorklund commented Jun 6, 2020

I tested with Devuan Beowulf.

Can confirm that trying to start minikube with the docker driver messes up docker (like above).

Probably something in the entrypoint's handling of /sys that destroys some cgroup settings for cgroupfs?

devuan@devuan:~$ docker logs minikube
INFO: ensuring we can execute /bin/mount even with userns-remap
INFO: remounting /sys read-only
INFO: making mounts shared
INFO: fix cgroup mounts for all subsystems
INFO: clearing and regenerating /etc/machine-id
Initializing machine ID from random generator.
INFO: faking /sys/class/dmi/id/product_name to be "kind"
INFO: faking /sys/class/dmi/id/product_uuid to be random
INFO: faking /sys/devices/virtual/dmi/id/product_uuid as well
INFO: setting iptables to detected mode: legacy
Inserted module 'autofs4'
systemd 242 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Detected virtualization docker.
Detected architecture x86-64.
Failed to create symlink /sys/fs/cgroup/cpu: File exists
Failed to create symlink /sys/fs/cgroup/cpuacct: File exists
Failed to create symlink /sys/fs/cgroup/net_cls: File exists
Failed to create symlink /sys/fs/cgroup/net_prio: File exists

Welcome to Ubuntu 19.10!

Set hostname to <minikube>.

Run: docker run -d -t --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run -v /lib/modules:/lib/modules:ro --hostname minikube --name minikube --label created_by.minikube.sigs.k8s.io=true --label name.minikube.sigs.k8s.io=minikube --label role.minikube.sigs.k8s.io= --label mode.minikube.sigs.k8s.io=minikube --volume minikube:/var --cpus=2 --memory=2200mb -e container=docker --expose 8443 --publish=127.0.0.1::8443 --publish=127.0.0.1::22 --publish=127.0.0.1::2376 --publish=127.0.0.1::5000 gcr.io/k8s-minikube/kicbase:v0.0.10@sha256:f58e0c4662bac8a9b5dda7984b185bad8502ade5d9fa364bf2755d636ab51438


Beyond the extra "docker" layer, we also have some cgroups v2 compat created:

/sys/fs/cgroup/unified/init.scope

Anyway, since kicbase uses systemd (through KIND) it seems it fails on cgroupfs.
Previously only cgroupfs-on-systemd was tested, not this systemd-on-cgroupfs...

As the article above implies, mixing and matching different init systems is asking for trouble.
And I don't think we will provide a minikube.iso or a kicbase image without systemd.

So these systems (Devuan) will need to use --vm.

Or with a dedicated VM for it, maybe --driver none.

@afbjorklund
Collaborator

afbjorklund commented Jun 6, 2020

If anyone wants to look into this further, the message is from containerd on /proc/self/mountinfo:

https://github.com/containerd/cgroups/blob/master/utils.go#L340

It doesn't seem so happy about the new "name=systemd" cgroup from /proc/self/cgroup ?

Similar to moby/moby#38822
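A simplified sketch of the kind of lookup containerd performs there (hypothetical function, not containerd's actual code): scan /proc/self/mountinfo for a cgroup filesystem whose super options carry name=systemd, following the mountinfo format documented in proc(5).

```go
package main

import (
	"fmt"
	"strings"
)

// systemdCgroupMount scans /proc/self/mountinfo content for a cgroup
// mount whose super options include name=systemd, roughly what
// containerd checks before reporting "cannot find cgroup mount
// destination". Returns the mount point, or "" if absent.
func systemdCgroupMount(mountinfo string) string {
	for _, line := range strings.Split(mountinfo, "\n") {
		// per proc(5), optional fields end at the " - " separator
		pre, post, ok := strings.Cut(line, " - ")
		if !ok {
			continue
		}
		preFields := strings.Fields(pre)   // ... [4] is the mount point
		postFields := strings.Fields(post) // fstype, source, super options
		if len(preFields) < 5 || len(postFields) < 3 {
			continue
		}
		if postFields[0] == "cgroup" && strings.Contains(postFields[2], "name=systemd") {
			return preFields[4]
		}
	}
	return ""
}

func main() {
	sample := "35 24 0:30 / /sys/fs/cgroup/systemd rw,nosuid shared:12 - cgroup cgroup rw,xattr,name=systemd"
	fmt.Println(systemdCgroupMount(sample))
}
```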


This also means that this is the workaround, to get Docker back (without reboot):

sudo mkdir /sys/fs/cgroup/systemd
sudo mount -t cgroup -o none,name=systemd,xattr cgroup /sys/fs/cgroup/systemd

If this is acceptable, then this is the way to run minikube with Docker-in-Docker.

@paddy-hack

Guess I'll be using qemu and minikube --vm then for the time being.

@afbjorklund
Collaborator

afbjorklund commented Jun 7, 2020

@priyawadhwa :

just to clarify, minikube does use systemd but only within the running VM or container (you don't need systemd on your machine).

We should add a solution message, when trying to use docker driver without systemd cgroup.

The user doesn't actually need to run systemd as their PID 1 nor any daemons or units, though.

@afbjorklund afbjorklund added the needs-solution-message Issues where offering a solution for an error would be helpful label Jun 7, 2020
@afbjorklund
Collaborator

Guess I'll be using qemu and minikube --vm then for the time being.

That should still work, note that you need libvirt (and not just QEMU/KVM)

@paddy-hack

Thanks for the heads up on libvirt 🙇

@afbjorklund
Collaborator

afbjorklund commented Jun 10, 2020

Thanks for the heads up on libvirt

At one point we considered renaming the driver from docker-machine-kvm to docker-machine-libvirt-driver, but at that point it was probably "too late" and the historical name won. Now forked as kvm2.

The qemu (with kvm) driver has some issues with creating the networks for kubernetes, so it works better in a simpler docker context. So that's why we are using the (system) libvirt wrapper instead...

https://libvirt.org/drvqemu.html

The "qemu:///system" family of URIs connect to a libvirtd instance running as the privileged system account 'root'. Thus the QEMU instances spawned from this driver may have much higher privileges than the client application managing them. The intended use case for this driver is server virtualization, where the virtual machines may need to be connected to host resources (block, PCI, USB, network devices) whose access requires elevated privileges.

It should warn about it. (#5617)

@medyagh medyagh modified the milestones: v1.14.0, v1.15.0-candidate Oct 12, 2020
@medyagh
Member

medyagh commented Oct 12, 2020

We need to figure out the best default for macOS users by finding out how Docker implements their VM:
whether that VM is using cgroupfs or systemd.

And for GitHub Actions, minikube should autodetect that it is running on GitHub Actions (there is an environment variable).
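GitHub documents that Actions runners set GITHUB_ACTIONS=true in the job environment, so the detection could be sketched as follows (hypothetical helper; the env lookup is injected so the check stays testable):

```go
package main

import (
	"fmt"
	"os"
)

// onGitHubActions reports whether we appear to be inside a GitHub
// Actions runner. GitHub documents that GITHUB_ACTIONS=true is set
// in the job environment; lookup is os.LookupEnv in production.
func onGitHubActions(lookup func(string) (string, bool)) bool {
	v, ok := lookup("GITHUB_ACTIONS")
	return ok && v == "true"
}

func main() {
	fmt.Println(onGitHubActions(os.LookupEnv))
}
```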

@medyagh
Member

medyagh commented Oct 12, 2020

@afbjorklund suggests enabling Kubernetes on Docker Desktop and seeing what they are using.

@priyawadhwa priyawadhwa removed this from the v1.15.0 milestone Oct 19, 2020
@priyawadhwa priyawadhwa removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 19, 2020
@medyagh
Member

medyagh commented Oct 28, 2020

maybe we can exec into the Docker machine created by docker desktop and see what cgroup it uses

https://gist.github.com/BretFisher/5e1a0c7bcca4c735e716abf62afad389

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 26, 2021
@spowelljr spowelljr added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Feb 18, 2021
@priyawadhwa priyawadhwa removed their assignment Feb 26, 2021
@medyagh medyagh added this to the v1.20.0-candidate milestone Mar 3, 2021
@medyagh medyagh changed the title explore setting force-systemd=true by default Set the --force-systemd true or false automatically (by detecting the cgroups) Mar 3, 2021
@prezha
Contributor

prezha commented Mar 3, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 3, 2021
@sharifelgamal sharifelgamal added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels May 3, 2021
@medyagh medyagh modified the milestones: v1.21.0, 1.22.0-candidate May 3, 2021
@sharifelgamal sharifelgamal removed this from the 1.22.0-candidate milestone Jun 14, 2021
@sharifelgamal sharifelgamal added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Jun 14, 2021
@sharifelgamal
Collaborator

@govargo would you be interested in looking at this?

@govargo
Contributor

govargo commented Jun 15, 2021

It may take some time because I'm not very familiar with this area,
but I'll try to look into it starting tomorrow.

@spowelljr spowelljr added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Oct 6, 2021
@jiangxiaobin96

jiangxiaobin96 commented Jun 8, 2023

Hi, I want to ask: when I use systemd.SdNotify to confirm whether I am running on a systemd system, should the command minikube start --force --driver=docker --force-systemd work and return without error?
I tried it and found that it does not work, so what do I need to set for systemd.SdNotify to pass?
