cgroups v2 support - possible with systemd/freeipa-container ? #429

Closed
danryu opened this issue Oct 26, 2021 · 42 comments

Comments

@danryu

danryu commented Oct 26, 2021

Lack of cgroups v2 support is now a hard blocker in a number of production use cases, for example where Debian 11 is mandated as the Kubernetes node OS and cgroups v2 is required by the network manager.

Is FreeIPA container ever going to support cgroups v2, and if so, when?

EDIT: FreeIPA container DOES support cgroups v2. The issue (as revealed in the thread) is Docker runtime failing to configure systemd correctly between host/container on cgroups v2 host systems.

podman and other container runtimes are expected to work correctly with cgroups v2 and FreeIPA.

@abbra
Contributor

abbra commented Oct 26, 2021

FreeIPA container works just fine with cgroups v2 if your system actually provides cgroups v2. I have no problems with Fedora 34+ and podman, with both root and rootless runs.
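
One way to check whether a system "actually provides cgroups v2" is to look for the cgroup.controllers file at the cgroup mount root, since that file exists only at the root of a unified (v2) hierarchy. The following is a minimal Python sketch, assuming the conventional /sys/fs/cgroup mount point; the function name is illustrative and not part of FreeIPA.

```python
from pathlib import Path


def has_cgroup_v2(root="/sys/fs/cgroup"):
    """Return True if `root` looks like a cgroup v2 (unified) mount."""
    # cgroup.controllers is present at the root of a cgroup v2 hierarchy;
    # a cgroup v1 setup exposes per-controller directories here instead.
    return (Path(root) / "cgroup.controllers").is_file()


if __name__ == "__main__":
    print("cgroup v2:", has_cgroup_v2())
```

Running it on the host and again inside a container shows whether both sides see the unified hierarchy.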

@danryu
Author

danryu commented Oct 26, 2021

Thanks for the swift response. Going to do some further testing and feed back.

@danryu
Author

danryu commented Oct 26, 2021

Running on a cgroups v2 enabled environment (ubuntu 21.10):

$  ls -al /sys/fs/cgroup/cgroup.controllers
-r--r--r-- 1 root root 0 Oct 26 13:50 /sys/fs/cgroup/cgroup.controllers

and running FreeIPA as per docs:

$ sudo mkdir /var/lib/ipa-data
$ docker run --name freeipa-server-container  -ti  -h ipa.example.test --read-only   -v /sys/fs/cgroup:/sys/fs/cgroup:ro   -v /var/lib/ipa-data:/data:Z  freeipa/freeipa-server:fedora-rawhide-4.9.7 
systemd v248.9-1.fc34 running in system mode. (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization docker.
Detected architecture x86-64.
Initializing machine ID from random generator.
Failed to create /init.scope control group: Read-only file system
Failed to allocate manager object: Read-only file system
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...

Adding or removing the -v /sys/fs/cgroup:/sys/fs/cgroup:ro option makes no difference, nor does changing the freeipa-container version back to freeipa/freeipa-server:fedora-32 (I tried a few versions).

I thought this might be related to systemd 248 issues as referenced here: https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg46076.html
but freeipa/freeipa-server:fedora-32 uses systemd v245 and fails in a similar way:

$ docker run --name freeipa-server-container  -ti  -h ipa.example.test --read-only   -v /sys/fs/cgroup:/sys/fs/cgroup:ro   -v /var/lib/ipa-data:/data:Z  freeipa/freeipa-server:fedora-32
systemd v245.9-1.fc32 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization container-other.
Detected architecture x86-64.
Set hostname to <ipa.example.test>.
Initializing machine ID from random generator.
Failed to create /init.scope control group: Read-only file system
Failed to allocate manager object: Read-only file system
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...
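
The "Read-only file system" error above means systemd could not create its init.scope directory under /sys/fs/cgroup. A diagnostic that may help is to check how that path is mounted inside the container. Below is a minimal Python sketch that parses text in the /proc/mounts format; the function name is an assumption made for illustration, and the sketch only reports mount options, it does not explain why a runtime chose them.

```python
def cgroup_mount_options(mounts_text, mount_point="/sys/fs/cgroup"):
    """Return the set of mount options for `mount_point`, or None if absent."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            options = set(fields[3].split(","))  # the last matching entry wins
    return options


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        opts = cgroup_mount_options(f.read())
    if opts is None:
        print("/sys/fs/cgroup is not a mount point here")
    else:
        print("read-only" if "ro" in opts else "read-write")
```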

@abbra
Contributor

abbra commented Oct 26, 2021

Try the partial tests as described in the debugging section of the README: https://github.com/freeipa/freeipa-container#debugging. It looks like you don't even get to run anything from FreeIPA itself; you have a basic problem running a systemd-based container.

@danryu
Author

danryu commented Oct 26, 2021

I tried the test as advised; however, it didn't provide any insight into the systemd error.

From https://github.com/freeipa/freeipa-container#debugging :

$ ./tests/run-partial-tests.sh Dockerfile 
Sending build context to Docker daemon  385.5kB
Step 1/13 : FROM registry.fedoraproject.org/fedora:34
 ---> 191682d67252
.......//..............
Step 2/2 : VOLUME [ "/var/log/journal" ]
 ---> Using cache
 ---> f5b8c8ae7ec0
Successfully built f5b8c8ae7ec0
Successfully tagged localhost/freeipa-server-test-addons:Dockerfile
+ docker run --name freeipa-server-container-Dockerfile -d -h ipa.example.test --tmpfs /run --tmpfs /tmp -v /sys/fs/cgroup:/sys/fs/cgroup:ro --sysctl net.ipv6.conf.all.disable_ipv6=0 localhost/freeipa-server-test-addons:Dockerfile
29059cb79c9655a4fdeff90e33c8352a98a38be97d10c32ff80f5cca2f88cb41
Executing ./tests/systemd-container-failed.sh freeipa-server-container-Dockerfile
Error response from daemon: Container 29059cb79c9655a4fdeff90e33c8352a98a38be97d10c32ff80f5cca2f88cb41 is not running

@adelton
Collaborator

adelton commented Oct 26, 2021

Could you try those docker run executions with --security-opt seccomp=unconfined?

@danryu
Author

danryu commented Oct 26, 2021

@adelton
Same story - see below.

$ docker run --name freeipa-server-container  -ti  -h ipa.example.test --security-opt seccomp=unconfined  --read-only   -v /sys/fs/cgroup:/sys/fs/cgroup:ro     -v /var/lib/ipa-data:/data:Z  freeipa/freeipa-server:fedora-rawhide-4.9.7 
systemd v249.5-1.fc36 running in system mode (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization podman.
Detected architecture x86-64.
Failed to create /init.scope control group: Read-only file system
Failed to allocate manager object: Read-only file system
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...

@fcami

fcami commented Oct 26, 2021

What happens if you remove "-v /sys/fs/cgroup:/sys/fs/cgroup:ro"?
I hope removing "--read-only" does not fix your problem.

@danryu
Author

danryu commented Oct 26, 2021

@fcami I tried removing -v /sys/fs/cgroup:/sys/fs/cgroup:ro and --read-only, alternately and together.
No change :(

Should I take it then that Debian 11-based hosts cannot currently use cgroups v2 with freeipa-container?
Unless there are any positive reports out there ...

@adelton
Collaborator

adelton commented Oct 26, 2021

It seems that recent Docker versions can start on cgroups v2 hosts but cannot run systemd in the container.

So the thing to resolve is -- how do you run systemd in containers on those machines?

@adelton
Collaborator

adelton commented Oct 26, 2021

This is very similar to the failures under K3s that @kevin-leong reported in #426, also on Ubuntu 21.10. You might want to join forces to figure out how systemd is supposed to be run in containers on those hosts.

@danryu
Author

danryu commented Oct 26, 2021

Ok, I'll take that as meaning that freeipa-container does not support Debian-based cgroup v2 systems.

If the service runs out of the box as per documentation with Fedora/CentOS/RHEL cgroup v2 systems, but doesn't run at all with Debian-based cgroup v2 systems, that is definitely worth flagging up in the documentation.

While I appreciate the reference to the other open issue, which I had previously noted, it's outside current project scope to find fixes for secondary upstream components (systemd), unfortunately...

@danryu
Author

danryu commented Oct 26, 2021

Following some of what's at https://serverfault.com/questions/1053187/systemd-fails-to-run-in-a-docker-container-when-using-cgroupv2-cgroupns-priva , I was able to get further in the execution using these options: --cgroupns=host -v /sys/fs/cgroup:/sys/fs/cgroup:rw ...

However, I'm not sure yet of the implications of running with these options!

$ docker run --name freeipa-server-container  -ti  -h ipa.example.test --cgroupns=host   -v /sys/fs/cgroup:/sys/fs/cgroup:rw     -v /var/lib/ipa-data:/data:Z  freeipa/freeipa-server:fedora-rawhide-4.9.7 
systemd v249.5-1.fc36 running in system mode (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization podman.
Detected architecture x86-64.
Initializing machine ID from random generator.
Queued start job for default target Minimal target for containerized FreeIPA server.
Tue Oct 26 15:51:30 UTC 2021 /usr/sbin/ipa-server-configure-first 
Unable to determine the amount of available RAM
The ipa-server-install command failed. See /var/log/ipaserver-install.log for more information
Sending SIGTERM to remaining processes...
Sending SIGKILL to remaining processes...
All filesystems, swaps, loop devices, MD devices and DM devices detached.
Exiting container.

@adelton
Collaborator

adelton commented Oct 26, 2021

@danryu Can you try just -v /sys/fs/cgroup:/sys/fs/cgroup:rw, without that --cgroupns=host?

And if it still fails, add --security-opt apparmor=unconfined?

@adelton
Collaborator

adelton commented Oct 26, 2021

As for that "Unable to determine the amount of available RAM" message, I assume there might be some more details in ipaserver-install.log, but I wonder if the --skip-mem-check option to ipa-server-install would help?

@abbra
Contributor

abbra commented Oct 26, 2021

@danryu 'unable to determine the amount of available RAM' is another sign of improperly configured cgroups. Can you provide the ipaserver-install.log from your data partition?

@abbra
Contributor

abbra commented Oct 26, 2021

@adelton we have two places that produce this error message: one inside the container detection part, which handles both cgroups v1 and v2, and one outside it, reached when nothing else allowed us to detect the amount of available memory. If it is the former, this definitely means host misconfiguration. If it is the latter, it means the systemd tools were unable to detect that we are running in a container, which also means host misconfiguration.

@danryu
Author

danryu commented Oct 26, 2021

@abbra
ipaserver-install.log as requested.

$ sudo cat /var/lib/ipa-data/var/log/ipaserver-install.log
2021-10-26T15:51:31Z DEBUG Logging to /var/log/ipaserver-install.log
2021-10-26T15:51:31Z DEBUG ipa-server-install was invoked with arguments [] and options: {'unattended': False, 'ip_addresses': None, 'domain_name': None, 'realm_name': None, 'host_name': None, 'ca_cert_files': None, 'domain_level': None, 'setup_adtrust': False, 'setup_kra': False, 'setup_dns': False, 'idstart': None, 'idmax': None, 'no_hbac_allow': False, 'no_pkinit': False, 'no_ui_redirect': False, 'dirsrv_config_file': None, 'skip_mem_check': False, 'dirsrv_cert_files': None, 'http_cert_files': None, 'pkinit_cert_files': None, 'dirsrv_cert_name': None, 'http_cert_name': None, 'pkinit_cert_name': None, 'mkhomedir': False, 'ntp_servers': None, 'ntp_pool': None, 'no_ntp': False, 'force_ntpd': False, 'ssh_trust_dns': False, 'no_ssh': False, 'no_sshd': False, 'no_dns_sshfp': False, 'external_ca': False, 'external_ca_type': None, 'external_ca_profile': None, 'external_cert_files': None, 'subject_base': None, 'ca_subject': None, 'ca_signing_algorithm': None, 'pki_config_override': None, 'allow_zone_overlap': False, 'reverse_zones': None, 'no_reverse': False, 'auto_reverse': False, 'zonemgr': None, 'forwarders': None, 'no_forwarders': False, 'auto_forwarders': False, 'forward_policy': None, 'no_dnssec_validation': False, 'no_host_dns': False, 'enable_compat': False, 'netbios_name': None, 'no_msdcs': False, 'rid_base': None, 'secondary_rid_base': None, 'ignore_topology_disconnect': False, 'ignore_last_of_role': False, 'verbose': False, 'quiet': False, 'log_file': None, 'uninstall': False}
2021-10-26T15:51:31Z DEBUG IPA version 4.9.7-2.fc36
2021-10-26T15:51:31Z DEBUG IPA platform fedora_container
2021-10-26T15:51:31Z DEBUG IPA os-release Fedora Linux 36 (Container Image Prerelease)
2021-10-26T15:51:31Z DEBUG container detected
2021-10-26T15:51:32Z DEBUG   File "/usr/lib/python3.10/site-packages/ipapython/admintool.py", line 180, in execute
    return_value = self.run()
  File "/usr/lib/python3.10/site-packages/ipapython/install/cli.py", line 342, in run
    return cfgr.run()
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 358, in run
    self.validate()
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 368, in validate
    for _nothing in self._validator():
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 431, in __runner
    exc_handler(exc_info)
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 455, in _handle_validate_exception
    self._handle_exception(exc_info)
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 450, in _handle_exception
    six.reraise(*exc_info)
  File "/usr/lib/python3.10/site-packages/six.py", line 719, in reraise
    raise value
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 421, in __runner
    step()
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 418, in <lambda>
    step = lambda: next(self.__gen)
  File "/usr/lib/python3.10/site-packages/ipapython/install/util.py", line 81, in run_generator_with_yield_from
    six.reraise(*exc_info)
  File "/usr/lib/python3.10/site-packages/six.py", line 719, in reraise
    raise value
  File "/usr/lib/python3.10/site-packages/ipapython/install/util.py", line 59, in run_generator_with_yield_from
    value = gen.send(prev_value)
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 633, in _configure
    next(validator)
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 431, in __runner
    exc_handler(exc_info)
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 455, in _handle_validate_exception
    self._handle_exception(exc_info)
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 518, in _handle_exception
    self.__parent._handle_exception(exc_info)
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 450, in _handle_exception
    six.reraise(*exc_info)
  File "/usr/lib/python3.10/site-packages/six.py", line 719, in reraise
    raise value
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 515, in _handle_exception
    super(ComponentBase, self)._handle_exception(exc_info)
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 450, in _handle_exception
    six.reraise(*exc_info)
  File "/usr/lib/python3.10/site-packages/six.py", line 719, in reraise
    raise value
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 421, in __runner
    step()
  File "/usr/lib/python3.10/site-packages/ipapython/install/core.py", line 418, in <lambda>
    step = lambda: next(self.__gen)
  File "/usr/lib/python3.10/site-packages/ipapython/install/util.py", line 81, in run_generator_with_yield_from
    six.reraise(*exc_info)
  File "/usr/lib/python3.10/site-packages/six.py", line 719, in reraise
    raise value
  File "/usr/lib/python3.10/site-packages/ipapython/install/util.py", line 59, in run_generator_with_yield_from
    value = gen.send(prev_value)
  File "/usr/lib/python3.10/site-packages/ipapython/install/common.py", line 65, in _install
    for unused in self._installer(self.parent):
  File "/usr/lib/python3.10/site-packages/ipaserver/install/server/__init__.py", line 573, in main
    master_install_check(self)
  File "/usr/lib/python3.10/site-packages/ipaserver/install/server/install.py", line 275, in decorated
    func(installer)
  File "/usr/lib/python3.10/site-packages/ipaserver/install/server/install.py", line 350, in install_check
    installutils.check_available_memory(ca=options.setup_ca)
  File "/usr/lib/python3.10/site-packages/ipaserver/install/installutils.py", line 1103, in check_available_memory
    raise ScriptError(

2021-10-26T15:51:32Z DEBUG The ipa-server-install command failed, exception: ScriptError: Unable to determine the amount of available RAM
2021-10-26T15:51:32Z ERROR Unable to determine the amount of available RAM
2021-10-26T15:51:32Z ERROR The ipa-server-install command failed. See /var/log/ipaserver-install.log for more information

@rcritten

Based on the traceback, it failed in the container detection part. The installer expects both /sys/fs/cgroup/memory.current and /sys/fs/cgroup/memory.max to exist.
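
To illustrate what such a check depends on, here is a minimal Python sketch that reads the two cgroup v2 files named above. It is not the FreeIPA implementation; the function name and return convention are assumptions made for illustration. Per the kernel's cgroup v2 documentation, memory.max holds either a byte count or the literal string "max", and memory.current holds a byte count.

```python
from pathlib import Path


def available_memory(cgroup_dir="/sys/fs/cgroup"):
    """Return bytes still available under the cgroup v2 memory limit.

    Returns None when the limit is "max" (no limit set). Raises
    FileNotFoundError when the memory interface files are missing.
    """
    base = Path(cgroup_dir)
    limit = (base / "memory.max").read_text().strip()
    used = int((base / "memory.current").read_text().strip())
    if limit == "max":
        return None
    return int(limit) - used


if __name__ == "__main__":
    try:
        print(available_memory())
    except OSError as exc:
        print("memory controller files not readable:", exc)
```

If either file is missing inside the container, the read fails, which matches the kind of situation reported in this thread.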

@danryu
Author

danryu commented Oct 26, 2021

> @danryu Can you try just -v /sys/fs/cgroup:/sys/fs/cgroup:rw, without that --cgroupns=host?
>
> And if it still fails, add --security-opt apparmor=unconfined?

@adelton
I tried both of these; both give the same error as before.

@abbra
Contributor

abbra commented Oct 27, 2021

@danryu as @rcritten said, we expect that, for cgroups v2, the memory controller is present and accessible. You can get more details in the kernel's documentation: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory.
If your container runtime does not set up a proper memory controller, that is not a good state for anything else in a container.
For example, on Fedora 34 with podman 3.4.0 I get this:

$ podman run -ti fedora:34 /bin/bash
[root@0b2d1f0ea91b /]# ls -la /sys/fs/cgroup/
total 0
drwxr-xr-x.  2 root   root   0 Oct 27 06:33 .
drwxr-xr-x. 10 nobody nobody 0 Oct 27 06:33 ..
-r--r--r--.  1 root   root   0 Oct 27 06:33 cgroup.controllers
-r--r--r--.  1 root   root   0 Oct 27 06:33 cgroup.events
-rw-r--r--.  1 root   root   0 Oct 27 06:33 cgroup.freeze
-rw-r--r--.  1 root   root   0 Oct 27 06:33 cgroup.max.depth
-rw-r--r--.  1 root   root   0 Oct 27 06:33 cgroup.max.descendants
-rw-r--r--.  1 root   root   0 Oct 27 06:33 cgroup.procs
-r--r--r--.  1 root   root   0 Oct 27 06:33 cgroup.stat
-rw-r--r--.  1 root   root   0 Oct 27 06:33 cgroup.subtree_control
-rw-r--r--.  1 root   root   0 Oct 27 06:33 cgroup.threads
-rw-r--r--.  1 root   root   0 Oct 27 06:33 cgroup.type
-rw-r--r--.  1 root   root   0 Oct 27 06:33 cpu.pressure
-r--r--r--.  1 root   root   0 Oct 27 06:33 cpu.stat
-rw-r--r--.  1 root   root   0 Oct 27 06:33 io.pressure
-r--r--r--.  1 root   root   0 Oct 27 06:33 memory.current
-r--r--r--.  1 root   root   0 Oct 27 06:33 memory.events
-r--r--r--.  1 root   root   0 Oct 27 06:33 memory.events.local
-rw-r--r--.  1 root   root   0 Oct 27 06:33 memory.high
-rw-r--r--.  1 root   root   0 Oct 27 06:33 memory.low
-rw-r--r--.  1 root   root   0 Oct 27 06:33 memory.max
-rw-r--r--.  1 root   root   0 Oct 27 06:33 memory.min
-r--r--r--.  1 root   root   0 Oct 27 06:33 memory.numa_stat
-rw-r--r--.  1 root   root   0 Oct 27 06:33 memory.oom.group
-rw-r--r--.  1 root   root   0 Oct 27 06:33 memory.pressure
-r--r--r--.  1 root   root   0 Oct 27 06:33 memory.stat
-r--r--r--.  1 root   root   0 Oct 27 06:33 memory.swap.current
-r--r--r--.  1 root   root   0 Oct 27 06:33 memory.swap.events
-rw-r--r--.  1 root   root   0 Oct 27 06:33 memory.swap.high
-rw-r--r--.  1 root   root   0 Oct 27 06:33 memory.swap.max
-r--r--r--.  1 root   root   0 Oct 27 06:33 pids.current
-r--r--r--.  1 root   root   0 Oct 27 06:33 pids.events
-rw-r--r--.  1 root   root   0 Oct 27 06:33 pids.max

But outside the container I don't get those various memory controller states, because that's a property of the root cgroup by definition.

$ ls -la /sys/fs/cgroup/
total 0
dr-xr-xr-x. 12 root root 0 30. 8. 10:16 .
drwxr-xr-x. 10 root root 0 30. 8. 10:16 ..
-r--r--r--.  1 root root 0 30. 8. 10:16 cgroup.controllers
-rw-r--r--.  1 root root 0 27.10. 09:32 cgroup.max.depth
-rw-r--r--.  1 root root 0 27.10. 09:32 cgroup.max.descendants
-rw-r--r--.  1 root root 0 30. 8. 10:16 cgroup.procs
-r--r--r--.  1 root root 0 27.10. 09:32 cgroup.stat
-rw-r--r--.  1 root root 0 11.10. 18:09 cgroup.subtree_control
-rw-r--r--.  1 root root 0 27.10. 09:32 cgroup.threads
-rw-r--r--.  1 root root 0 27.10. 09:32 cpu.pressure
-r--r--r--.  1 root root 0 27.10. 09:32 cpuset.cpus.effective
-r--r--r--.  1 root root 0 27.10. 09:32 cpuset.mems.effective
-r--r--r--.  1 root root 0 27.10. 09:32 cpu.stat
drwxr-xr-x.  2 root root 0 11.10. 18:09 dev-hugepages.mount
drwxr-xr-x.  2 root root 0 11.10. 18:09 dev-mqueue.mount
drwxr-xr-x.  2 root root 0 30. 8. 10:16 init.scope
-rw-r--r--.  1 root root 0 27.10. 09:32 io.cost.model
-rw-r--r--.  1 root root 0 27.10. 09:32 io.cost.qos
-rw-r--r--.  1 root root 0 27.10. 09:32 io.pressure
-r--r--r--.  1 root root 0 27.10. 09:32 io.stat
drwxr-xr-x.  2 root root 0 11.10. 18:09 machine.slice
-r--r--r--.  1 root root 0 27.10. 09:32 memory.numa_stat
-rw-r--r--.  1 root root 0 27.10. 09:32 memory.pressure
-r--r--r--.  1 root root 0 27.10. 09:32 memory.stat
-r--r--r--.  1 root root 0 27.10. 09:32 misc.capacity
drwxr-xr-x.  2 root root 0 11.10. 18:09 sys-fs-fuse-connections.mount
drwxr-xr-x.  2 root root 0 11.10. 18:09 sys-kernel-config.mount
drwxr-xr-x.  2 root root 0 11.10. 18:09 sys-kernel-debug.mount
drwxr-xr-x.  2 root root 0 11.10. 18:09 sys-kernel-tracing.mount
drwxr-xr-x. 86 root root 0 27.10. 09:32 system.slice
drwxr-xr-x.  3 root root 0 11.10. 18:09 user.slice

To me it looks like your container runtime is not really configuring the cgroup v2 memory controller. What container runtime do you actually use?
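
A quick way to see whether the memory controller is available to a given cgroup is to read its cgroup.controllers file, which per the kernel documentation is a space-separated list of controllers. Below is a minimal Python sketch, assuming /sys/fs/cgroup is the cgroup v2 mount as seen from inside the container; the function name is illustrative only.

```python
from pathlib import Path


def enabled_controllers(cgroup_dir="/sys/fs/cgroup"):
    """Return the set of controllers listed in cgroup.controllers."""
    text = (Path(cgroup_dir) / "cgroup.controllers").read_text()
    # The file holds a space-separated list such as "cpuset cpu io memory pids".
    return set(text.split())


if __name__ == "__main__":
    try:
        controllers = enabled_controllers()
        print("memory controller available:", "memory" in controllers)
    except OSError as exc:
        print("cannot read cgroup.controllers:", exc)
```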

@danryu
Author

danryu commented Oct 27, 2021

@abbra

I'm running vanilla Ubuntu 21.10 x64, with the following vanilla docker installation:

ii  docker-ce                                     5:20.10.10~3-0~ubuntu-hirsute                amd64        Docker: the open-source application container engine
ii  docker-ce-cli                                 5:20.10.10~3-0~ubuntu-hirsute                amd64        Docker CLI: the open-source application container engine
ii  docker-ce-rootless-extras                     5:20.10.10~3-0~ubuntu-hirsute                amd64        Rootless support for Docker.
$ docker version
Client: Docker Engine - Community
 Version:           20.10.10
 API version:       1.41
 Go version:        go1.16.9
 Git commit:        b485636
 Built:             Mon Oct 25 07:43:13 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.10
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.9
  Git commit:       e2f740d
  Built:            Mon Oct 25 07:41:20 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.11
  GitCommit:        5b46e404f6b9f661a205e28d59c982d3634148f8
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I haven't done any configuration of the docker runtime installation.
Also looking to run on Debian11 as host, as I mentioned.

@abbra
Contributor

abbra commented Oct 27, 2021

Ok, you need to ensure that your kernel runs with the unified cgroups hierarchy (systemd.unified_cgroup_hierarchy=1 on the kernel command line) and that you are using cgroup-based memory limits with Docker. I find that the Rootless Containers site is a great starting point for configuring all of that for various container runtimes. It focuses on rootless containers, which pretty much require cgroup v2, but it should be similar for privileged Docker runs too.

@danryu
Author

danryu commented Oct 27, 2021

This feels quite bleeding edge! My expectation was that running freeipa-container would be similar to the procedure outlined in the docs. It's not yet clear to me why the experience between CentOS and Debian11 as a host OS should have such different setup requirements. It seems that the cgroups v2 issue causes many more issues than anticipated.
The current project I'm involved in needs FreeIPA running in k8s, in production - or we have to move to a different solution.

EDIT: FreeIPA is the current project's desired solution. If we are able to mutually find a solid configuration for Debian11/cgroups_v2, that would be optimal.
@abbra thanks for your suggestions, I hope we are closer to the target ...

@abbra
Contributor

abbra commented Oct 27, 2021

I think you probably need to re-adjust your project's expectations.

There is currently no support for freeipa-container beyond this forum, regardless of where you are running it. That support mostly lets you figure out whether a problem you see is reproducible in a non-containerized deployment of FreeIPA (and thus can be filed as a bug against FreeIPA and its components) or is a bug or misconfiguration in your container runtime. RHEL IdM, as a supported solution, is currently not supported to run in containers on RHEL 8. The RHEL 7 version of the IPA container has a limited support scope. This is mostly because containerization with systemd is still in active development: cgroup v2, for example, is one feature that was absolutely required to enable secure container separation for systemd workloads. You can run the FreeIPA container on cgroup v1 in a traditional root-based setup, but multiple such container instances would technically be able to step on each other if something breaks, because containers != isolation on the host.

cgroup v2 support is a kernel level feature. If your k8s deployment wants to use one, you need to be prepared to enable it everywhere where it is required (and that is not only in the kernel).

Others are using freeipa-container with Docker and other container runtimes but we typically hear mostly about issues with initial runtime configuration, like yours. Once they have figured out this step, we rarely hear their experience beyond 'now it works'.

@danryu
Author

danryu commented Oct 27, 2021

Thanks @abbra for the clarification on support levels across versions. It's very helpful to have your insight into the state of play across these various components, as it is clear that there are multiple moving targets, making it difficult for a relative newcomer to figure out exact requirements.

With that in mind, it does seem that the configuration is very close - I'll feed back with further results.

@danryu
Author

danryu commented Oct 27, 2021

> FreeIPA container works just fine with cgroups v2 if your system actually provides cgroups v2. I have no problems with Fedora 34+ and podman, with both root and rootless runs.

As far as I know, nobody has replicated this so far on Debian11-based systems.

The steps below are the configuration I applied to a Debian11/Ubuntu21.10 OS in order to get freeipa-container to work.

Unfortunately, after applying all the config below, I still receive the same error from the generic startup command:

$ docker run --name freeipa-server-container  -ti  -h ipa.example.test --read-only   -v /sys/fs/cgroup:/sys/fs/cgroup:ro   -v /var/lib/ipa-data:/data:Z  freeipa/freeipa-server:fedora-rawhide-4.9.7 
systemd v248.9-1.fc34 running in system mode. (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization docker.
Detected architecture x86-64.
Initializing machine ID from random generator.
Failed to create /init.scope control group: Read-only file system
Failed to allocate manager object: Read-only file system
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...

Configuration

I followed all preparatory steps at: https://rootlesscontaine.rs/getting-started/common/

  • checking $XDG_RUNTIME_DIR set correctly
  • Enable dbus user session (see below)
  • Configure sysctl (not needed for Debian11 according to docs)
  • check /etc/subuid and /etc/subgid, sudo apt-get install -y uidmap
  • cgroup v2 setup: set kernel cmdline parameter: systemd.unified_cgroup_hierarchy=1
$ cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-5.14.14-xanmod2 root=UUID=3d234b83-5e3a-4eff-a384-8e17575d4031 ro systemd.unified_cgroup_hierarchy=1 quiet splash pci=noaer vt.handoff=7
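
The kernel command-line check above can also be scripted. This is a minimal Python sketch that looks up one parameter in a /proc/cmdline-style string; the helper name is hypothetical, and it uses simple whitespace splitting, so quoted values containing spaces are not handled.

```python
def kernel_param(cmdline, name):
    """Return the value of `name` from a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            # A bare flag such as "quiet" yields an empty string.
            value = val if sep else ""
    return value


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print(kernel_param(f.read(), "systemd.unified_cgroup_hierarchy"))
```

When a parameter is repeated, the sketch returns the last occurrence.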

Possible issue with dbus, described as "typically needed for using systemd and cgroup v2"....
https://rootlesscontaine.rs/getting-started/common/login/#optional-enable-dbus-user-session

$ systemctl --user is-active dbus
active
ii  dbus-user-session                             1.12.20-2ubuntu2                             amd64        simple interprocess messaging system (systemd --user integration)
$ ls -al /run/user/1000/bus 
srw-rw-rw- 1 dan dan 0 Oct 27 13:07 /run/user/1000/bus
$ systemctl --user enable --now dbus
The unit files have no installation config (WantedBy=, RequiredBy=, Also=,
Alias= settings in the [Install] section, and DefaultInstance= for template
units). This means they are not meant to be enabled using systemctl.
 
Possible reasons for having this kind of units are:
• A unit may be statically enabled by being symlinked from another unit's
  .wants/ or .requires/ directory.
• A unit's purpose may be to act as a helper for some other unit which has
  a requirement dependency on it.
• A unit may be started when needed via activation (socket, path, timer,
  D-Bus, udev, scripted systemctl call, ...).
• In case of template units, the unit is meant to be enabled with some
  instance name specified.

@adelton
Collaborator

adelton commented Oct 27, 2021

I'd really recommend you start with an Ubuntu-based container image and just attempt to run systemd in that container via ENTRYPOINT. No FreeIPA containers to begin with. That way you know you don't have conflicting technologies in the setup, and you should also be able to get help from the Ubuntu folks.

Only once you get that working it will make sense to add Fedora and FreeIPA on Fedora to the mix. Going with rawhide is then the last step.

@danryu
Author

danryu commented Oct 27, 2021

@adelton Following your advice I have been testing using the Dockerfile definitions at
https://github.com/AkihiroSuda/containerized-systemd

I have obtained similar results between the debian10, ubuntu-20.04, and fedora-33 images.

Showing below the build and run with fedora-33.
As you can see, systemd does run, although with failures. systemctl status shows state: initializing.

$ git clone git@github.com:AkihiroSuda/containerized-systemd.git
$ cd containerized-systemd
$ docker build -t akirofedora33 -f Dockerfile.fedora-33 .
Sending build context to Docker daemon  110.1kB
Step 1/4 : FROM fedora:33
 ---> a5465556eeb2
Step 2/4 : RUN dnf install -y systemd &&   rm -rf /tmp/*
 ---> Using cache
 ---> 02ce62b2cc2c
Step 3/4 : COPY docker-entrypoint.sh /
 ---> Using cache
 ---> 7d04ea1a66f0
Step 4/4 : ENTRYPOINT ["/docker-entrypoint.sh"]
 ---> Using cache
 ---> 49f99b633847
Successfully built 49f99b633847
Successfully tagged akirofedora33:latest

$ docker run -it  --rm --privileged --workdir /usr -e FOO=hello akirofedora33 /bin/bash
Created symlink /etc/systemd/system/systemd-firstboot.service → /dev/null.
Created symlink /etc/systemd/system/systemd-udevd.service → /dev/null.
Created symlink /etc/systemd/system/systemd-modules-load.service → /dev/null.
Created symlink /etc/systemd/system/multi-user.target.wants/docker-entrypoint.service → /etc/systemd/system/docker-entrypoint.service.
/docker-entrypoint.sh: starting /lib/systemd/systemd --show-status=false --unit=docker-entrypoint.target
systemd v246.15-1.fc33 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization docker.
Detected architecture x86-64.
Set hostname to <8b8babb041d7>.
systemd-journald-audit.socket: Failed to create listening socket (audit 1): Operation not permitted
systemd-journald-audit.socket: Failed to listen on sockets: Operation not permitted
systemd-journald-audit.socket: Failed with result 'resources'.
sys-kernel-debug.mount: Mount process exited, code=exited, status=32/n/a
sys-kernel-debug.mount: Failed with result 'exit-code'.
modprobe@configfs.service: Succeeded.
modprobe@drm.service: Succeeded.
modprobe@fuse.service: Succeeded.
sys-kernel-config.mount: Mount process exited, code=exited, status=32/n/a
sys-kernel-config.mount: Failed with result 'exit-code'.
+ source /etc/docker-entrypoint-cmd
++ /bin/bash
[root@8b8babb041d7 usr]# systemctl status
● 8b8babb041d7
    State: initializing
     Jobs: 2 queued
   Failed: 5 units
    Since: Wed 2021-10-27 14:54:12 UTC; 10s ago
   CGroup: /
           ├─init.scope 
           │ └─1 /lib/systemd/systemd --show-status=false --unit=docker-entrypoint.target
           └─system.slice 
             ├─systemd-journald.service 
             │ └─29 /usr/lib/systemd/systemd-journald
             ├─docker-entrypoint.service 
             │ ├─40 /bin/bash -exc source /etc/docker-entrypoint-cmd
             │ ├─43 /bin/bash
             │ ├─69 systemctl status
             │ └─70 (pager)
             └─systemd-logind.service 
               └─41 /usr/lib/systemd/systemd-logind

@adelton
Collaborator

adelton commented Oct 27, 2021

Please retry without --privileged. Using --privileged does not really tell us anything.

@danryu
Author

danryu commented Oct 27, 2021

> Please retry without --privileged. Using --privileged does not really tell us anything.

Yes, without --privileged, nothing runs.

$ docker run -it  --rm  --workdir /usr -e FOO=hello akirofedora33 /bin/bash
Created symlink /etc/systemd/system/systemd-firstboot.service → /dev/null.
Created symlink /etc/systemd/system/systemd-udevd.service → /dev/null.
Created symlink /etc/systemd/system/systemd-modules-load.service → /dev/null.
Created symlink /etc/systemd/system/multi-user.target.wants/docker-entrypoint.service → /etc/systemd/system/docker-entrypoint.service.
/docker-entrypoint.sh: starting /lib/systemd/systemd --show-status=false --unit=docker-entrypoint.target
Failed to mount tmpfs at /run: Operation not permitted
[!!!!!!] Failed to mount API filesystems.
Exiting PID 1...

@danryu
Author

danryu commented Oct 27, 2021

> FreeIPA container works just fine with cgroups v2 if your system actually provides cgroups v2. I have no problems with Fedora 34+ and podman, with both root and rootless runs.

I have confirmed this now myself, and have also got the same results on Ubuntu21.10 with podman, but not docker.

I can't get the Docker runtime to run any kind of systemd/cgroups_v2 combination, on Fedora/Debian/Ubuntu etc.

These are the combinations I tested:

- Fedora 34 host OS
- podman runtime
- cgroups v2
=> WORKS
- Debian11/Ubuntu21.10 host OS
- podman runtime
- cgroups v2
=> WORKS
- Fedora 34 host OS
- Docker runtime
- cgroups v2
=> FAILS
- Debian11/Ubuntu21.10 host OS
- Docker runtime
- cgroups v2
=> FAILS
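When filling in a matrix like this, it helps to confirm which cgroup hierarchy each host actually provides. A minimal sketch, assuming the same check used earlier in this issue (presence of `cgroup.controllers` at the cgroup root); the function name `cgroup_version` is made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: prints "v2" when the given cgroup root is a unified
# (cgroups v2) hierarchy, otherwise "v1-or-hybrid".
# On hybrid setups the v2 tree lives elsewhere, so this check reports
# them together with pure v1.
cgroup_version() {
    root="${1:-/sys/fs/cgroup}"
    if [ -f "$root/cgroup.controllers" ]; then
        echo v2
    else
        echo v1-or-hybrid
    fi
}
```

Calling `cgroup_version` with no argument inspects `/sys/fs/cgroup` on the current host.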

@danryu
Author

danryu commented Oct 28, 2021

@abbra Thanks to you and the other devs for the tips.

Regarding your comments about container runtimes, and in light of the tests between docker/podman above, is there anything to be done, other than log an upstream bug in Docker runtime (not sure this is an option)?
(Perhaps the freeipa-container README notes could reflect likely issues with Docker runtime and cgroups v2 systems?)

As far as I can ascertain, the Docker runtime is not playing nice between cgroups v2 and systemd, whereas other runtimes have no issues.
I haven't been able to find anything thus far in their issues, only references in downstream projects.

Note: Running containerd in Kubernetes, rather than the Docker runtime, is likely to circumvent this problem (to be confirmed).

@natevw

natevw commented Oct 28, 2021

I was wrestling with this same problem (Debian Bullseye platform, freeipa/freeipa-server:fedora-34 container) recently. I don't know if I have a lot to contribute since frankly this discussion is better than most of the threads I had found but here's some related links from my notes and bookmarks:

What worked for me was:

vi /etc/default/grub # via https://askubuntu.com/questions/19486/how-do-i-add-a-kernel-boot-parameter/19487#19487
    GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200 earlyprintk=ttyS0,115200 consoleblank=0 systemd.unified_cgroup_hierarchy=0"
update-grub

i.e. add systemd.unified_cgroup_hierarchy=0 to the kernel arguments (n.b. this is the opposite of what was recommended in #429 (comment))

and then running with e.g.:

docker run --name freeipa-server-container -ti \
  -h test-vm.local --read-only \
  --sysctl net.ipv6.conf.all.disable_ipv6=0 \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  -v /srv/data/freeipa:/data:Z \
  -p 123:123/udp -p 389:389/tcp -p 443:443/tcp -p 8443:8443/tcp \
  -p 464:464/tcp -p 464:464/udp -p 53:53/tcp -p 53:53/udp \
  -p 636:636/tcp -p 80:80/tcp -p 88:88/tcp -p 88:88/udp \
  freeipa-server ipa-server-install \
  --hostname=test-vm.local \
  --domain=fakesubdomain.mydomain.io \
  --realm=FAKESUBDOMAIN.MYDOMAIN.IO \
  --ds-password=letmein \
  --admin-password=letmein \
  --no-ntp

i.e. --read-only and not --privileged, and yes including -v /sys/fs/cgroup:/sys/fs/cgroup:ro. I didn't try a great deal of combinations but iirc --privileged did "work" as an alternative to changing the kernel argument.

This is all with a Debian guest VM running on a Debian host, using the Debian-packaged Docker (apt install docker.io). I can provide more exact versions of everything if it would be relevant/helpful, but my impression is there's already quite a bit of insight as to what's going on here. Just no link to be found anywhere online yet to what I imagine should be a Docker bug report regarding this?

@adelton
Collaborator

adelton commented Oct 28, 2021

Using systemd.unified_cgroup_hierarchy=0 forces cgroups v1. In this issue @danryu tries to get systemd (and FreeIPA) working on a host with cgroups v2, without reverting to v1.
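To tell whether a host has been switched back to v1 this way, the kernel command line can be inspected. A rough sketch; the function name `forces_cgroup_v1` is hypothetical, and it only looks for the literal `=0` form used above, not other spellings that systemd may accept:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the kernel command line contains
# systemd.unified_cgroup_hierarchy=0, i.e. the host was forced to cgroups v1.
# Reads /proc/cmdline unless a command line string is passed as $1.
forces_cgroup_v1() {
    cmdline="${1-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" systemd.unified_cgroup_hierarchy=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `forces_cgroup_v1 && echo "host is pinned to cgroups v1"` would flag a host whose results do not apply to the v2 question in this issue.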

@danryu danryu changed the title cgroups v2 support - on the horizon?? cgroups v2 support - possible with systemd/freeipa-container ? Oct 29, 2021
@adelton
Collaborator

adelton commented Feb 1, 2022

I assume the general conclusion is, since it works with podman and does not with docker, there's still something docker needs to do to make cgroups v2 usable in containers. There's likely not much we can do to help from FreeIPA containerization point of view, so I'm closing this issue.

Feel free to add information about new developments, especially if docker starts working in some version.

@adelton adelton closed this as completed Feb 1, 2022
@natevw

natevw commented Feb 3, 2022

Seems reasonable, although for completeness it'd be nice to link to the underlying Docker issue ticket for tracking. (is it opencontainers/runc#2315? or moby/moby#42910? IDK what the currently relevant internal component is or precisely what the problem freeipa hits is….)

@adelton
Collaborator

adelton commented Feb 3, 2022

There is nothing FreeIPA specific there, it all stems from running systemd in the container (with v2 cgroups).

@Westie

Westie commented Jul 24, 2022

docker/for-mac#6288 is an upstream report of this issue

@adelton
Collaborator

adelton commented Aug 16, 2022

For the record, on Ubuntu 22.04 which comes with cgroups v2 by default, I was able to get the container running with docker.io when I enabled user namespace remapping, with no explicit /sys/fs/cgroup mounts. I've updated https://github.com/freeipa/freeipa-container#running-freeipa-server-container and also enabled that setup in https://github.com/freeipa/freeipa-container/blob/master/.github/workflows/build-test.yaml. This approach also works with K3s on Ubuntu 22.04, using cri-dockerd and docker.io.
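The comment does not show the configuration itself. As a sketch only: Docker's documented daemon setting for this is the `userns-remap` key, and the value `default` below is an assumption about what was used. The helper name `write_userns_remap_config` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: writes a daemon.json that enables user namespace
# remapping. Refuses to overwrite an existing file, since a real host may
# already carry other daemon settings that need to be merged by hand.
write_userns_remap_config() {
    target="${1:-/etc/docker/daemon.json}"
    if [ -e "$target" ]; then
        echo "refusing to overwrite existing $target" >&2
        return 1
    fi
    # Assumed value: "default" lets Docker create and use its own remap user.
    printf '{\n  "userns-remap": "default"\n}\n' > "$target"
}
```

The Docker daemon has to be restarted (for example with `systemctl restart docker`) before the setting takes effect.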

@Vlad1mir-D

I have discovered two workarounds for this issue that effectively retain all features of unified cgroupv2 while maintaining security - no need for the --privileged flag and no access to the root of cgroupv2 hierarchy:

  1. Use the --cgroupns host Docker option and a cgroupv2 sub-hierarchy volume binding for the container. Here is an example command:
# docker run --rm --name freeipa -it --read-only --security-opt seccomp=unconfined --hostname freeipa.corp --init=false --cgroupns host -v /sys/fs/cgroup/freeipa.scope:/sys/fs/cgroup:rw freeipa/freeipa-server:almalinux-9
systemd 252-13.el9_2 running in system mode (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS -FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization container-other.
Detected architecture x86-64.
Initializing machine ID from random generator.
Queued start job for default target Minimal target for containerized FreeIPA server.
[..]

Not perfect, next option is better IMO.

  2. Mount /sys/fs/cgroup on the host without the nsdelegate mount option. Although there isn't an explicit option to disable nsdelegate like nodiscard for discard (see link 1, link 2 for more information), there is a workaround. Simply run any container using Docker with the --cgroupns host option and without any cgroup volume bindings. For example:
# grep cgroup /proc/mounts 
cgroup2 /sys/fs/cgroup cgroup2 rw,seclabel,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
# docker run --rm --cgroupns host ubuntu:latest echo done
done
# grep cgroup /proc/mounts 
cgroup2 /sys/fs/cgroup cgroup2 rw,seclabel,nosuid,nodev,noexec,relatime 0 0

After implementing these steps, you can run a container with Docker using --cgroupns private flag and volume binding of cgroupv2 sub-hierarchy. For example:

# docker run --rm --name freeipa -it --read-only --security-opt seccomp=unconfined --hostname freeipa.corp --init=false --cgroupns private -v /sys/fs/cgroup/freeipa.scope:/sys/fs/cgroup:rw freeipa/freeipa-server:almalinux-9
systemd 252-13.el9_2 running in system mode (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS -FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization container-other.
Detected architecture x86-64.
Initializing machine ID from random generator.
Queued start job for default target Minimal target for containerized FreeIPA server.
[..]

Please note that the information provided above applies specifically to CentOS Stream release 9 with kernel-ml-6.3.7-1.el9.elrepo, systemd-252.4-598.13.hs.el9 (Hyperscale SIG) and docker-ce-24.0.2-1 (systemd cgroup driver), although it may help with a wide range of different scenarios.
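The `grep cgroup /proc/mounts` check from the second workaround can be scripted. A small sketch, assuming the mounts-file layout shown in the output above; the function name `cgroup2_has_nsdelegate` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if any cgroup2 mount listed in the
# given mounts file (default /proc/mounts) carries the nsdelegate option.
cgroup2_has_nsdelegate() {
    mounts="${1:-/proc/mounts}"
    awk '
        # Field 3 is the filesystem type, field 4 the comma-separated options.
        $3 == "cgroup2" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "nsdelegate") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$mounts"
}
```

Running `cgroup2_has_nsdelegate || echo "nsdelegate is off"` before starting the container shows whether the remount described above has taken place.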

@jackadam1981

jackadam1981 commented Jul 16, 2023

- Fedora 34 host OS
- podman runtime
- cgroups v2
=> WORKS
- Debian11/Ubuntu21.10 host OS
- podman runtime
- cgroups v2
=> WORKS
- Fedora 34 host OS
- Docker runtime
- cgroups v2
=> FAILS
- Debian11/Ubuntu21.10 host OS
- Docker runtime
- cgroups v2
=> FAILS

It seems that podman (with podman-compose) is the current solution.
