Odroid XU4 | Force legacy cgroup v1 usage #4705

Closed
simonfisher opened this issue Sep 2, 2021 · 17 comments
Labels
Bug 🐞 Kernel related 🧬 Odroid XU4 Workaround available 🆗 Workaround is available/has been implemented, but a definite solution should be found when possible.

Comments

@simonfisher

simonfisher commented Sep 2, 2021

Creating a bug report/issue

Required Information

  • DietPi version |
G_DIETPI_VERSION_CORE=7
G_DIETPI_VERSION_SUB=5
G_DIETPI_VERSION_RC=2
G_GITBRANCH='master'
G_GITOWNER='MichaIng'
G_LIVE_PATCH_STATUS[0]='not applicable'
  • Distro version | bullseye
  • Kernel version | Linux DietPi 4.14.241+ #1 SMP PREEMPT Wed Jul 28 16:55:16 UTC 2021 armv7l GNU/Linux
  • SBC model | Odroid XU3/XU4/MC1/HC1/HC2 (armv7l)
  • Power supply used | Odroid one
  • SDcard used | eMMC

Additional Information (if applicable)

  • Software title | Portainer
  • Was the software title installed freshly or updated/migrated?
    Installed on a prior DietPi install, then upgraded to Bullseye; Portainer wasn't working, so I attempted a reinstall
  • Can this issue be replicated on a fresh installation of DietPi? Unsure
  • Bug report ID | 9cf73ff2-fbf5-4048-95e4-7f112c4f12c4

Steps to reproduce

  1. On Buster, install Portainer
  2. Upgrade from Buster to Bullseye using steps here: https://dietpi.com/blog/?p=811
  3. Realise that Portainer can't be accessed via its HTTP address
  4. Attempt reinstall of Portainer, using: dietpi-software reinstall 185

Expected behaviour

Working Portainer...

Actual behaviour

This crash log:

DietPi-Software
─────────────────────────────────────────────────────
Mode: Configuring Portainer: Simplifies container management in Docker (standalone host)

[ INFO ] DietPi-Software | Docker will be restarted to be able to deploy the container.
[  OK  ] DietPi-Software | systemctl daemon-reload
[  OK  ] DietPi-Software | systemctl restart docker
[  OK  ] DietPi-Software | docker rm -f 8c756efbe038
[  OK  ] DietPi-Software | docker rmi bd3c978cdaec
[ INFO ] DietPi-Software | Portainer will be deployed now. This could take a while...
[ INFO ] DietPi-Software | docker run -d -p 9002:9000 --name=portainer --restart=always -v /run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer-ce, please wait...
Unable to find image 'portainer/portainer-ce:latest' locally
latest: Pulling from portainer/portainer-ce
651a8e6e1630: Pulling fs layer
56e38df73332: Pulling fs layer
ffed15de3e09: Pulling fs layer
56e38df73332: Verifying Checksum
56e38df73332: Download complete
651a8e6e1630: Download complete
651a8e6e1630: Pull complete
56e38df73332: Pull complete
ffed15de3e09: Verifying Checksum
ffed15de3e09: Download complete
ffed15de3e09: Pull complete
Digest: sha256:8f077d2d1ba2e771ea8cc63af0a37d211f61354a0d094234f832e588ab571888
Status: Downloaded newer image for portainer/portainer-ce:latest
4d94617915bb9d2365a2b79383b69059a86d58321f34c8efe810d1bab93d7c5d
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: bpf_prog_query(BPF_CGROUP_DEVICE) failed: invalid argument: unknown.
[FAILED] DietPi-Software | docker run -d -p 9002:9000 --name=portainer --restart=always -v /run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer-ce

Extra details

  • ...
@simonfisher
Author

To add: this issue may be the same as or similar to the one described here: https://archlinuxarm.org/forum/viewtopic.php?f=15&t=15354

An incompatibility somewhere with 'cgroup'...

@Joulinar
Collaborator

Joulinar commented Sep 3, 2021

I guess the kernel version is too low, according to this GitHub post: opencontainers/runc#2959 (comment)

It seems at least 4.15 is required to be able to use cgroup v2, but I guess there is no newer kernel than 4.14 for Odroid.

@MichaIng
I guess we need your insight 😃

@MichaIng
Owner

MichaIng commented Sep 3, 2021

Can you try to reinstall Docker to assure the packages from the Bullseye suite are installed:

apt install --reinstall docker-ce-cli containerd.io docker-ce

I do not believe yet that Docker now strictly requires Linux v4.15+ without any major version increment.


If Docker still won't work, and you are in mood for some backups and testing, we may push the Linux v5.4 bump for Odroid XU4 (and alike): #3861

Since I don't have the device here, I cannot test it. Aside from installing the new kernel package, /boot/boot.ini needs to be adjusted. Due to the increased kernel size, I guess at least the initrd and dtb RAM load addresses need to be moved forward to leave enough space for the new kernel. Some boot options may also have become obsolete or changed names, which should be visible in the kernel logs (dmesg), but this shouldn't break boot. Last but not least, the new kernel comes with device tree overlays, so we might want to add support for those to /boot/boot.ini as well, so that one can e.g. edit a setenv overlays "..." line at the top of the file to enable or configure additional SoC devices.

So if you have a monitor and keyboard, and either a spare SD card to clone the current one to, or at least a backup of the content of the FAT partition (which contains all files we're going to change), I'd love to go through this with you. I'll also send a mail to Meveric to check whether he has already done some tests on his end, or otherwise wants to follow or help to get his images updated as well.

@simonfisher
Author

It seems that command doesn't fully work...

root@DietPi:~# apt install --reinstall docker-ce-cli containerd.io docker-ce
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Reinstallation of docker-ce is not possible, it cannot be downloaded.
Reinstallation of docker-ce-cli is not possible, it cannot be downloaded.
0 upgraded, 0 newly installed, 1 reinstalled, 0 to remove and 0 not upgraded.
Need to get 16.6 MB of archives.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 https://download.docker.com/linux/debian bullseye/stable armhf containerd.io armhf 1.4.9-1 [16.6 MB]
Fetched 16.6 MB in 1s (12.4 MB/s)         
debconf: delaying package configuration, since apt-utils is not installed
(Reading database ... 43614 files and directories currently installed.)
Preparing to unpack .../containerd.io_1.4.9-1_armhf.deb ...
Unpacking containerd.io (1.4.9-1) over (1.4.9-1) ...
Setting up containerd.io (1.4.9-1) ...

The output above is from the second time I ran the command; the first time, the output got refreshed by my silly iPad SSH client. So I'm not sure if containerd.io actually got updated. Either way, a reinstall of Portainer gives the same error after this.

Are the reports that docker-ce and docker-ce-cli can't be downloaded indicative of a wider problem?

I don't think I'm in a position to, or have the time at the moment to, do detailed testing with multiple SD cards and kernel changes, and I also can't easily hook up a monitor etc., sorry.

@simonfisher
Author

I can add though, I tried the fix in the link I posted above (appending systemd.unified_cgroup_hierarchy=0 to the line in boot.ini), and that allowed the reinstall of portainer to complete without error, and docker/portainer is running fine now!
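A minimal sketch of that workaround as a shell function. It assumes the kernel arguments in /boot/boot.ini sit on a single line that starts with setenv bootargs " and ends with the closing quote; check your own boot.ini first, since the exact layout may differ. The function name add_boot_arg is hypothetical and this is not the code DietPi ships.

```shell
#!/bin/bash
# Hypothetical helper: append one kernel argument to the bootargs line of an
# Odroid-style boot.ini.
# ASSUMPTION: the arguments live on one line starting with 'setenv bootargs "'
# and ending with a closing double quote. Arguments containing '/' or '&'
# would break the sed expression and are not handled.
add_boot_arg() {
	local file=$1 arg=$2
	# Skip if the argument is already present, so repeated runs are harmless
	if grep -qF -- "$arg" "$file"; then
		return 0
	fi
	# Insert the argument right before the closing quote of the bootargs line
	sed -i "/^setenv bootargs \"/s/\"\$/ ${arg}\"/" "$file"
}
```

Usage, after backing up the file: add_boot_arg /boot/boot.ini systemd.unified_cgroup_hierarchy=0, then reboot.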

Are there any negative implications likely to this change?

@MichaIng
Owner

MichaIng commented Sep 4, 2021

Strange error with the package download; I don't remember seeing this before, and no reason is given either. The packages are definitely there: https://download.docker.com/linux/debian/dists/bullseye/pool/stable/armhf/

Great find with the cgroup downgrade, I missed that one. So it looks like Bullseye's systemd uses a new cgroup implementation/version by default, which is not yet supported by Linux v4.14 or older. So ideally we should add this directly to all our images (via the next update) where we know that the kernel version is currently that low.

The setting is described here: https://manpages.debian.org/bullseye/systemd-sysv/init.1.en.html#KERNEL_COMMAND_LINE
There is even a precise check whether the kernel supports this feature or not, so there is no need to know or guess which kernel version is installed and/or whether the feature has been backported etc.:

grep cgroup2 /proc/filesystems

We should then also set systemd.legacy_systemd_cgroup_controller=0 to force full legacy mode; otherwise systemd itself would still try to use cgroup2.
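A sketch of the capability check proposed here, wrapped in a hypothetical function kernel_lists_cgroup2. It reads /proc/filesystems by default, or another file passed as argument (for testing). Note that later in this thread this check turned out not to be a reliable indicator for the Docker failure.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the kernel lists cgroup2 as a known
# filesystem type. Lines look like "nodev<TAB>cgroup2", so the type name is
# compared against the last field exactly (plain "cgroup" does not match).
kernel_lists_cgroup2() {
	local fs_list=${1:-/proc/filesystems}
	awk '$NF == "cgroup2" { found = 1 } END { exit !found }' "$fs_list"
}
```

Usage: kernel_lists_cgroup2 && echo 'cgroup2 listed' || echo 'cgroup2 not listed'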
@simonfisher
Probably you can currently even see a related error message at boot, when systemd tries to mount the cgroup2 hierarchy, which should then fail:

dmesg -l emerg,alert,crit,err
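To narrow that output down, it can be piped through a filter. A small sketch with a hypothetical function name; it does not assume any particular wording of the systemd mount error and simply keeps lines mentioning "cgroup", case-insensitively.

```shell
#!/bin/bash
# Hypothetical helper: read kernel log lines on stdin and keep only those
# that mention cgroup (any case). Exit status is grep's: 1 if nothing matched.
cgroup_errors() {
	grep -i 'cgroup'
}
```

Usage: dmesg -l emerg,alert,crit,err | cgroup_errors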

And while it seems not to be used by default on Buster, the kernel command line parameter is already understood there, so we can add it to our default boot config files without worrying that users face boot warnings when creating Buster images via PREP.

@MichaIng MichaIng added this to the v7.7 milestone Sep 18, 2021
@MichaIng MichaIng changed the title Portainer install fails on Bullseye Odroid XU4 | Force legacy cgroup v1 usage Sep 18, 2021
@G2G2G2G

G2G2G2G commented Sep 30, 2021

It seem 4.15 is required at least to be able to use cgroup v2, But I guess there is no newer kernel than 4.14 for Odroid

For what it's worth, I've run the 5.3.11 kernel for years on 3 HC2 devices just fine:
Linux d2 5.3.11-odroidxu4 #5.99.191113 SMP PREEMPT Wed Nov 13 08:51:20 CET 2019 armv7l GNU/Linux

and this one has a 5.4 kernel: https://www.armbian.com/odroid-hc1/ (as MichaIng mentioned)

If Docker still won't work, and you are in mood for some backups and testing, we may push the Linux v5.4 bump for Odroid XU4 (and alike): #3861

I can test with 1 of my 3 devices, which are in a multi-user production environment (they only use Samba & NFS). Is there an image available?

Armbian offers:

This board is stripped Odroid XU4 and we use the same images, however, we provide a specially optimized config (for kernel 4.14.y or higher) which has to be applied manually. This results in shorter boot time and lower consumption. Run armbian-config utility and go to section system -> DTB and select optimized board configuration for Odroid HC1. The same config is valid for HC2 and MC1.

I've never used that though, just noticed it... years too late lol

@MichaIng MichaIng added the Workaround available 🆗 Workaround is available/has been implemented, but a definite solution should be found when possible. label Oct 16, 2021
@MichaIng
Owner

The kernel command line arguments will be applied with the next update and on future images, at least for devices where we know how to do it: 3724290

E.g. on NanoPi M2 there is no boot configuration file, so I'm not sure how to achieve it there. In those cases users are informed about the issue. It may leave open the question of how to fix it, but it is probably better to get issues about that question than issues saying "Docker fails to install".

@MichaIng
Owner

Wrong value to disable hybrid mode: 73ecb61

Dammit, either the cgroups v2 capability test is not precise or it is not the only issue. I installed Linux 4.9 on a Bullseye system, and even though it is <4.15, it shows:

# grep cgroup2 /proc/filesystems
nodev   cgroup2

Installing Docker works, but installing a container (Portainer tested) fails:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: bpf_prog_query(BPF_CGROUP_DEVICE) failed: invalid argument: unknown.
...
level=error msg="Handler for POST /v1.41/containers/32737e8533cff0b30d948a1e7fadfd06cb0615e8b6e256b091496ef4dc1d115f/start returned error: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: bpf_prog_query(BPF_CGROUP_DEVICE) failed: invalid argument: unknown"

I applied the kernel command line arguments and retried:

# grep 'cgroup2' /proc/filesystems
nodev   cgroup2

But Portainer installs and runs fine. Removing the arguments breaks Portainer startup again. Even on a Debian Stretch system (where that kernel comes from) the above command shows this output, even though the kernel does not natively support it, and Portainer still installs and runs fine.

So generally the cgroup2 mount type is available on all (x86_64) kernel versions down to Stretch, and it can be mounted successfully. Whether or not it is mounted, however, depends on the Debian/systemd version:

Bullseye:

# df -aT | grep cgroup
cgroup2        cgroup2             0       0         0    - /sys/fs/cgroup
  • Only cgroup2 is mounted by default.

Buster:

# df -aT | grep cgroup
df: /proc/sys/fs/binfmt_misc: No such device
tmpfs          tmpfs        1021348      0   1021348   0% /sys/fs/cgroup
cgroup2        cgroup2            0      0         0    - /sys/fs/cgroup/unified
cgroup         cgroup             0      0         0    - /sys/fs/cgroup/systemd
cgroup         cgroup             0      0         0    - /sys/fs/cgroup/pids
cgroup         cgroup             0      0         0    - /sys/fs/cgroup/cpu,cpuacct
cgroup         cgroup             0      0         0    - /sys/fs/cgroup/freezer
cgroup         cgroup             0      0         0    - /sys/fs/cgroup/devices
cgroup         cgroup             0      0         0    - /sys/fs/cgroup/net_cls,net_prio
cgroup         cgroup             0      0         0    - /sys/fs/cgroup/memory
cgroup         cgroup             0      0         0    - /sys/fs/cgroup/rdma
cgroup         cgroup             0      0         0    - /sys/fs/cgroup/blkio
cgroup         cgroup             0      0         0    - /sys/fs/cgroup/perf_event
cgroup         cgroup             0      0         0    - /sys/fs/cgroup/cpuset
  • cgroups v1 is mounted at the default mount point, but v2 at the dedicated /sys/fs/cgroup/unified, which looks like the hybrid mode.

Stretch:

# df -aT | grep cgroup
tmpfs          tmpfs         1026040       0   1026040   0% /sys/fs/cgroup
cgroup         cgroup              0       0         0    - /sys/fs/cgroup/systemd
cgroup         cgroup              0       0         0    - /sys/fs/cgroup/blkio
cgroup         cgroup              0       0         0    - /sys/fs/cgroup/devices
cgroup         cgroup              0       0         0    - /sys/fs/cgroup/memory
cgroup         cgroup              0       0         0    - /sys/fs/cgroup/cpuset
cgroup         cgroup              0       0         0    - /sys/fs/cgroup/net_cls,net_prio
cgroup         cgroup              0       0         0    - /sys/fs/cgroup/pids
cgroup         cgroup              0       0         0    - /sys/fs/cgroup/cpu,cpuacct
cgroup         cgroup              0       0         0    - /sys/fs/cgroup/perf_event
cgroup         cgroup              0       0         0    - /sys/fs/cgroup/freezer
  • Only v1 is mounted.
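The three layouts above can be told apart from /proc/mounts. A minimal sketch, with the hypothetical function name cgroup_layout and the assumption that the standard /sys/fs/cgroup mount points shown above are used:

```shell
#!/bin/bash
# Hypothetical helper: classify the cgroup layout from a mounts table
# (fields: device, mount point, type, ...). Defaults to /proc/mounts.
# ASSUMPTION: standard mount points /sys/fs/cgroup and /sys/fs/cgroup/unified.
cgroup_layout() {
	local mounts=${1:-/proc/mounts}
	awk '
		$2 == "/sys/fs/cgroup"         && $3 == "cgroup2" { unified = 1 }
		$2 == "/sys/fs/cgroup/unified" && $3 == "cgroup2" { hybrid = 1 }
		$3 == "cgroup"                                    { legacy = 1 }
		END {
			if (unified)     print "unified"  # like Bullseye above: only cgroup2
			else if (hybrid) print "hybrid"   # like Buster above: v1 plus cgroup2 at .../unified
			else if (legacy) print "legacy"   # like Stretch above: only v1
			else             print "none"
		}
	' "$mounts"
}
```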

Now I guess those kernel command line arguments control whether systemd mounts v1, v2 or hybrid. And indeed:

  • systemd.unified_cgroup_hierarchy=0 disables the cgroup2 mount at /sys/fs/cgroup, which matches the default before Bullseye.
  • systemd.legacy_systemd_cgroup_controller(=1) disables the hybrid mode cgroup2 mount at /sys/fs/cgroup/unified, which is the default only up to Buster. This of course has no effect if the cgroup2 mount at /sys/fs/cgroup is enabled.
  • Otherwise, on all Debian versions this can be changed freely and the mounts do not fail. But obviously cgroups v2 cannot be used successfully with the v4.9 kernel; it fails on all Debian versions when that kernel is installed manually.
  • So the question is: how can we know whether the kernel supports it or not? 🤔
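To see which of these two switches are actually set on a running system, /proc/cmdline can be inspected. A sketch with a hypothetical function name; it only reports what is set on the command line, not whether the kernel can actually use cgroup v2.

```shell
#!/bin/bash
# Hypothetical helper: print the systemd cgroup switches found on the kernel
# command line. Reads /proc/cmdline by default; another file can be passed.
cgroup_cmdline_args() {
	local cmdline=${1:-/proc/cmdline} arg
	local -a args
	read -r -a args < "$cmdline"
	for arg in "${args[@]}"; do
		case $arg in
			# The legacy switch may be given with or without "=1"
			systemd.unified_cgroup_hierarchy=*|systemd.legacy_systemd_cgroup_controller|systemd.legacy_systemd_cgroup_controller=*)
				echo "$arg" ;;
		esac
	done
}
```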

@MichaIng
Owner

MichaIng commented Oct 16, 2021

Okay, I give up on this. It is not general cgroups v2 support, but cgroups BPF support to manage device access permissions, which is usually missing on Linux 4.14 and earlier. But I couldn't find a way to reliably test whether this is available or not, especially since the kernel build config is not always available.

@simonfisher @G2G2G2G @davindisko
Just to assure, could you paste the output of the following command:

grep cgroup2 /proc/filesystems

And @G2G2G2G, as you use Linux v5.3, does the bootloader still use /boot/boot.ini, or is it already /boot/boot.cmd or /boot/boot.scr respectively? If it remains true that the above command is no indicator (pretty sure), then I may just apply the workaround to all /boot/boot.ini files, as long as v5+ Linux uses a bootloader with the new configuration files anyway. If it uses the same file, then I'll add a Linux version check as well (although this is also not 100% reliable, since features can be backported).
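One quick way to answer the boot.ini vs. boot.cmd/boot.scr question on a given system is to list which of these files exist. A sketch with a hypothetical function name; the presence of a file does not prove that the bootloader actually reads it.

```shell
#!/bin/bash
# Hypothetical helper: list which of the known boot configuration files
# exist in a boot directory (default /boot), in a fixed order.
boot_config_files() {
	local dir=${1:-/boot} f
	for f in boot.ini boot.cmd boot.scr; do
		[ -f "$dir/$f" ] && echo "$f"
	done
	return 0
}
```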


I answered the boot.ini question myself: Armbian's Linux 5.4 image uses it as well. So kernel version check will be added to only apply the workaround on Linux 4.14 and below.
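The kernel version check could look roughly like this. It is a sketch, not DietPi's actual implementation, with a hypothetical function name, parsing a release string as printed by uname -r. As said above, a version check is not fully reliable because features can be backported.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a kernel release string (default: uname -r)
# is Linux 4.14 or older. Only the leading "major.minor" part is compared.
kernel_at_most_4_14() {
	local release=${1:-$(uname -r)} major minor
	IFS=. read -r major minor _ <<< "$release"
	# Drop suffixes such as "-rc1" or "+" from the numeric parts
	major=${major%%[!0-9]*}
	minor=${minor%%[!0-9]*}
	(( major < 4 || (major == 4 && minor <= 14) ))
}
```

Usage: kernel_at_most_4_14 && echo 'apply the cgroup v1 workaround'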

@G2G2G2G

G2G2G2G commented Oct 16, 2021

The command:

grep cgroup2 /proc/filesystems 
nodev	cgroup2

Basically what you already had in the above post

Dammit, either the cgroupsv2 capability test is not precise or it alone is not the only issue. I installed Linux 4.9 on a Bullseye system, and even that it is <4.15, is shows:

#grep cgroup2 /proc/filesystems
nodev cgroup2

Keep in mind my system runs a 5.3 kernel that was only on Armbian for a few months, and I never reimaged anything... Not sure about their 5.4 kernel; if you have a DietPi image, or want me to run Armbian's, I can reimage it.

You answered the other one yourself? (If not, let me know how to check the bootloader... I am only familiar with /etc/default/grub, which is not on these SBCs.)

Here's this if it means anything

ls -thal /boot/
total 24M
drwxr-xr-x  3 root root 4.0K Aug 25 20:31 ./
lrwxrwxrwx  1 root root   24 Aug 25 20:31 uInitrd -> uInitrd-5.3.11-odroidxu4
-rw-r--r--  1 root root 7.3M Aug 25 20:31 uInitrd-5.3.11-odroidxu4
-rw-r--r--  1 root root 7.3M Aug 25 20:31 initrd.img-5.3.11-odroidxu4
drwxr-xr-x 23 root root 4.0K Nov 21  2019 ../
-rw-r--r--  1 root root  13K Nov 13  2019 boot.ini
-rw-r--r--  1 root root 1.5K Nov 13  2019 armbian_first_run.txt.template
-rw-r--r--  1 root root 4.8K Nov 13  2019 boot-desktop.png
-rw-r--r--  1 root root  38K Nov 13  2019 boot.bmp
lrwxrwxrwx  1 root root   20 Nov 13  2019 dtb -> dtb-5.3.11-odroidxu4/
drwxr-xr-x  2 root root 4.0K Nov 13  2019 dtb-5.3.11-odroidxu4/
-rw-r--r--  1 root root    0 Nov 13  2019 .next
lrwxrwxrwx  1 root root   24 Nov 13  2019 zImage -> vmlinuz-5.3.11-odroidxu4*
-rw-r--r--  1 root root 161K Nov 13  2019 config-5.3.11-odroidxu4
-rw-r--r--  1 root root 2.9M Nov 13  2019 System.map-5.3.11-odroidxu4
-rwxr-xr-x  1 root root 5.9M Nov 13  2019 vmlinuz-5.3.11-odroidxu4*

Is there any DietPi image file with a 5.4 kernel I can run on this? I'd like to dd it onto an SD card and give it a go.

@MichaIng
Owner

Yep, so it's a boot.ini in your case as well. The workaround now checks the kernel version before applying the cgroup v1 flag, so it should be fine.

@G2G2G2G

G2G2G2G commented Oct 17, 2021

@MichaIng ohh I figured it out; I had asked for a new image a few times... I always thought you guys used Armbian kernels on some of your distros, but it's only Debian... (I'm slow)

So I just have to take the current image and upgrade to Bullseye.
I'll do that tonight.

@MichaIng
Owner

MichaIng commented Oct 18, 2021

Armbian is also only Debian 😉. For Odroids we use Meveric's Odroid repository for the kernel and some GPU-accelerated libraries/packages: https://dietpi.com/meveric/
For most other SBCs (not RPi) we use Armbian's repository for the kernel and firmware. But both rely on the Debian repository for 99.9% of the available packages.

@G2G2G2G

G2G2G2G commented Oct 19, 2021

I know it's Debian. I was strictly talking about kernels; that's why I said I'd test a 5.4 one (Armbian's kernel).
I thought Bullseye would bring me to a 5.10 kernel, but after updating yesterday I realized it is still, for whatever reason, a 4.14 kernel.
I followed this: https://dietpi.com/blog/?p=811
On my other board it updated the kernel to 5.10, so that's where the confusion came from (the other board is NOT an Odroid).

Is this coming soon? Looks like they added kernel 5.4 last year?
https://forum.odroid.com/viewtopic.php?t=39891

https://forum.odroid.com/viewtopic.php?p=303023#p303023

Is there a way I can update to it via apt?

@MichaIng
Owner

Only on x86_64 is the Debian kernel used; in all other cases a different kernel package is used, which is completely independent of the underlying Debian version.

You could update the kernel package, but you also need to update the bootloader (U-Boot) and the boot configuration /boot/boot.ini, respectively create a matching /boot/boot.cmd and compile it into /boot/boot.scr, otherwise the device won't boot. PR is pending: #3861
Either Meveric or I need to finish testing and creating the configurations and a U-Boot package for this upgrade, or someone else who is able to do so.
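For the step of compiling /boot/boot.cmd into /boot/boot.scr, U-Boot's mkimage tool (Debian package u-boot-tools) is commonly used. A sketch with a hypothetical function name, assuming the default paths /boot/boot.cmd and /boot/boot.scr and a 32-bit ARM target; writing a working boot.cmd for the new kernel is the real work and is not covered here.

```shell
#!/bin/bash
# Hypothetical helper: compile a U-Boot script source into a boot.scr image.
# ASSUMPTIONS: default paths /boot/boot.cmd and /boot/boot.scr, ARM target.
compile_boot_script() {
	local src=${1:-/boot/boot.cmd} dst=${2:-/boot/boot.scr}
	if ! command -v mkimage > /dev/null; then
		echo 'mkimage not found, install u-boot-tools first' >&2
		return 1
	fi
	if [ ! -f "$src" ]; then
		echo "$src not found" >&2
		return 1
	fi
	# Uncompressed (-C none) script image (-T script) for ARM (-A arm)
	mkimage -C none -A arm -T script -d "$src" "$dst"
}
```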

@G2G2G2G

G2G2G2G commented Oct 20, 2021

Ohhh OK.
From the code I see it's fairly straightforward: apt install linux-image-5.4-armhf-odroid-xu4 linux-headers-5.4-armhf-odroid-xu4, but I have no idea about generating a boot.cmd and boot.scr =[
