Support k3s on LibreElec #4859

Closed
1 task
mossroy opened this issue Jan 2, 2022 · 25 comments

Comments

@mossroy

mossroy commented Jan 2, 2022

Is your feature request related to a problem? Please describe.

I'm trying to run a k3s agent on LibreELEC (amd64), to join an existing k3s cluster.
Docker can easily be installed (there's a dedicated Kodi add-on provided by LibreELEC), and it works: I can run Docker containers from the SSH command line.
But this Linux distribution uses unusual mounts. In particular:

/dev/loop0 on / type squashfs (ro,relatime)
/dev/sda2 on /storage type ext4 (rw,noatime)
tmpfs on /var type tmpfs (rw,relatime)

So /etc is read-only (it lives on the squashfs root), and /var is a tmpfs, so its contents do not survive a reboot. /storage, however, is writable and persistent, and could be used to store both the binary and the config.
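
The installer's own `k3s-ro-test` probe (visible in the failing run further down) suggests a quick way to see which locations are usable. A minimal sketch, assuming a POSIX shell; the function name `is_writable_dir` and the probe file name are illustrative and not part of k3s:

```shell
#!/bin/sh
# Report whether a directory can be written to, by trying to create
# (and then removing) a small probe file inside it.
is_writable_dir() {
    dir=$1
    [ -d "$dir" ] || return 1
    probe="$dir/.rw-probe-$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Directories relevant to this issue; adjust the list as needed.
for d in /etc /var /storage /usr/local/bin; do
    if is_writable_dir "$d"; then
        echo "$d: writable"
    else
        echo "$d: read-only or missing"
    fi
done
```

Note that this only tests writability. Whether a writable directory is persistent (like /storage) or a tmpfs (like /var) still has to be read from the mount table.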

With this config, k3s does not run out-of-the-box.

NB: systemd is available and used by this distro.

Describe the solution you'd like

Ideally, I could just run the installer (with specific options if necessary) and it would run (either as server or agent).

Installation without specific options currently fails because /usr/local does not exist :

# curl -sfL https://get.k3s.io | K3S_URL=https://my-control-plane:6443 K3S_TOKEN=... sh -s - --docker
touch: /usr/local/bin/k3s-ro-test: No such file or directory
[INFO]  Finding release for channel stable
[INFO]  Using v1.22.5+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.22.5+k3s1/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.22.5+k3s1/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
mv: can't rename '/tmp/k3s-install.XXXXp2g4LV/k3s.bin': No such file or directory

If I set INSTALL_K3S_SYSTEMD_DIR=/storage/k3s-systemd INSTALL_K3S_BIN_DIR=/storage/k3s when installing, the installation works :

# curl -sfL https://get.k3s.io | INSTALL_K3S_SYSTEMD_DIR=/storage/k3s-systemd INSTALL_K3S_BIN_DIR=/storage/k3s K3S_URL=https://my-control-plane:6443 K3S_TOKEN=... sh -s - --docker
[INFO]  Finding release for channel stable
[INFO]  Using v1.22.5+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.22.5+k3s1/sha256sum-amd64.txt
[INFO]  Skipping binary downloaded, installed k3s matches hash
[INFO]  Skipping installation of SELinux RPM
[INFO]  Skipping /storage/k3s/kubectl symlink to k3s, already exists
[INFO]  Skipping /storage/k3s/crictl symlink to k3s, already exists
[INFO]  Skipping /storage/k3s/ctr symlink to k3s, already exists
[INFO]  Creating killall script /storage/k3s/k3s-killall.sh
[INFO]  Creating uninstall script /storage/k3s/k3s-agent-uninstall.sh
[INFO]  env: Creating environment file /storage/k3s-systemd/k3s-agent.service.env
[INFO]  systemd: Creating service file /storage/k3s-systemd/k3s-agent.service
[INFO]  systemd: Enabling k3s-agent unit
Created symlink /storage/.config/system.d/multi-user.target.wants/k3s-agent.service → /storage/k3s-systemd/k3s-agent.service.
Created symlink /storage/.config/system.d/k3s-agent.service → /storage/k3s-systemd/k3s-agent.service.
[INFO]  systemd: Starting k3s-agent

But the service does not start properly :

# journalctl -u k3s-agent
Jan 02 21:20:57 LibreELEC systemd[1]: Starting Lightweight Kubernetes...
Jan 02 21:20:57 LibreELEC sh[1625]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Jan 02 21:20:57 LibreELEC sh[1626]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Jan 02 21:20:57 LibreELEC systemd[1]: Started Lightweight Kubernetes.
Jan 02 21:20:57 LibreELEC k3s[1629]: time="2022-01-02T21:20:57+01:00" level=info msg="Acquiring lock file /var/lib/rancher/k3s/data/.lock"
Jan 02 21:20:57 LibreELEC k3s[1629]: time="2022-01-02T21:20:57+01:00" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/2e877cf4762c3c7df37cc556de3e08890fbf450914bb3ec042ad4f36b5a241
Jan 02 21:21:02 LibreELEC k3s[1629]: time="2022-01-02T21:21:02+01:00" level=info msg="Starting k3s agent v1.22.5+k3s1 (405bf79d)"
Jan 02 21:21:02 LibreELEC k3s[1629]: time="2022-01-02T21:21:02+01:00" level=info msg="Running load balancer 127.0.0.1:6444 -> [my-control-plane:6443]"
Jan 02 21:21:02 LibreELEC k3s[1629]: time="2022-01-02T21:21:02+01:00" level=info msg="Waiting to retrieve agent configuration; server is not ready: mkdir /etc/rancher: read-only file system"
Jan 02 21:21:07 LibreELEC k3s[1629]: time="2022-01-02T21:21:07+01:00" level=info msg="Waiting to retrieve agent configuration; server is not ready: mkdir /etc/rancher: read-only file system"
Jan 02 21:21:12 LibreELEC k3s[1629]: time="2022-01-02T21:21:12+01:00" level=info msg="Waiting to retrieve agent configuration; server is not ready: mkdir /etc/rancher: read-only file system"

Describe alternatives you've considered

I've tried to run the k3s binary manually. But it fails with the same error message :

# /storage/k3s/k3s agent --token ... --server https://my-control-plane:6443 --docker
INFO[0000] Starting k3s agent v1.22.5+k3s1 (405bf79d)   
INFO[0000] Running load balancer 127.0.0.1:6444 -> [my-control-plane:6443] 
INFO[0000] Waiting to retrieve agent configuration; server is not ready: mkdir /etc/rancher: read-only file system 
INFO[0005] Waiting to retrieve agent configuration; server is not ready: mkdir /etc/rancher: read-only file system

Setting the --private-registry and/or --data-dir command-line parameters does not seem to help.

Additional context

I'd be happy to test anything

Output of docker info on LibreElec :

Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 4
 Server Version: 19.03.15
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: journald
 Cgroup Driver: systemd
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
 init version: 
 Kernel Version: 5.10.76
 Operating System: LibreELEC (official): 10.0.1
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 3.676GiB
 Name: LibreELEC
 ID: LYLJ:RAPR:RYLD:TFSX:ZJRZ:XH5C:4J6W:ZM73:ESV4:E6WB:YQBT:Y75B
 Docker Root Dir: /storage/.kodi/userdata/addon_data/service.system.docker/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Backporting

  • Needs backporting to older releases

@brandond
Member

brandond commented Jan 3, 2022

You can set --data-dir to move /var/lib/rancher/k3s somewhere else, but we don't currently have any way to relocate all of the files in /etc/rancher for non-rootless use cases. Your best bet may be to remount /etc read-write and symlink /etc/rancher to somewhere writable.
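
On a system where /etc can be made writable, the symlink suggested here could be set up roughly as follows. This is only a sketch: `link_config_dir` is a hypothetical helper, and /storage/k3s-config is just an example target.

```shell
#!/bin/sh
# Point LINK at TARGET, creating TARGET if needed. Refuses to replace an
# existing path that is not a symlink, so real configuration is never clobbered.
link_config_dir() {
    target=$1
    link=$2
    mkdir -p "$target" || return 1
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing to replace existing $link" >&2
        return 1
    fi
    ln -sfn "$target" "$link"
}

# Example (as root, once /etc is writable):
#   link_config_dir /storage/k3s-config /etc/rancher
```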

@mossroy
Author

mossroy commented Jan 4, 2022

Thanks for the feedback.
I suspected it was not implemented, which is why I used the "Feature request" issue template ;-)

Unfortunately, I can't remount / (which contains /etc) read-write, because it's a squashfs filesystem, so I can't create the /etc/rancher symlink.
This squashfs seems to be their way of implementing OS upgrades easily: they probably just replace the underlying image.

A possible workaround would be to modify the systemd service so that it runs a script under unshare -m that mounts a writable copy of /etc. I mean keeping a copy of /etc in a writable place and mounting it over /etc (the nearest existing directory), but only for this process. I have not tested this, so it's just an idea.
Mounting the copy over /etc for all processes would probably work too, but it's more invasive.

@mossroy
Author

mossroy commented Jan 5, 2022

Unfortunately, unshare is not available on LibreELEC.
So I tried copying /etc to a writable place (I used /tmp/etc) and mounting the copy over /etc. I might automate that on startup.
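
Automating this at startup could look roughly like the sketch below. The helper names (`prepare_copy`, `is_mount_point`, `overlay_etc`) are made up for illustration, and `mount -o bind` is an assumption chosen so the same line works with both BusyBox and util-linux `mount`.

```shell
#!/bin/sh
# Copy a directory to a writable location, preserving attributes.
# Does nothing if the copy already exists, so it is safe to call repeatedly.
prepare_copy() {
    src=$1
    dst=$2
    [ -d "$dst" ] && return 0
    mkdir -p "$(dirname "$dst")" && cp -a "$src" "$dst"
}

# Succeed if something is mounted exactly on the given path.
is_mount_point() {
    awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' /proc/mounts
}

# Startup step (needs root): put the writable copy over /etc, once.
overlay_etc() {
    is_mount_point /etc && return 0
    prepare_copy /etc /tmp/etc && mount -o bind /tmp/etc /etc
}

# Call overlay_etc from a startup script before k3s-agent is started.
```

Because the copy is only made once, later changes to the original /etc (for example after an OS upgrade) are not picked up until the copy is removed.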

The k3s-agent service now starts and creates the file /etc/rancher/node/password, but then fails with a different error, "PIDS cgroup support not found":

Jan 05 21:18:57 LibreELEC systemd[1]: Starting Lightweight Kubernetes...
Jan 05 21:18:57 LibreELEC sh[1269]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Jan 05 21:18:57 LibreELEC sh[1270]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Jan 05 21:18:57 LibreELEC systemd[1]: Started Lightweight Kubernetes.
Jan 05 21:18:57 LibreELEC k3s[1273]: time="2022-01-05T21:18:57+01:00" level=info msg="Acquiring lock file /var/lib/rancher/k3s/data/.lock"
Jan 05 21:18:57 LibreELEC k3s[1273]: time="2022-01-05T21:18:57+01:00" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/2e877cf4762c3c7df37cc554de3e08890fbe450914bb3ec04
Jan 05 21:19:01 LibreELEC k3s[1273]: time="2022-01-05T21:19:01+01:00" level=info msg="Starting k3s agent v1.22.5+k3s1 (405bf79d)"
Jan 05 21:19:01 LibreELEC k3s[1273]: time="2022-01-05T21:19:01+01:00" level=info msg="Running load balancer 127.0.0.1:6444 -> [my-control-plane:6443]"
Jan 05 21:19:02 LibreELEC k3s[1273]: time="2022-01-05T21:19:02+01:00" level=info msg="Module overlay was already loaded"
Jan 05 21:19:02 LibreELEC k3s[1273]: time="2022-01-05T21:19:02+01:00" level=info msg="Module nf_conntrack was already loaded"
Jan 05 21:19:02 LibreELEC k3s[1273]: time="2022-01-05T21:19:02+01:00" level=info msg="Module br_netfilter was already loaded"
Jan 05 21:19:02 LibreELEC k3s[1273]: time="2022-01-05T21:19:02+01:00" level=info msg="Module iptable_nat was already loaded"
Jan 05 21:19:02 LibreELEC k3s[1273]: time="2022-01-05T21:19:02+01:00" level=info msg="Set sysctl 'net/netfilter/nf_conntrack_max' to 131072"
Jan 05 21:19:02 LibreELEC k3s[1273]: time="2022-01-05T21:19:02+01:00" level=info msg="Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400"
Jan 05 21:19:02 LibreELEC k3s[1273]: time="2022-01-05T21:19:02+01:00" level=info msg="Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600"
Jan 05 21:19:03 LibreELEC k3s[1273]: time="2022-01-05T21:19:03+01:00" level=info msg="Updating load balancer server addresses -> [x.x.x.x:6443 my-control-plane:6443]"
Jan 05 21:19:03 LibreELEC k3s[1273]: time="2022-01-05T21:19:03+01:00" level=info msg="Connecting to proxy" url="wss://x.x.x.x:6443/v1-k3s/connect"
Jan 05 21:19:03 LibreELEC k3s[1273]: time="2022-01-05T21:19:03+01:00" level=fatal msg="PIDS cgroup support not found"
Jan 05 21:19:03 LibreELEC systemd[1]: k3s-agent.service: Main process exited, code=exited, status=1/FAILURE
Jan 05 21:19:03 LibreELEC systemd[1]: k3s-agent.service: Failed with result 'exit-code'.
Jan 05 21:19:08 LibreELEC systemd[1]: k3s-agent.service: Scheduled restart job, restart counter is at 1.
Jan 05 21:19:08 LibreELEC systemd[1]: Stopped Lightweight Kubernetes.

@brandond
Member

brandond commented Jan 5, 2022

That has been required by upstream Kubernetes since 1.20, if I remember correctly. Is LibreELEC shipping a particularly old kernel, or does it perhaps just not have all the cgroups enabled?

@mossroy
Author

mossroy commented Jan 5, 2022

It uses a recent kernel: 5.10.76 (more details in the docker info output above).
How can I check whether all the cgroups are enabled in the kernel?
The kernel options seem to be in https://github.com/LibreELEC/LibreELEC.tv/blob/master/distributions/LibreELEC/kernel_options : I might create a ticket there to ask for some additional kernel options.

@brandond
Member

brandond commented Jan 5, 2022

Yeah, they need CONFIG_CGROUP_PIDS and probably a bunch of other ones as well. You might run k3s check-config and see what it says.
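
Alongside `k3s check-config`, the controllers known to the running kernel can be read from /proc/cgroups, whose columns are the subsystem name, hierarchy, number of cgroups and an enabled flag. A small sketch, with `cgroup_enabled` as an illustrative helper name:

```shell
#!/bin/sh
# Succeed if the named cgroup controller is listed and enabled.
# The second argument (file to read) defaults to /proc/cgroups.
cgroup_enabled() {
    name=$1
    file=${2:-/proc/cgroups}
    awk -v n="$name" '$1 == n && $4 == 1 { ok = 1 } END { exit !ok }' "$file"
}

for c in cpuset memory pids; do
    if cgroup_enabled "$c" 2>/dev/null; then
        echo "$c: enabled"
    else
        echo "$c: missing or disabled"
    fi
done
```

A kernel built without CONFIG_CGROUP_PIDS should have no `pids` row at all, so it would be reported as missing here.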

@mossroy
Author

mossroy commented Jan 5, 2022

Great! Many thanks for your help.

Here is the output :

Verifying binaries in /var/lib/rancher/k3s/data/2e877cf4762c3c7df37cc554de3e08890fbe450914bb3ec042ad4f36b5a2413a/bin:
- sha256sum: good
- links: good

System:
- /usr/sbin iptables v1.8.7 (legacy): ok
- swap: disabled
- routes: ok

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: cgroups Hybrid mounted, cpuset|memory controllers status: good
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: missing (fail)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_POSIX_MQUEUE: enabled

Optional Features:
- CONFIG_USER_NS: missing
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: missing
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: missing
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: missing
- CONFIG_NET_CLS_CGROUP: missing
- CONFIG_CGROUP_NET_PRIO: missing
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_NF_TARGET_REDIRECT: missing
- CONFIG_IP_SET: missing
- CONFIG_IP_VS: missing
- CONFIG_IP_VS_NFCT: missing
- CONFIG_IP_VS_PROTO_TCP: missing
- CONFIG_IP_VS_PROTO_UDP: missing
- CONFIG_IP_VS_RR: missing
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: missing
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: enabled
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: missing
      - CONFIG_XFRM_ALGO: missing
      - CONFIG_INET_ESP: missing
      - CONFIG_INET_XFRM_MODE_TRANSPORT: missing
- Storage Drivers:
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)

STATUS: 1 (fail)

Only CONFIG_NETFILTER_XT_MATCH_IPVS is displayed in red.
CONFIG_CGROUP_PIDS is missing, but is marked as an optional feature : is that normal?
Should I also ask for the other listed CONFIG_CGROUP_* options? For some other ones?
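
To pull just the missing options out of a report like the one above, something along these lines may help. It is a sketch: `list_missing` is an illustrative name, and the stripping of colour escape sequences assumes the tool emits ordinary ANSI colour codes.

```shell
#!/bin/sh
# List kernel options reported as "missing" in check-config style output.
# Colour escape sequences are stripped first so coloured lines also match.
list_missing() {
    tr -d '\033' |
        sed -n -e 's/\[[0-9;]*m//g' \
               -e 's/^[ -]*\(CONFIG_[A-Z0-9_]*\): missing.*/\1/p'
}

# Only run against the real tool when it is installed.
if command -v k3s >/dev/null 2>&1; then
    k3s check-config 2>/dev/null | list_missing
fi
```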

@brandond
Member

brandond commented Jan 5, 2022

The check-config script hasn't been updated in a bit. CONFIG_CGROUP_PIDS is now required.

@mossroy
Author

mossroy commented Jan 6, 2022

All right. So which config options should I ask for?

  • CONFIG_NETFILTER_XT_MATCH_IPVS
  • CONFIG_CGROUP_PIDS
  • some other ones?

@brandond
Member

brandond commented Jan 6, 2022

I'm not 100% sure but I suspect these should do it:

  • CONFIG_USER_NS
  • CONFIG_CGROUP_PIDS
  • CONFIG_IP_SET
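
Whether a rebuilt kernel actually carries these three options can be checked against /proc/config.gz, which is the file check-config reads on this system. A minimal sketch; `kconfig_state` is an illustrative helper that reads a plain-text kernel config on stdin:

```shell
#!/bin/sh
# Print "enabled", "module" or "missing" for one kernel option.
# Lines such as "# CONFIG_FOO is not set" never match, so they count as missing.
kconfig_state() {
    awk -F= -v opt="$1" '
        $1 == opt { state = ($2 == "m") ? "module" : "enabled" }
        END { print (state == "" ? "missing" : state) }'
}

for opt in CONFIG_USER_NS CONFIG_CGROUP_PIDS CONFIG_IP_SET; do
    printf '%s: %s\n' "$opt" \
        "$(zcat /proc/config.gz 2>/dev/null | kconfig_state "$opt")"
done
```

If /proc/config.gz is absent, every option is reported as missing, so that case needs to be ruled out first.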

@mossroy
Author

mossroy commented Jan 7, 2022

I've made some progress with your help.

I managed to recompile LibreELEC with these kernel options (except CONFIG_NETFILTER_XT_MATCH_IPVS, which did not seem to exist), and that solved the "PIDS cgroup support not found" issue.
It took me some time because there was a catch: https://github.com/LibreELEC/LibreELEC.tv/blob/master/distributions/LibreELEC/kernel_options was not the right place to set the kernel options. I had to set them in https://github.com/LibreELEC/LibreELEC.tv/blob/master/projects/Generic/linux/linux.x86_64.conf

Anyway, after doing so (in their libreelec-10.0 branch), I managed to run it in Virtualbox, install the Docker addon, and install k3s with the following commands :

cp -ar /etc /tmp/
mount /tmp/etc /etc
mkdir /etc/rancher
mkdir /storage/k3s /storage/k3s-systemd /storage/k3s-config
mount /storage/k3s-config /etc/rancher
curl -sfL https://get.k3s.io | INSTALL_K3S_SYSTEMD_DIR=/storage/k3s-systemd INSTALL_K3S_BIN_DIR=/storage/k3s K3S_URL=https://my-control-plane:6443 K3S_TOKEN=... sh -s - --docker

I also had to modify the systemd unit file /storage/.kodi/addons/service.system.docker/system.d/service.system.docker.service to make Docker use the cgroupfs cgroup driver instead of systemd:

ExecStart=/storage/.kodi/addons/service.system.docker/bin/dockerd --exec-opt native.cgroupdriver=cgroupfs \

(followed by systemctl daemon-reload and systemctl restart docker k3s-agent)
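
To confirm which cgroup driver Docker ends up with after the restart, the "Cgroup Driver" line of `docker info` (shown earlier in this issue) can be extracted. A small sketch; `docker_cgroup_driver` is an illustrative helper name:

```shell
#!/bin/sh
# Extract the cgroup driver from `docker info` text output read on stdin.
docker_cgroup_driver() {
    sed -n 's/^ *Cgroup Driver: *//p'
}

# Only query Docker when the client is actually installed.
if command -v docker >/dev/null 2>&1; then
    driver=$(docker info 2>/dev/null | docker_cgroup_driver)
    echo "docker cgroup driver: ${driver:-unknown}"
fi
```

After the change described above, this would be expected to print cgroupfs rather than systemd.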

This way, k3s-agent manages to register the node on the control-plane. Great!

... but it's not over: the service restarts in a loop with the following warnings and errors:

(...)
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: I0107 21:48:16.065137   39561 proxier.go:659] "Failed to load kernel module with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs"
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: I0107 21:48:16.065868   39561 proxier.go:659] "Failed to load kernel module with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_rr"
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: I0107 21:48:16.066437   39561 proxier.go:659] "Failed to load kernel module with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_wrr"
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: I0107 21:48:16.068485   39561 proxier.go:659] "Failed to load kernel module with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_sh"
(...)
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: I0107 21:48:16.124667   39561 kuberuntime_manager.go:245] "Container runtime initialized" containerRuntime="docker" version="19.03.15" apiVersion="1.40.0"
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: W0107 21:48:16.124809   39561 probe.go:268] Flexvolume plugin directory at /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ does not exist. Recreating.
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: E0107 21:48:16.124868   39561 plugins.go:611] "Error initializing dynamic plugin prober" err="error (re-)creating driver directory: mkdir /usr/libexec: read-only file system"
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: I0107 21:48:16.124990   39561 server.go:1213] "Started kubelet"
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: E0107 21:48:16.125480   39561 kubelet.go:1343] "Image garbage collection failed once. Stats initialization may not have completed yet" err="failed to get imageFs info: unable to find data in memory cache"
(...)
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: E0107 21:48:16.132362   39561 kubelet_network_linux.go:83] "Failed to ensure marking rule for KUBE-MARK-DROP chain" err="error checking rule: exit status 2: iptables v1.8.7 (legacy): unknown option \"--or-mark\"\nTry `iptables -h' or 'iptables --help' for more information.\n"
(...)
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: E0107 21:48:16.134742   39561 kubelet_network_linux.go:83] "Failed to ensure marking rule for KUBE-MARK-DROP chain" err="error checking rule: exit status 2: ip6tables v1.8.7 (legacy): unknown option \"--or-mark\"\nTry `ip6tables -h' or 'ip6tables --help' for more information.\n"
(...)
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: E0107 21:48:16.134903   39561 kubelet.go:1991] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
(...)
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: E0107 21:48:16.316517   39561 proxier.go:867] "Failed to ensure chain jumps" err="error checking rule: exit status 2: iptables v1.8.7 (legacy): Couldn't load match `comment':No such file or directory\n\nTry `iptables -h' or 'iptables --help' for more information.\n" table=filter srcChain=INPUT dstChain=KUBE-EXTERNAL-SERVICES
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: I0107 21:48:16.316579   39561 proxier.go:851] "Sync failed" retryingTime="30s"
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: E0107 21:48:16.318416   39561 proxier.go:867] "Failed to ensure chain jumps" err="error checking rule: exit status 2: ip6tables v1.8.7 (legacy): Couldn't load match `comment':No such file or directory\n\nTry `ip6tables -h' or 'ip6tables --help' for more information.\n" table=filter srcChain=INPUT dstChain=KUBE-EXTERNAL-SERVICES
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: I0107 21:48:16.318465   39561 proxier.go:851] "Sync failed" retryingTime="30s"
(...)
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: F0107 21:48:16.329655   39561 network_policy_controller.go:290] Failed to verify rule exists in INPUT chain due to running [/usr/sbin/iptables -t filter -C INPUT -m comment --comment kube-router netpol - 4IA2OSFRMVNDXBVV -j KUBE-ROUTER-INPUT --wait]: exit status 2: iptables v1.8.7 (legacy): Couldn't load match `comment':No such file or directory
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: Try `iptables -h' or 'iptables --help' for more information.
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: panic: F0107 21:48:16.329655   39561 network_policy_controller.go:290] Failed to verify rule exists in INPUT chain due to running [/usr/sbin/iptables -t filter -C INPUT -m comment --comment kube-router netpol - 4IA2OSFRMVNDXBVV -j KUBE-ROUTER-INPUT --wait]: exit status 2: iptables v1.8.7 (legacy): Couldn't load match `comment':No such file or directory
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: Try `iptables -h' or 'iptables --help' for more information.
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: goroutine 1757 [running]:
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: github.com/rancher/k3s/vendor/k8s.io/klog/v2.(*loggingT).output(0x8184640, 0xc000000003, 0x0, 0x0, 0xc0006dd960, 0x0, 0x69fb14e, 0x1c, 0x122, 0x0)
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]:         /go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:970 +0x805
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: github.com/rancher/k3s/vendor/k8s.io/klog/v2.(*loggingT).printf(0x8184640, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x501daff, 0x32, 0xc0029e5300, 0x2, ...)
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]:         /go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:753 +0x19a
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: github.com/rancher/k3s/vendor/k8s.io/klog/v2.Fatalf(...)
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]:         /go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:1495
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: github.com/rancher/k3s/pkg/agent/netpol.(*NetworkPolicyController).ensureTopLevelChains.func2(0x4f1b85f, 0x5, 0xc001cd39b0, 0x6, 0x6, 0xc002a005c0, 0x10, 0x1)
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]:         /go/src/github.com/rancher/k3s/pkg/agent/netpol/network_policy_controller.go:290 +0xd06
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: github.com/rancher/k3s/pkg/agent/netpol.(*NetworkPolicyController).ensureTopLevelChains(0xc000ab1b80)
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]:         /go/src/github.com/rancher/k3s/pkg/agent/netpol/network_policy_controller.go:343 +0x2ac
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: github.com/rancher/k3s/pkg/agent/netpol.(*NetworkPolicyController).Run(0xc000ab1b80, 0xc0005ee7e0)
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]:         /go/src/github.com/rancher/k3s/pkg/agent/netpol/network_policy_controller.go:147 +0x125
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]: created by github.com/rancher/k3s/pkg/agent/netpol.Run
Jan 07 21:48:16 LibreELECvbox2 k3s[39561]:         /go/src/github.com/rancher/k3s/pkg/agent/netpol/netpol.go:64 +0x5cd
Jan 07 21:48:16 LibreELECvbox2 systemd[1]: k3s-agent.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 07 21:48:16 LibreELECvbox2 systemd[1]: k3s-agent.service: Failed with result 'exit-code'.
Jan 07 21:48:21 LibreELECvbox2 systemd[1]: k3s-agent.service: Scheduled restart job, restart counter is at 189.
Jan 07 21:48:21 LibreELECvbox2 systemd[1]: Stopped Lightweight Kubernetes.

I might use a similar trick to make /usr writable (as for /etc) if necessary, at least to get further.
After some searching, it looks like the iptables errors might come from other missing kernel options, perhaps one of the CONFIG_NETFILTER_XT_*MARK options?
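
One way to narrow this down is to probe a given iptables binary for the `comment` match directly. The sketch below relies on the exit codes documented in the iptables man page (2 for command-line or parameter problems, 1 for other errors such as a rule that does not exist), which matches the "exit status 2" in the log above. The helper names are illustrative, and the probe only checks for a rule; it does not add one.

```shell
#!/bin/sh
# Classify the exit status of an `iptables -C` probe.
# 0: rule exists; 1: match accepted but rule absent; 2: parameter/extension problem.
classify_probe() {
    case $1 in
        0|1) echo "usable" ;;
        2)   echo "unsupported" ;;
        *)   echo "unknown" ;;
    esac
}

# Check whether the given iptables binary accepts the `comment` match.
probe_comment_match() {
    ipt=${1:-iptables}
    "$ipt" -t filter -C INPUT -m comment --comment probe -j ACCEPT >/dev/null 2>&1
    classify_probe $?
}

# Example (as root):
#   probe_comment_match /usr/sbin/iptables
```

An "unknown" result covers cases such as a missing binary or insufficient privileges, which this sketch does not try to tell apart.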

@brandond
Member

brandond commented Jan 8, 2022

Why are you running it with docker instead of using containerd?

You might also try removing the host iptables packages, unless you have something else that needs them. K3s ships with its own fallback iptables binary that has all the required features enabled, but it is only used if the host does not already have iptables installed. That assumes the root cause is not a missing kernel module.

@mossroy
Author

mossroy commented Jan 8, 2022

Why are you running it with docker instead of using containerd?

Because I'm sure docker runs properly on it, as there is an official addon.
Also by habit, I have to confess...

AFAIK, there is no package manager to install/uninstall things: all necessary binaries seem to be compiled from scratch at build time. So installing containerd would probably be complicated.

You might also try removing the host iptables packages, unless you have something else that needs them. K3s ships with its own fallback iptables binary that has all the required features enabled, but it is only used if the host does not already have iptables installed. That assumes the root cause is not a missing kernel module.

I can try to disable it, by masking the systemd service (see https://github.com/LibreELEC/LibreELEC.tv/blob/master/packages/network/iptables/config/README). I just hope it's not necessary for something else in LibreELEC : they probably didn't compile and add it for no reason, but (if I'm lucky) it might be there "only" for docker.

@mossroy
Author

mossroy commented Jan 8, 2022

After making /usr writable (by mounting a writable copy over it, as for /etc) and removing all iptables and ip6tables binaries from /usr/sbin, k3s seems to use its bundled iptables.
But some calls to this iptables binary fail with "Couldn't find match `comment'":

Jan 08 09:00:36 LibreELECvbox2 systemd[1]: Starting Lightweight Kubernetes...
Jan 08 09:00:36 LibreELECvbox2 sh[14725]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Jan 08 09:00:36 LibreELECvbox2 sh[14726]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Jan 08 09:00:36 LibreELECvbox2 systemd[1]: Started Lightweight Kubernetes.
Jan 08 09:00:36 LibreELECvbox2 k3s[14729]: time="2022-01-08T09:00:36Z" level=info msg="Starting k3s agent v1.22.5+k3s1 (405bf79d)"
Jan 08 09:00:36 LibreELECvbox2 k3s[14729]: time="2022-01-08T09:00:36Z" level=info msg="Running load balancer 127.0.0.1:6444 -> [x.x.x.x:6443 my-control-plane:6443]"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: time="2022-01-08T09:00:38Z" level=info msg="Module overlay was already loaded"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: time="2022-01-08T09:00:38Z" level=info msg="Module nf_conntrack was already loaded"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: time="2022-01-08T09:00:38Z" level=info msg="Module br_netfilter was already loaded"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: time="2022-01-08T09:00:38Z" level=info msg="Module iptable_nat was already loaded"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: time="2022-01-08T09:00:38Z" level=info msg="Connecting to proxy" url="wss://x.x.x.x:6443/v1-k3s/connect"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: time="2022-01-08T09:00:38Z" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --cni-bin-dir=/var/lib/rancher/k3s/data/2e877cf4762c3c7df37cc556de3e08890fbf450914bb3ec042ad4f3
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: Flag --cloud-provider has been deprecated, will be removed in 1.23, in favor of removing cloud provider code from Kubelet.
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: Flag --cni-bin-dir has been deprecated, will be removed along with dockershim.
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: Flag --cni-conf-dir has been deprecated, will be removed along with dockershim.
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: Flag --network-plugin has been deprecated, will be removed along with dockershim.
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.351268   14729 server.go:436] "Kubelet version" kubeletVersion="v1.22.5+k3s1"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.364758   14729 dynamic_cafile_content.go:155] "Starting controller" name="client-ca-bundle::/var/lib/rancher/k3s/agent/client-ca.crt"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: time="2022-01-08T09:00:38Z" level=info msg="labels have already set on node: libreelecvbox2"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: time="2022-01-08T09:00:38Z" level=info msg="Starting flannel with backend vxlan"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: time="2022-01-08T09:00:38Z" level=info msg="Running kube-proxy --cluster-cidr=10.42.0.0/16 --conntrack-max-per-core=0 --conntrack-tcp-timeout-close-wait=0s --conntrack-tcp-timeout-established=0s --healthz-bind-address=127.0.0.1 --hostname-override=libreelecvbox2 --kubeconfig=/var/lib/rancher/k3s/agent/kubeproxy.kubeconfig --proxy-mode=iptables"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: W0108 09:00:38.407526   14729 server.go:224] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.408486   14729 proxier.go:659] "Failed to load kernel module with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.409185   14729 proxier.go:659] "Failed to load kernel module with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_rr"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.409768   14729 proxier.go:659] "Failed to load kernel module with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_wrr"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.410345   14729 proxier.go:659] "Failed to load kernel module with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_sh"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: time="2022-01-08T09:00:38Z" level=info msg="Flannel found PodCIDR assigned for node libreelecvbox2"
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.415006   14729 flannel.go:93] Determining IP address of default interface
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.416486   14729 kube.go:120] Waiting 10m0s for node controller to sync
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.419958   14729 kube.go:378] Starting kube subnet manager
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.451987   14729 node.go:172] Successfully retrieved node IP: x.x.x.x
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.452268   14729 server_others.go:140] Detected node IP x.x.x.x
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.457527   14729 server_others.go:206] kube-proxy running in dual-stack mode, IPv4-primary
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.457689   14729 server_others.go:212] Using iptables Proxier.
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.457789   14729 server_others.go:219] creating dualStackProxier for iptables.

Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: W0108 09:00:38.457878   14729 server_others.go:495] detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.458407   14729 server.go:649] Version: v1.22.5+k3s1
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.461494   14729 config.go:315] Starting service config controller
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.461605   14729 shared_informer.go:240] Waiting for caches to sync for service config
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.461690   14729 config.go:224] Starting endpoint slice config controller
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.461750   14729 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.557458   14729 network_policy_controller.go:144] Starting network policy controller
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: I0108 09:00:38.564853   14729 shared_informer.go:247] Caches are synced for endpoint slice config
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: F0108 09:00:38.565353   14729 network_policy_controller.go:290] Failed to verify rule exists in INPUT chain due to running [/var/lib/rancher/k3s/data/2e877cf4762c3c7df37cc556de3e08890fbf450914bb3ec042ad4f36b5a2413a/bin/aux/iptables -t filter -C INPUT -m comment --comment kube-router netpol - 4IA2OSFRMVNDXBVV -j KUBE-ROUTER-INPUT --wait]: exit status 2: iptables v1.8.6 (legacy): Couldn't find match `comment'
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: Try `iptables -h' or 'iptables --help' for more information.
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: panic: F0108 09:00:38.565353   14729 network_policy_controller.go:290] Failed to verify rule exists in INPUT chain due to running [/var/lib/rancher/k3s/data/2e877cf4762c3c7df37cc556de3e08890fbf450914bb3ec042ad4f36b5a2413a/bin/aux/iptables -t filter -C INPUT -m comment --comment kube-router netpol - 4IA2OSFRMVNDXBVV -j KUBE-ROUTER-INPUT --wait]: exit status 2: iptables v1.8.6 (legacy): Couldn't find match `comment'
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: Try `iptables -h' or 'iptables --help' for more information.
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: goroutine 1245 [running]:
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: github.com/rancher/k3s/vendor/k8s.io/klog/v2.(*loggingT).output(0x8184640, 0xc000000003, 0x0, 0x0, 0xc0006e9f80, 0x0, 0x69fb14e, 0x1c, 0x122, 0x0)
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]:         /go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:970 +0x805
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: github.com/rancher/k3s/vendor/k8s.io/klog/v2.(*loggingT).printf(0x8184640, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x501daff, 0x32, 0xc001bf2fe0, 0x2, ...)
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]:         /go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:753 +0x19a
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: github.com/rancher/k3s/vendor/k8s.io/klog/v2.Fatalf(...)
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]:         /go/src/github.com/rancher/k3s/vendor/k8s.io/klog/v2/klog.go:1495
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: github.com/rancher/k3s/pkg/agent/netpol.(*NetworkPolicyController).ensureTopLevelChains.func2(0x4f1b85f, 0x5, 0xc001f1d9b0, 0x6, 0x6, 0xc000cf4e80, 0x10, 0x1)
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]:         /go/src/github.com/rancher/k3s/pkg/agent/netpol/network_policy_controller.go:290 +0xd06
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: github.com/rancher/k3s/pkg/agent/netpol.(*NetworkPolicyController).ensureTopLevelChains(0xc001277b80)
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]:         /go/src/github.com/rancher/k3s/pkg/agent/netpol/network_policy_controller.go:343 +0x2ac
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: github.com/rancher/k3s/pkg/agent/netpol.(*NetworkPolicyController).Run(0xc001277b80, 0xc0016f1a40)
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]:         /go/src/github.com/rancher/k3s/pkg/agent/netpol/network_policy_controller.go:147 +0x125
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]: created by github.com/rancher/k3s/pkg/agent/netpol.Run
Jan 08 09:00:38 LibreELECvbox2 k3s[14729]:         /go/src/github.com/rancher/k3s/pkg/agent/netpol/netpol.go:64 +0x5cd
Jan 08 09:00:38 LibreELECvbox2 systemd[1]: k3s-agent.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 08 09:00:38 LibreELECvbox2 systemd[1]: k3s-agent.service: Failed with result 'exit-code'.
Jan 08 09:00:43 LibreELECvbox2 systemd[1]: k3s-agent.service: Scheduled restart job, restart counter is at 206.
Jan 08 09:00:43 LibreELECvbox2 systemd[1]: Stopped Lightweight Kubernetes.

This might be caused by missing kernel options, based on garywill/linux-router#18 and/or kubernetes-sigs/kind#1461

@mossroy
Author

mossroy commented Jan 8, 2022

I added the kernel options CONFIG_NETFILTER_XT_MATCH_COMMENT and CONFIG_NETFILTER_XT_MATCH_STATISTIC

But the Docker service fails to start if it does not find iptables in the path :

Jan 08 09:49:24 LibreELECvbox2 dockerd[1385]: time="2022-01-08T09:49:24.814583605Z" level=info msg="Loading containers: start."
Jan 08 09:49:24 LibreELECvbox2 dockerd[1385]: time="2022-01-08T09:49:24.814685441Z" level=warning msg="Failed to find iptables: exec: \"iptables\": executable file not found in $PATH"
Jan 08 09:49:24 LibreELECvbox2 dockerd[1385]: time="2022-01-08T09:49:24.817325228Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=moby
Jan 08 09:49:24 LibreELECvbox2 dockerd[1385]: time="2022-01-08T09:49:24.817378311Z" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
Jan 08 09:49:24 LibreELECvbox2 dockerd[1385]: time="2022-01-08T09:49:24.817506900Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugi
Jan 08 09:49:25 LibreELECvbox2 dockerd[1385]: failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: Iptables not found
Jan 08 09:49:25 LibreELECvbox2 systemd[1]: service.system.docker.service: Main process exited, code=exited, status=1/FAILURE
Jan 08 09:49:25 LibreELECvbox2 systemd[1]: service.system.docker.service: Failed with result 'exit-code'.

I'll try to use the LibreELEC iptables binary, by adding some other kernel options among CONFIG_NETFILTER_XT_*MARK

@mossroy
Author

mossroy commented Jan 8, 2022

After many attempts, I finally made it work.
It's still quick-and-dirty, just a proof of concept.
But the k3s-agent service is running, and the control plane considers the node ready and manages to deploy some pods on it.

I had to add many kernel options :

  • CONFIG_CGROUP_PIDS
  • CONFIG_USER_NS
  • CONFIG_NETFILTER_XT_MARK
  • CONFIG_NETFILTER_XT_CONNMARK
  • CONFIG_NETFILTER_XT_TARGET_CONNMARK
  • CONFIG_NETFILTER_XT_TARGET_MARK
  • CONFIG_NETFILTER_XT_TARGET_NFLOG
  • CONFIG_NETFILTER_XT_MATCH_COMMENT
  • CONFIG_NETFILTER_XT_MATCH_LIMIT
  • CONFIG_NETFILTER_XT_MATCH_MARK
  • CONFIG_NETFILTER_XT_MATCH_MULTIPORT
  • CONFIG_NETFILTER_XT_MATCH_PHYSDEV
  • CONFIG_NETFILTER_XT_MATCH_STATISTIC
  • CONFIG_IP_SET
  • CONFIG_IP_VS
  • CONFIG_IP_VS_RR
  • CONFIG_IP_VS_WRR
  • CONFIG_IP_VS_SH
  • CONFIG_BRIDGE_VLAN_FILTERING
  • CONFIG_VXLAN
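
Before rebuilding an image, it can help to see which of these options a given kernel config lacks. The following is a minimal sketch, not part of the original procedure: the function name `missing_kernel_options` is hypothetical, and it assumes a plain-text kernel config file is available (for example extracted with `zcat /proc/config.gz`, which only exists when the kernel was built with CONFIG_IKCONFIG_PROC).

```shell
#!/bin/sh
# Print every requested kernel option that is not enabled in a kernel config.
# Usage: missing_kernel_options CONFIG_FILE OPTION...
# CONFIG_FILE is a plain-text kernel config (the path is up to the caller).
missing_kernel_options() {
    config_file=$1
    shift
    for opt in "$@"; do
        # An option counts as enabled when set to y (built in) or m (module).
        if ! grep -Eq "^${opt}=(y|m)\$" "$config_file"; then
            echo "$opt"
        fi
    done
}
```

For instance, `missing_kernel_options /tmp/kconfig CONFIG_NETFILTER_XT_MATCH_COMMENT CONFIG_VXLAN` prints one line per option that still needs to be added, and nothing when all of them are present.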

Here is my current procedure to run it :

  • compile LibreELEC with the needed kernel options
  • install it (because we need persistent storage)
  • on first run, activate SSH through the UI
  • install the "Docker" addon through the UI (it's inside "Service" category)
  • SSH into the machine, and run the following commands :
mkdir /storage/libreELEC-fake-readwrite-root-fs/
cp -ar /etc /storage/libreELEC-fake-readwrite-root-fs/
mount /storage/libreELEC-fake-readwrite-root-fs/etc /etc
mkdir /etc/rancher
mkdir /storage/k3s /storage/k3s-systemd /storage/k3s-config /storage/k3s-libexec
mount /storage/k3s-config /etc/rancher
cp -ar /usr /storage/libreELEC-fake-readwrite-root-fs/
mount /storage/libreELEC-fake-readwrite-root-fs/usr /usr
mkdir /usr/libexec
mount /storage/k3s-libexec /usr/libexec
sed -i "s/native\.cgroupdriver=systemd/native.cgroupdriver=cgroupfs/" /storage/.kodi/addons/service.system.docker/system.d/service.system.docker.service
systemctl daemon-reload
systemctl restart docker.service
curl -sfL https://get.k3s.io | INSTALL_K3S_SYSTEMD_DIR=/storage/k3s-systemd INSTALL_K3S_BIN_DIR=/storage/k3s K3S_URL=https://my-control-plane:6443 K3S_TOKEN=... sh -s - --docker

@mossroy
Author

mossroy commented Jan 8, 2022

I'll create a ticket/PR on LibreELEC side to ask to add the kernel options.

I'll also try to make my command-lines run on each startup, to (hopefully) make it start correctly after a reboot.
It might be necessary to set --data-dir too (but I did not find the corresponding installation environment variable in https://rancher.com/docs/k3s/latest/en/installation/install-options/#options-for-installation-with-script)

Of course, this is still very dirty.
Ideally, it would be possible to tell k3s to use different directories for /etc/rancher and /usr/libexec (just like it's already possible for some other directories)

@mossroy
Author

mossroy commented Jan 8, 2022

Adding the following lines to /storage/.kodi/addons/service.system.docker/bin/docker-config is enough to allow k3s to start after a reboot (because it's called as ExecStartPre in the systemd service) :

mount /storage/libreELEC-fake-readwrite-root-fs/etc /etc
mount /storage/k3s-config /etc/rancher
mount /storage/libreELEC-fake-readwrite-root-fs/usr /usr
mount /storage/k3s-libexec /usr/libexec

NB : I can't use /etc/fstab, as this file is in the read-only filesystem at startup time.
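
Because an ExecStartPre hook runs every time the service starts, the mounts would stack up on each restart. A possible variant that skips directories that are already mount points is sketched below. The function names `bind_once` and `mount_k3s_dirs` and the `DRY_RUN` variable are hypothetical, and the explicit `-o bind` option is an addition of this sketch (the commands above omit it, which evidently works on the author's system).

```shell
#!/bin/sh
# Bind-mount SRC on DST unless DST is already a mount point.
# Set DRY_RUN=1 to print the mount command instead of running it.
bind_once() {
    src=$1
    dst=$2
    if mountpoint -q "$dst"; then
        return 0    # already mounted: nothing to do
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "mount -o bind $src $dst"
    else
        mount -o bind "$src" "$dst"
    fi
}

# Same order as the lines above: each parent directory before its child.
mount_k3s_dirs() {
    bind_once /storage/libreELEC-fake-readwrite-root-fs/etc /etc &&
    bind_once /storage/k3s-config /etc/rancher &&
    bind_once /storage/libreELEC-fake-readwrite-root-fs/usr /usr &&
    bind_once /storage/k3s-libexec /usr/libexec
}
```

Running `DRY_RUN=1; mount_k3s_dirs` first shows what would be mounted without touching the system.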

This is still quick-and-dirty :

  • changing files from the addon like this will not survive an upgrade of the addon
  • if LibreELEC is upgraded itself, files from /storage/libreELEC-fake-readwrite-root-fs/ will be out-of-sync with the content of the squashfs
  • it would be better to have a persistent /var/lib/rancher, but the k3s agent seems able to recreate what is necessary

It's enough for me to test in real conditions.
But there must be a better and safer way to do that : I would probably need help from LibreELEC contributors.

@brandond
Member

brandond commented Jan 8, 2022

K3s has containerd built in; you don't need to do anything extra to use it. Also, we're probably not going to package cri-dockerd when dockershim support is dropped in 1.24, so the --docker flag will go away. See if you can get it working without Docker; I imagine it'll be much easier.

@mossroy
Author

mossroy commented Jan 9, 2022

Thanks @brandond . I switched to containerd without issue.
It's true that it avoids having to install and configure the Docker addon.
But I still need the kernel options (I suppose, I did not check), and I still need the dirty workarounds for /etc/ and /usr/
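
For reference, the containerd variant of the earlier install command would presumably be the same line without the --docker flag. This is a sketch derived from that command, not the exact line the author ran; the wrapper function `install_k3s_agent` and its two arguments are hypothetical, and the directories are the ones used earlier in this thread.

```shell
#!/bin/sh
# Install the k3s agent with its built-in containerd (no --docker flag).
# Usage: install_k3s_agent CONTROL_PLANE_URL TOKEN
install_k3s_agent() {
    k3s_url=$1
    k3s_token=$2
    # The environment variables apply to the installer shell, as in the
    # original command; unit file and binary go to persistent /storage.
    curl -sfL https://get.k3s.io |
        INSTALL_K3S_SYSTEMD_DIR=/storage/k3s-systemd \
        INSTALL_K3S_BIN_DIR=/storage/k3s \
        K3S_URL="$k3s_url" \
        K3S_TOKEN="$k3s_token" \
        sh -
}
```

It would be called as `install_k3s_agent https://my-control-plane:6443 "$TOKEN"`, with the token supplied by the caller.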

@mossroy
Author

mossroy commented Jan 15, 2022

OK, I managed to get a stable k3s agent (using containerd).
I've created a script that copies /etc and /usr into /storage/libreELEC-fake-readwrite-root-fs, mounts them, then creates symbolic links for /etc/rancher, /usr/libexec and /var/lib/rancher that point to other persistent directories under /storage/.
I configured this script as ExecStartPre in k3s-agent.service.
I have also set KillMode=mixed in this same file (from #2400), so that shutdowns do not take forever (but that's optional and might have side effects).
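
The two unit settings mentioned above might look like the output of the following sketch. The function name `k3s_agent_service_settings` and the script path passed to it are hypothetical, since the thread does not name the script; note that systemd documents the KillMode value in lowercase (`mixed`).

```shell
#!/bin/sh
# Print the [Service] settings described above for a given pre-start script.
# Usage: k3s_agent_service_settings /path/to/prestart-script
k3s_agent_service_settings() {
    cat <<EOF
[Service]
ExecStartPre=$1
KillMode=mixed
EOF
}
```

The printed lines are meant to be merged by hand into the [Service] section of k3s-agent.service, followed by `systemctl daemon-reload`.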

I'm not very proud of these dirty hacks, but I'm quite happy to have made it work.

I now have a different issue, though, that is probably not related to k3s : I'm using https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner to provision PVs on an external NFS server, and this is not always working. The pods seem able to read from NFS shares with no problem, but they fail when trying to lock a file (before writing to it). The dmesg output of LibreELEC repeatedly says :

lockd: cannot monitor <IP of my NFS server>

I suppose it's an issue of the NFS client of LibreELEC : I'll have to investigate

@mossroy
Author

mossroy commented Jan 16, 2022

I found a workaround for my NFS issues.
I configured the provisioner to use a more recent version of NFS (4.1 instead of 3) :

nfs:
  mountOptions:
  - nfsvers=4.1

and re-created the PVCs.

Now the NFS is stable, and my LibreELEC device is working great as a k3s agent.
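
To confirm which NFS version the node actually negotiated after re-creating the PVCs, the mount table on the node can be inspected. The following is only a sketch: the function name `nfs_mount_versions` is hypothetical, and it assumes the mount options in /proc/mounts carry a `vers=` (or `nfsvers=`) entry, as Linux NFS clients normally report.

```shell
#!/bin/sh
# Print "MOUNTPOINT VERSION" for every NFS mount in a mounts table
# (same format as /proc/mounts) read from standard input.
nfs_mount_versions() {
    awk '$3 ~ /^nfs4?$/ {
        n = split($4, opts, ",")
        vers = "unknown"
        for (i = 1; i <= n; i++) {
            if (opts[i] ~ /^(nfs)?vers=/) {
                sub(/^(nfs)?vers=/, "", opts[i])
                vers = opts[i]
            }
        }
        print $2, vers
    }'
}
```

On the node it would be run as `nfs_mount_versions < /proc/mounts`, and mounts still listed with version 3 would indicate PVs created before the change.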

@freefly42

@mossroy I'm thinking about following your path as a guide, but wondering why you remounted all of /etc instead of just creating an /etc/rancher symlink on start or bind mounting just /etc/rancher? Are there other files in /etc that rancher has to touch?

@mossroy
Author

mossroy commented May 29, 2022

There's no other directory that needs to be modified in /etc, except /etc/rancher.
But I did not find a way to create an /etc/rancher symlink, as /etc is in a read-only filesystem. So I ended up mounting on the closest existing directory (/etc in this case).
Maybe there's a better way, don't hesitate to tell me if you find one.
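
The "closest existing directory" rule described above can be expressed as a small helper. This is an illustrative sketch only; the function name `closest_existing_dir` is hypothetical and not part of the author's script.

```shell
#!/bin/sh
# Print the closest existing directory at or above the given path.
# Usage: closest_existing_dir /some/path
closest_existing_dir() {
    dir=$1
    # Walk up one level at a time; "/" (or "." for relative paths)
    # always exists, so the loop terminates.
    while [ ! -d "$dir" ]; do
        dir=$(dirname "$dir")
    done
    echo "$dir"
}
```

On a system where /etc/rancher does not exist yet and cannot be created, `closest_existing_dir /etc/rancher` yields /etc, which is the directory that then has to be replaced by a writable mount.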

BTW I've written a blog post (in French) that might be easier to follow, as it goes straight to the solution I'm currently using (with containerd instead of docker): https://blog.mossroy.fr/2022/04/25/mediacenter-diy-saison-2-avec-libreelec-et-k3s/

@cwayne18
Member

It's not on the roadmap to officially support this, but it seems there's a workaround available, so I'm going to close this out. PRs are always welcome, but we don't have plans to "support" LibreELEC specifically at this time.

cwayne18 closed this as not planned on Jul 20, 2022