
Cannot create container when systemd-resolved is not running #11810

Open
@vaclavskala

Description

What happened?

The cluster.yml playbook crashes at the Kubeadm | Create kubeadm config task when systemd-resolved is not running, because the /run/systemd/resolve/resolv.conf file is missing.

What did you expect to happen?

Kubespray should configure kubelet to use /etc/resolv.conf instead of the missing /run/systemd/resolve/resolv.conf.
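For reference, kubelet takes this path from the resolvConf field of its KubeletConfiguration, so the expected result on a host without systemd-resolved would look roughly like this (sketch based on the upstream KubeletConfiguration schema, not the actual file Kubespray renders):

```yaml
# KubeletConfiguration fragment (sketch): on hosts without systemd-resolved,
# resolvConf should point at the plain glibc resolver file.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /etc/resolv.conf
```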

How can we reproduce it (as minimally and precisely as possible)?

Run cluster.yml on kube nodes running Ubuntu 24.04 with systemd-resolved masked.

OS

Linux 6.1.113-zfs226 x86_64
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

Version of Ansible

ansible [core 2.16.14]
  config file = /var/home/kubespray-2.26.0/ansible.cfg
  configured module search path = ['/var/home/kubespray-2.26.0/library']
  ansible python module location = /var/home/kubespray-2.26.0/venv/lib/python3.12/site-packages/ansible
  ansible collection location = /var/home/ansible/collections:/usr/share/ansible/collections
  executable location = /var/home/kubespray-2.26.0/venv/bin/ansible
  python version = 3.12.3 (main, Nov  6 2024, 18:32:19) [GCC 13.2.0] (/var/home/kubespray-2.26.0/venv/bin/python)
  jinja version = 3.1.4
  libyaml = True

Version of Python

Python 3.12.3

Version of Kubespray (commit)

kubespray-2.26.0

Network plugin used

calico

Full inventory with variables

Default kubespray-2.26.0 variables

Command used to invoke ansible

ansible-playbook -i inventory/cluster/inventory.ini cluster.yml

Output of ansible run

TASK [kubernetes/control-plane : Kubeadm | Create kubeadm config] **************************************************************************************************************************************************************************
changed: [XXX-prod-master1]
changed: [XXX-prod-master2]
changed: [XXX-prod-master3]
Tuesday 17 December 2024  12:38:03 +0100 (0:00:00.492)       0:07:22.794 ****** 
Tuesday 17 December 2024  12:38:03 +0100 (0:00:00.043)       0:07:22.837 ****** 
Tuesday 17 December 2024  12:38:03 +0100 (0:00:00.047)       0:07:22.885 ****** 
Tuesday 17 December 2024  12:38:03 +0100 (0:00:00.041)       0:07:22.927 ****** 
Tuesday 17 December 2024  12:38:03 +0100 (0:00:00.048)       0:07:22.976 ****** 
Tuesday 17 December 2024  12:38:04 +0100 (0:00:00.090)       0:07:23.067 ****** 
Tuesday 17 December 2024  12:38:04 +0100 (0:00:00.100)       0:07:23.167 ****** 
Tuesday 17 December 2024  12:38:04 +0100 (0:00:00.054)       0:07:23.221 ****** 
Tuesday 17 December 2024  12:38:04 +0100 (0:00:00.044)       0:07:23.266 ****** 
Tuesday 17 December 2024  12:38:04 +0100 (0:00:00.048)       0:07:23.315 ****** 
Tuesday 17 December 2024  12:38:04 +0100 (0:00:00.055)       0:07:23.370 ****** 
FAILED - RETRYING: [XXX-prod-master1]: Kubeadm | Initialize first master (3 retries left).
FAILED - RETRYING: [XXX-prod-master1]: Kubeadm | Initialize first master (2 retries left).
FAILED - RETRYING: [XXX-prod-master1]: Kubeadm | Initialize first master (1 retries left).

Anything else we need to know

The problem is that roles/kubernetes/preinstall/tasks/main.yml does detect whether systemd-resolved is running, but the result is only used to decide whether to include 0060-resolvconf.yml or 0061-systemd-resolved.yml.

However, roles/kubernetes/node/tasks/facts.yml includes an OS-specific var file from roles/kubernetes/node/vars, and in those files the resolv.conf path is hardcoded for most distributions to /run/systemd/resolve/resolv.conf. As a result, kubelet fails to create any container.
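One possible direction for a fix (a hypothetical sketch, not the actual Kubespray tasks; the kube_resolv_conf variable name is taken from the role vars files mentioned above) would be to derive the path from the detected state of systemd-resolved instead of hardcoding it per distribution:

```yaml
# Hypothetical sketch: pick the kubelet resolv.conf path from the actual
# state of systemd-resolved rather than a per-distro hardcoded value.
- name: Check whether systemd-resolved is active
  ansible.builtin.command: systemctl is-active systemd-resolved
  register: resolved_state
  changed_when: false
  failed_when: false

- name: Pick resolv.conf path for kubelet
  ansible.builtin.set_fact:
    kube_resolv_conf: >-
      {{ '/run/systemd/resolve/resolv.conf'
         if resolved_state.stdout == 'active'
         else '/etc/resolv.conf' }}
```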

On control-plane nodes this means kubelet cannot create any container, failing with:
Dec 17 14:01:15 XXX-prod-master1 kubelet[25126]: E1217 14:01:15.267321   25126 dns.go:284] "Could not open resolv conf file." err="open /run/systemd/resolve/resolv.conf: no such file or directory"
Dec 17 14:01:15 XXX-prod-master1 kubelet[25126]: E1217 14:01:15.267332   25126 kuberuntime_sandbox.go:45] "Failed to generate sandbox config for pod" err="open /run/systemd/resolve/resolv.conf: no such file or directory" pod="kube-system/kube-controller-manager-XXX-prod-master1"
Dec 17 14:01:15 XXX-prod-master1 kubelet[25126]: E1217 14:01:15.267342   25126 kuberuntime_manager.go:1166] "CreatePodSandbox for pod failed" err="open /run/systemd/resolve/resolv.conf: no such file or directory" pod="kube-system/kube-controller-manager-XXX-prod-master1"
Dec 17 14:01:15 XXX-prod-master1 kubelet[25126]: E1217 14:01:15.267361   25126 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-controller-manager-XXX-prod-master1_kube-system(bce3ce42e0aef110c5773ef4027de42c)\" with CreatePodSandboxError: \"Failed to generate sandbox config for pod \\\"kube-controller-manager-XXX-prod-master1_kube-system(bce3ce42e0aef110c5773ef4027de42c)\\\": open /run/systemd/resolve/resolv.conf: no such file or directory\"" pod="kube-system/kube-controller-manager-XXX-prod-master1" podUID="bce3ce42e0aef110c5773ef4027de42c"

When systemd-resolved is not running on worker nodes, every container is stuck in the ContainerCreating state with this error:

Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Normal   Scheduled               5m52s                 default-scheduler  Successfully assigned kube-system/kube-proxy-6hnnc to XXX-prod-worker2
  Warning  FailedCreatePodSandBox  44s (x26 over 5m52s)  kubelet            Failed to create pod sandbox: open /run/systemd/resolve/resolv.conf: no such file or directory
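As a possible workaround until this is fixed (hypothetical, assuming the role var is named kube_resolv_conf as in the vars files above): extra-vars have the highest Ansible precedence, so passing the path on the command line should win over roles/kubernetes/node/vars, unlike a group_vars override.

```shell
# Override the hardcoded resolv.conf path via extra-vars, which take
# precedence over role vars in Ansible.
ansible-playbook -i inventory/cluster/inventory.ini cluster.yml \
  -e kube_resolv_conf=/etc/resolv.conf
```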
