Description
Before creating an issue, make sure you've checked the following:
- You are running the latest released version of k0s
- Make sure you've searched for existing issues, both open and closed
- Make sure you've searched for PRs too, a fix might've been merged already
- You're looking at docs for the released version; "main" branch docs are usually ahead of released versions.
Platform
```
Linux 5.14.0-427.20.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jun 7 14:51:39 UTC 2024 x86_64 GNU/Linux
NAME="Rocky Linux"
VERSION="9.4 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.4 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.4"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.4"
```
Version
v1.29.6+k0s.0
Sysinfo
`k0s sysinfo`
```
Machine ID: "aaa97f0b4a43391c6988e125ca02d88090b8b0cc93034981a82c5901983f06c6" (from machine) (pass)
Total memory: 7.5 GiB (pass)
Disk space available for /var/lib/k0s: 194.2 GiB (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
Linux kernel release: 5.14.0-427.20.1.el9_4.x86_64 (pass)
Max. file descriptors per process: current: 524288 / max: 524288 (pass)
AppArmor: unavailable (pass)
Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
Executable in PATH: mount: /usr/bin/mount (pass)
Executable in PATH: umount: /usr/bin/umount (pass)
/proc file system: mounted (0x9fa0) (pass)
Control Groups: version 2 (pass)
cgroup controller "cpu": available (is a listed root controller) (pass)
cgroup controller "cpuacct": available (via cpu in version 2) (pass)
cgroup controller "cpuset": available (is a listed root controller) (pass)
cgroup controller "memory": available (is a listed root controller) (pass)
cgroup controller "devices": unknown (warning: insufficient permissions, try with elevated permissions)
cgroup controller "freezer": available (cgroup.freeze exists) (pass)
cgroup controller "pids": available (is a listed root controller) (pass)
cgroup controller "hugetlb": available (is a listed root controller) (pass)
cgroup controller "blkio": available (via io in version 2) (pass)
CONFIG_CGROUPS: Control Group support: built-in (pass)
CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
CONFIG_CPUSETS: Cpuset support: built-in (pass)
CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
CONFIG_NAMESPACES: Namespaces support: built-in (pass)
CONFIG_UTS_NS: UTS namespace: built-in (pass)
CONFIG_IPC_NS: IPC namespace: built-in (pass)
CONFIG_PID_NS: PID namespace: built-in (pass)
CONFIG_NET_NS: Network namespace: built-in (pass)
CONFIG_NET: Networking support: built-in (pass)
CONFIG_INET: TCP/IP networking: built-in (pass)
CONFIG_IPV6: The IPv6 protocol: built-in (pass)
CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: built-in (pass)
CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
CONFIG_NETFILTER_NETLINK: module (pass)
CONFIG_NF_NAT: module (pass)
CONFIG_IP_SET: IP set support: module (pass)
CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
CONFIG_IP_VS: IP virtual server support: module (pass)
CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
CONFIG_NF_DEFRAG_IPV4: module (pass)
CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
CONFIG_NF_DEFRAG_IPV6: module (pass)
CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
CONFIG_LLC: module (pass)
CONFIG_STP: module (pass)
CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
CONFIG_PROC_FS: /proc file system support: built-in (pass)
```
What happened?
I created a cluster with Calico networking, custom CIDRs, and a minimal Helm chart extension (goldpinger) to demonstrate the problem. k0s was installed with `sudo /usr/local/bin/k0s install controller -c k0s.yaml --enable-dynamic-config --enable-worker --no-taints`.
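For completeness, the full sequence on the first node looked roughly like this (the binary download step is reconstructed from memory; `k0s.yaml` is the config shown under "Additional context"):

```sh
# assumed download step for the k0s binary (official installer script)
curl -sSLf https://get.k0s.sh | sudo K0S_VERSION=v1.29.6+k0s.0 sh
# install and start the first controller (commands as reported above)
sudo /usr/local/bin/k0s install controller -c k0s.yaml --enable-dynamic-config --enable-worker --no-taints
sudo /usr/local/bin/k0s start
```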
After starting k0s and waiting for things to stabilize, a `clusterconfig` named `k0s` existed in `kube-system` and the following pods were running:
```
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE    IP            NODE                                        NOMINATED NODE   READINESS GATES
goldpinger    goldpinger-72s8n                           1/1     Running   0          76s    10.10.68.67   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   calico-kube-controllers-84c6cd5b85-s7zqt   1/1     Running   0          119s   10.10.68.66   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   calico-node-g829x                          1/1     Running   0          112s   10.128.0.43   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   coredns-59d75c6cb5-pl2ct                   1/1     Running   0          119s   10.10.68.68   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   konnectivity-agent-f77m7                   1/1     Running   0          112s   10.10.68.69   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   kube-proxy-kzhgl                           1/1     Running   0          112s   10.128.0.43   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   metrics-server-7556957bb7-k56pw            1/1     Running   0          119s   10.10.68.65   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
```
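The listings here and below were captured with k0s's embedded kubectl; the exact invocations were along these lines (from memory):

```sh
# confirm the dynamic ClusterConfig object exists (CRD: clusterconfigs.k0s.k0sproject.io)
sudo /usr/local/bin/k0s kubectl -n kube-system get clusterconfig k0s
# list all pods with node/IP columns
sudo /usr/local/bin/k0s kubectl get pods -A -o wide
```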
I then created a controller token and joined an additional node with `sudo /usr/local/bin/k0s install controller --token-file token.txt --enable-worker --no-taints --enable-dynamic-config`.
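The token was created on the first controller roughly as follows (a sketch; copying `token.txt` between nodes is omitted):

```sh
# on the first controller: issue a join token for a new controller
sudo /usr/local/bin/k0s token create --role=controller > token.txt
# on the joining node, after transferring token.txt:
sudo /usr/local/bin/k0s install controller --token-file token.txt --enable-worker --no-taints --enable-dynamic-config
sudo /usr/local/bin/k0s start
```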
After the additional controller joined, the clusterconfig was no longer present, and only the following pods were running:
```
NAMESPACE     NAME                       READY   STATUS    RESTARTS        AGE     IP              NODE                                             NOMINATED NODE   READINESS GATES
kube-system   calico-node-7bcsn          1/1     Running   1 (6m28s ago)   6m31s   10.128.0.43     laverya-ec-rocky.c.replicated-qa.internal        <none>           <none>
kube-system   calico-node-h8l5r          1/1     Running   2 (6m20s ago)   6m31s   10.128.0.50     laverya-ec-rocky-join.c.replicated-qa.internal   <none>           <none>
kube-system   konnectivity-agent-q7prp   1/1     Running   0               6m11s   10.10.68.65     laverya-ec-rocky.c.replicated-qa.internal        <none>           <none>
kube-system   konnectivity-agent-qh9fz   1/1     Running   0               6m39s   10.10.186.128   laverya-ec-rocky-join.c.replicated-qa.internal   <none>           <none>
kube-system   kube-proxy-ffstw           1/1     Running   0               6m50s   10.128.0.50     laverya-ec-rocky-join.c.replicated-qa.internal   <none>           <none>
kube-system   kube-proxy-kzhgl           1/1     Running   0               9m34s   10.128.0.43     laverya-ec-rocky.c.replicated-qa.internal        <none>           <none>
```
Goldpinger, coredns, and metrics-server are all no longer present.
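The Helm extension state can also be inspected via k0s's Chart custom resources (an assumed check, consistent with the goldpinger pod disappearing; resource name per the k0s helm-charts docs):

```sh
# Chart CRs (charts.helm.k0sproject.io) live in kube-system; listing them
# before and after the join shows the goldpinger chart being removed
sudo /usr/local/bin/k0s kubectl -n kube-system get charts.helm.k0sproject.io
```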
Steps to reproduce
- Create a clusterconfig using custom CIDRs and Calico networking (a minimal example follows this list)
- Install k0s with dynamic-config + enable-worker
- Join an additional controller node
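A minimal `k0s.yaml` for the first step looks like this (an excerpt of the full config under "Additional context", trimmed to the fields relevant here; addresses are specific to my test VMs):

```yaml
# minimal reproduction config: Calico + non-default CIDRs + one helm extension
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  network:
    provider: calico
    podCIDR: 10.10.0.0/16      # custom, non-default pod CIDR
    serviceCIDR: 10.11.0.0/16  # custom, non-default service CIDR
  extensions:
    helm:
      repositories:
        - name: okgolove
          url: https://okgolove.github.io/helm-charts/
      charts:
        - name: goldpinger
          chartname: okgolove/goldpinger
          namespace: goldpinger
          version: 6.1.2
          order: 11
```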
Expected behavior
The node joins as an additional controller and the cluster config remains unchanged.
Actual behavior
The node joined as an additional controller, but the clusterconfig was removed (effectively disabling dynamic config), the existing Helm charts were removed, and metrics-server/coredns were no longer running.
Screenshots and logs
No response
Additional context
`k0s config`
```yaml
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  creationTimestamp: null
  name: k0s
spec:
  api:
    address: 10.128.0.43
    k0sApiPort: 9443
    port: 6443
    sans:
    - 10.128.0.43
    - fe80::bd4:6989:27d2:4a29
  controllerManager: {}
  extensions:
    helm:
      charts:
      - chartname: okgolove/goldpinger
        name: goldpinger
        namespace: goldpinger
        version: 6.1.2
        order: 11
      concurrencyLevel: 5
      repositories:
      - name: okgolove
        url: https://okgolove.github.io/helm-charts/
    storage:
      create_default_storage_class: false
      type: external_storage
  installConfig:
    users:
      etcdUser: etcd
      kineUser: kube-apiserver
      konnectivityUser: konnectivity-server
      kubeAPIserverUser: kube-apiserver
      kubeSchedulerUser: kube-scheduler
  konnectivity:
    adminPort: 8133
    agentPort: 8132
  network:
    calico: null
    clusterDomain: cluster.local
    dualStack: {}
    kubeProxy:
      iptables:
        minSyncPeriod: 0s
        syncPeriod: 0s
      ipvs:
        minSyncPeriod: 0s
        syncPeriod: 0s
        tcpFinTimeout: 0s
        tcpTimeout: 0s
        udpTimeout: 0s
      metricsBindAddress: 0.0.0.0:10249
      mode: iptables
    nodeLocalLoadBalancing:
      envoyProxy:
        apiServerBindPort: 7443
        konnectivityServerBindPort: 7132
      type: EnvoyProxy
    podCIDR: 10.10.0.0/16
    provider: calico
    serviceCIDR: 10.11.0.0/16
  scheduler: {}
  storage:
    etcd:
      externalCluster: null
      peerAddress: 10.128.0.43
    type: etcd
  telemetry:
    enabled: true
```
The SANs on the certificate used by the joining node are also wrong (demonstrated with `openssl s_client -connect 10.11.0.1:443 </dev/null 2>/dev/null | openssl x509 -inform pem -text | grep -A1 "Subject Alternative Name"`), but that will be a different issue.
I have also tested this with 1.30.2, and it has not reproduced there in my testing. I also believe this to be OS-dependent, as I was able to reproduce it on Rocky 9 but not on Ubuntu.