Description
Before creating an issue, make sure you've checked the following:
- You are running the latest released version of k0s
- Make sure you've searched for existing issues, both open and closed
- Make sure you've searched for PRs too, a fix might've been merged already
- You're looking at docs for the released version; "main" branch docs are usually ahead of released versions.
Platform
```
Linux 5.14.0-427.20.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jun 7 14:51:39 UTC 2024 x86_64 GNU/Linux
NAME="Rocky Linux"
VERSION="9.4 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.4 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.4"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.4"
```
Version
v1.29.6+k0s.0
Sysinfo
`k0s sysinfo`
```
Machine ID: "aaa97f0b4a43391c6988e125ca02d88090b8b0cc93034981a82c5901983f06c6" (from machine) (pass)
Total memory: 7.5 GiB (pass)
Disk space available for /var/lib/k0s: 194.2 GiB (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
Linux kernel release: 5.14.0-427.20.1.el9_4.x86_64 (pass)
Max. file descriptors per process: current: 524288 / max: 524288 (pass)
AppArmor: unavailable (pass)
Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
Executable in PATH: mount: /usr/bin/mount (pass)
Executable in PATH: umount: /usr/bin/umount (pass)
/proc file system: mounted (0x9fa0) (pass)
Control Groups: version 2 (pass)
cgroup controller "cpu": available (is a listed root controller) (pass)
cgroup controller "cpuacct": available (via cpu in version 2) (pass)
cgroup controller "cpuset": available (is a listed root controller) (pass)
cgroup controller "memory": available (is a listed root controller) (pass)
cgroup controller "devices": unknown (warning: insufficient permissions, try with elevated permissions)
cgroup controller "freezer": available (cgroup.freeze exists) (pass)
cgroup controller "pids": available (is a listed root controller) (pass)
cgroup controller "hugetlb": available (is a listed root controller) (pass)
cgroup controller "blkio": available (via io in version 2) (pass)
CONFIG_CGROUPS: Control Group support: built-in (pass)
CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
CONFIG_CPUSETS: Cpuset support: built-in (pass)
CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
CONFIG_NAMESPACES: Namespaces support: built-in (pass)
CONFIG_UTS_NS: UTS namespace: built-in (pass)
CONFIG_IPC_NS: IPC namespace: built-in (pass)
CONFIG_PID_NS: PID namespace: built-in (pass)
CONFIG_NET_NS: Network namespace: built-in (pass)
CONFIG_NET: Networking support: built-in (pass)
CONFIG_INET: TCP/IP networking: built-in (pass)
CONFIG_IPV6: The IPv6 protocol: built-in (pass)
CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: built-in (pass)
CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
CONFIG_NETFILTER_NETLINK: module (pass)
CONFIG_NF_NAT: module (pass)
CONFIG_IP_SET: IP set support: module (pass)
CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
CONFIG_IP_VS: IP virtual server support: module (pass)
CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
CONFIG_NF_DEFRAG_IPV4: module (pass)
CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
CONFIG_NF_DEFRAG_IPV6: module (pass)
CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
CONFIG_LLC: module (pass)
CONFIG_STP: module (pass)
CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
CONFIG_PROC_FS: /proc file system support: built-in (pass)
```
What happened?
I created a cluster with Calico networking, custom CIDRs, and a minimal Helm chart extension (goldpinger) to demonstrate the problem. k0s was installed with `sudo /usr/local/bin/k0s install controller -c k0s.yaml --enable-dynamic-config --enable-worker --no-taints`.
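For completeness, the full sequence on the first node looked roughly like this (the binary download step is reconstructed from memory; `k0s.yaml` is the config shown under "Additional context"):

```sh
# assumed download step for the k0s binary (official installer script)
curl -sSLf https://get.k0s.sh | sudo K0S_VERSION=v1.29.6+k0s.0 sh
# install and start the first controller (commands as reported above)
sudo /usr/local/bin/k0s install controller -c k0s.yaml --enable-dynamic-config --enable-worker --no-taints
sudo /usr/local/bin/k0s start
```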
After starting k0s and waiting for things to stabilize, a `clusterconfig` named `k0s` existed in `kube-system` and the following pods were running:
```
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE    IP            NODE                                        NOMINATED NODE   READINESS GATES
goldpinger    goldpinger-72s8n                           1/1     Running   0          76s    10.10.68.67   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   calico-kube-controllers-84c6cd5b85-s7zqt   1/1     Running   0          119s   10.10.68.66   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   calico-node-g829x                          1/1     Running   0          112s   10.128.0.43   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   coredns-59d75c6cb5-pl2ct                   1/1     Running   0          119s   10.10.68.68   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   konnectivity-agent-f77m7                   1/1     Running   0          112s   10.10.68.69   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   kube-proxy-kzhgl                           1/1     Running   0          112s   10.128.0.43   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   metrics-server-7556957bb7-k56pw            1/1     Running   0          119s   10.10.68.65   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
```
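The listings here and below were captured with k0s's embedded kubectl; the exact invocations were along these lines (from memory):

```sh
# confirm the dynamic ClusterConfig object exists (CRD: clusterconfigs.k0s.k0sproject.io)
sudo /usr/local/bin/k0s kubectl -n kube-system get clusterconfig k0s
# list all pods with node/IP columns
sudo /usr/local/bin/k0s kubectl get pods -A -o wide
```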
I then created a controller token and joined an additional node with `sudo /usr/local/bin/k0s install controller --token-file token.txt --enable-worker --no-taints --enable-dynamic-config`.
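The token was created on the first controller roughly as follows (a sketch; copying `token.txt` between nodes is omitted):

```sh
# on the first controller: issue a join token for a new controller
sudo /usr/local/bin/k0s token create --role=controller > token.txt
# on the joining node, after transferring token.txt:
sudo /usr/local/bin/k0s install controller --token-file token.txt --enable-worker --no-taints --enable-dynamic-config
sudo /usr/local/bin/k0s start
```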
After the additional controller joined, the clusterconfig was no longer present, and only the following pods were running:
```
NAMESPACE     NAME                       READY   STATUS    RESTARTS        AGE     IP              NODE                                             NOMINATED NODE   READINESS GATES
kube-system   calico-node-7bcsn          1/1     Running   1 (6m28s ago)   6m31s   10.128.0.43     laverya-ec-rocky.c.replicated-qa.internal        <none>           <none>
kube-system   calico-node-h8l5r          1/1     Running   2 (6m20s ago)   6m31s   10.128.0.50     laverya-ec-rocky-join.c.replicated-qa.internal   <none>           <none>
kube-system   konnectivity-agent-q7prp   1/1     Running   0               6m11s   10.10.68.65     laverya-ec-rocky.c.replicated-qa.internal        <none>           <none>
kube-system   konnectivity-agent-qh9fz   1/1     Running   0               6m39s   10.10.186.128   laverya-ec-rocky-join.c.replicated-qa.internal   <none>           <none>
kube-system   kube-proxy-ffstw           1/1     Running   0               6m50s   10.128.0.50     laverya-ec-rocky-join.c.replicated-qa.internal   <none>           <none>
kube-system   kube-proxy-kzhgl           1/1     Running   0               9m34s   10.128.0.43     laverya-ec-rocky.c.replicated-qa.internal        <none>           <none>
```
Goldpinger, coredns, and metrics-server are all no longer present.
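The Helm extension state can also be inspected via k0s's Chart custom resources (an assumed check, consistent with the goldpinger pod disappearing; resource name per the k0s helm-charts docs):

```sh
# Chart CRs (charts.helm.k0sproject.io) live in kube-system; listing them
# before and after the join shows the goldpinger chart being removed
sudo /usr/local/bin/k0s kubectl -n kube-system get charts.helm.k0sproject.io
```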
Steps to reproduce
- Create a clusterconfig using custom CIDRs and Calico networking (a minimal example follows this list)
- Install k0s with dynamic-config + enable-worker
- Join an additional controller node
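A minimal `k0s.yaml` for the first step looks like this (an excerpt of the full config under "Additional context", trimmed to the fields relevant here; addresses are specific to my test VMs):

```yaml
# minimal reproduction config: Calico + non-default CIDRs + one helm extension
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  network:
    provider: calico
    podCIDR: 10.10.0.0/16      # custom, non-default pod CIDR
    serviceCIDR: 10.11.0.0/16  # custom, non-default service CIDR
  extensions:
    helm:
      repositories:
        - name: okgolove
          url: https://okgolove.github.io/helm-charts/
      charts:
        - name: goldpinger
          chartname: okgolove/goldpinger
          namespace: goldpinger
          version: 6.1.2
          order: 11
```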
Expected behavior
The node joins as an additional controller and the cluster config remains unchanged.
Actual behavior
The node joined as an additional controller, but the clusterconfig was removed (effectively disabling dynamic config), the existing Helm charts were removed, and metrics-server/coredns were no longer running.
Screenshots and logs
No response
Additional context
`k0s config`
```yaml
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  creationTimestamp: null
  name: k0s
spec:
  api:
    address: 10.128.0.43
    k0sApiPort: 9443
    port: 6443
    sans:
    - 10.128.0.43
    - fe80::bd4:6989:27d2:4a29
  controllerManager: {}
  extensions:
    helm:
      charts:
      - chartname: okgolove/goldpinger
        name: goldpinger
        namespace: goldpinger
        version: 6.1.2
        order: 11
      concurrencyLevel: 5
      repositories:
      - name: okgolove
        url: https://okgolove.github.io/helm-charts/
    storage:
      create_default_storage_class: false
      type: external_storage
  installConfig:
    users:
      etcdUser: etcd
      kineUser: kube-apiserver
      konnectivityUser: konnectivity-server
      kubeAPIserverUser: kube-apiserver
      kubeSchedulerUser: kube-scheduler
  konnectivity:
    adminPort: 8133
    agentPort: 8132
  network:
    calico: null
    clusterDomain: cluster.local
    dualStack: {}
    kubeProxy:
      iptables:
        minSyncPeriod: 0s
        syncPeriod: 0s
      ipvs:
        minSyncPeriod: 0s
        syncPeriod: 0s
        tcpFinTimeout: 0s
        tcpTimeout: 0s
        udpTimeout: 0s
      metricsBindAddress: 0.0.0.0:10249
      mode: iptables
    nodeLocalLoadBalancing:
      envoyProxy:
        apiServerBindPort: 7443
        konnectivityServerBindPort: 7132
      type: EnvoyProxy
    podCIDR: 10.10.0.0/16
    provider: calico
    serviceCIDR: 10.11.0.0/16
  scheduler: {}
  storage:
    etcd:
      externalCluster: null
      peerAddress: 10.128.0.43
    type: etcd
  telemetry:
    enabled: true
```
The SANs on the certificate used by the joining node are also wrong (demonstrated with `openssl s_client -connect 10.11.0.1:443 </dev/null 2>/dev/null | openssl x509 -inform pem -text | grep -A1 "Subject Alternative Name"`), but that will be a different issue.
I have also tested this with 1.30.2, and it has not reproduced there in my testing. I also believe this to be OS-dependent, as I was able to reproduce it on Rocky 9 but not on Ubuntu.