
Joining additional controller node with dynamic config enabled wipes existing clusterconfig #4702

Open

Description

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version; "main" branch docs are usually ahead of released versions.

Platform

Linux 5.14.0-427.20.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jun 7 14:51:39 UTC 2024 x86_64 GNU/Linux
NAME="Rocky Linux"
VERSION="9.4 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.4 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.4"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.4"

Version

v1.29.6+k0s.0

Sysinfo

`k0s sysinfo`
Machine ID: "aaa97f0b4a43391c6988e125ca02d88090b8b0cc93034981a82c5901983f06c6" (from machine) (pass)
Total memory: 7.5 GiB (pass)
Disk space available for /var/lib/k0s: 194.2 GiB (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 5.14.0-427.20.1.el9_4.x86_64 (pass)
  Max. file descriptors per process: current: 524288 / max: 524288 (pass)
  AppArmor: unavailable (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": unknown (warning: insufficient permissions, try with elevated permissions)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: built-in (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

I created a cluster with calico networking, custom CIDRs, and a minimal helm chart extension to demonstrate the problem (goldpinger).

k0s was installed with sudo /usr/local/bin/k0s install controller -c k0s.yaml --enable-dynamic-config --enable-worker --no-taints.
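
For reference, the full sequence on the first node was roughly the following (a condensed sketch; the k0s.yaml contents are the ones shown under "Additional context" below, and the start step is the standard k0s service start):

# install the first controller with the custom config, then start the service
sudo /usr/local/bin/k0s install controller -c k0s.yaml --enable-dynamic-config --enable-worker --no-taints
sudo /usr/local/bin/k0s start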

After starting and waiting for things to stabilize, a ClusterConfig object named k0s existed in kube-system and the following pods were running:

NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE    IP            NODE                                        NOMINATED NODE   READINESS GATES
goldpinger    goldpinger-72s8n                           1/1     Running   0          76s    10.10.68.67   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   calico-kube-controllers-84c6cd5b85-s7zqt   1/1     Running   0          119s   10.10.68.66   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   calico-node-g829x                          1/1     Running   0          112s   10.128.0.43   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   coredns-59d75c6cb5-pl2ct                   1/1     Running   0          119s   10.10.68.68   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   konnectivity-agent-f77m7                   1/1     Running   0          112s   10.10.68.69   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   kube-proxy-kzhgl                           1/1     Running   0          112s   10.128.0.43   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>
kube-system   metrics-server-7556957bb7-k56pw            1/1     Running   0          119s   10.10.68.65   laverya-ec-rocky.c.replicated-qa.internal   <none>           <none>

I then created a controller token and joined an additional node with sudo /usr/local/bin/k0s install controller --token-file token.txt --enable-worker --no-taints --enable-dynamic-config.
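
The join sequence, condensed (a sketch; the token-creation invocation is the standard k0s one, and copying token.txt to the second node is omitted):

# on the first controller: create a controller join token
sudo /usr/local/bin/k0s token create --role=controller > token.txt
# on the joining node, with token.txt copied over: install and start the second controller
sudo /usr/local/bin/k0s install controller --token-file token.txt --enable-worker --no-taints --enable-dynamic-config
sudo /usr/local/bin/k0s start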

After the additional controller joined, the clusterconfig was no longer present, and only the following pods were running:

NAMESPACE     NAME                       READY   STATUS    RESTARTS        AGE     IP              NODE                                             NOMINATED NODE   READINESS GATES
kube-system   calico-node-7bcsn          1/1     Running   1 (6m28s ago)   6m31s   10.128.0.43     laverya-ec-rocky.c.replicated-qa.internal        <none>           <none>
kube-system   calico-node-h8l5r          1/1     Running   2 (6m20s ago)   6m31s   10.128.0.50     laverya-ec-rocky-join.c.replicated-qa.internal   <none>           <none>
kube-system   konnectivity-agent-q7prp   1/1     Running   0               6m11s   10.10.68.65     laverya-ec-rocky.c.replicated-qa.internal        <none>           <none>
kube-system   konnectivity-agent-qh9fz   1/1     Running   0               6m39s   10.10.186.128   laverya-ec-rocky-join.c.replicated-qa.internal   <none>           <none>
kube-system   kube-proxy-ffstw           1/1     Running   0               6m50s   10.128.0.50     laverya-ec-rocky-join.c.replicated-qa.internal   <none>           <none>
kube-system   kube-proxy-kzhgl           1/1     Running   0               9m34s   10.128.0.43     laverya-ec-rocky.c.replicated-qa.internal        <none>           <none>

Goldpinger, coredns, and metrics-server were no longer present.
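
The missing clusterconfig can be checked directly; a minimal verification, assuming the ClusterConfig object is namespaced in kube-system as it was before the join:

# before the join this returned the "k0s" ClusterConfig; after the join it reports NotFound
sudo /usr/local/bin/k0s kubectl -n kube-system get clusterconfig k0s
# the coredns and metrics-server workloads are likewise gone from kube-system
sudo /usr/local/bin/k0s kubectl -n kube-system get pods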

Steps to reproduce

  1. Create a clusterconfig using custom CIDRs and Calico networking
  2. Install k0s with dynamic-config + enable-worker
  3. Join an additional controller node

Expected behavior

The node should have joined as an additional controller, with the cluster config left unchanged.

Actual behavior

The node joined as an additional controller, but the clusterconfig was removed (effectively disabling dynamic config), existing helm charts were removed, and metrics-server/coredns were no longer running.

Screenshots and logs

No response

Additional context

`k0s config`
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  creationTimestamp: null
  name: k0s
spec:
  api:
    address: 10.128.0.43
    k0sApiPort: 9443
    port: 6443
    sans:
    - 10.128.0.43
    - fe80::bd4:6989:27d2:4a29
  controllerManager: {}
  extensions:
    helm:
      charts:
        - chartname: okgolove/goldpinger
          name: goldpinger
          namespace: goldpinger
          version: 6.1.2
          order: 11
      concurrencyLevel: 5
      repositories:
      - name: okgolove
        url: https://okgolove.github.io/helm-charts/
    storage:
      create_default_storage_class: false
      type: external_storage
  installConfig:
    users:
      etcdUser: etcd
      kineUser: kube-apiserver
      konnectivityUser: konnectivity-server
      kubeAPIserverUser: kube-apiserver
      kubeSchedulerUser: kube-scheduler
  konnectivity:
    adminPort: 8133
    agentPort: 8132
  network:
    calico: null
    clusterDomain: cluster.local
    dualStack: {}
    kubeProxy:
      iptables:
        minSyncPeriod: 0s
        syncPeriod: 0s
      ipvs:
        minSyncPeriod: 0s
        syncPeriod: 0s
        tcpFinTimeout: 0s
        tcpTimeout: 0s
        udpTimeout: 0s
      metricsBindAddress: 0.0.0.0:10249
      mode: iptables
    nodeLocalLoadBalancing:
      envoyProxy:
        apiServerBindPort: 7443
        konnectivityServerBindPort: 7132
      type: EnvoyProxy
    podCIDR: 10.10.0.0/16
    provider: calico
    serviceCIDR: 10.11.0.0/16
  scheduler: {}
  storage:
    etcd:
      externalCluster: null
      peerAddress: 10.128.0.43
    type: etcd
  telemetry:
    enabled: true

The SANs on the certificate used by the joining node are also wrong (demonstrated with openssl s_client -connect 10.11.0.1:443 </dev/null 2>/dev/null | openssl x509 -inform pem -text | grep -A1 "Subject Alternative Name"), but that will be a different issue.

I have also tested this with v1.30.2 and could not reproduce it there. The issue also appears to be OS-dependent: I was able to reproduce it on Rocky 9 but not on Ubuntu.
