Description
I am not 100% sure this is a bug, but I observed some weird behavior and cannot really pinpoint it to either the CCM/cilium or the K3S cluster both are running in.
Setup
A K3S cluster (v1.30.2+k3s2) with cilium as the network layer (1.17.3) and the Hetzner CCM (1.24.0). Networking is configured exactly as described in https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/main/docs/deploy_with_networks.md and native routing mode is active.
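(For anyone trying to reproduce: the CCM is deployed with networks support as per the linked guide. Something like the following shows the route-controller arguments and the HCLOUD_NETWORK setting; the deployment name and namespace are only assumed to be the defaults from the ccm-networks manifest.)

# Inspect the CCM deployment for the networks-mode settings
# (assumes the default name/namespace from ccm-networks.yaml)
kubectl -n kube-system get deployment hcloud-cloud-controller-manager -o yaml \
  | grep -E 'allocate-node-cidrs|cluster-cidr|HCLOUD_NETWORK' -A1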
cilium values.yml
operator:
  replicas: 2
routingMode: native
k8sServiceHost: prod-k3s.xxx.de
k8sServicePort: 6443
ipv4NativeRoutingCIDR: 10.0.0.0/8
ipam:
  operator:
    clusterPoolIPv4PodCIDRList: 10.0.16.0/20
autoDirectNodeRoutes: true
kubeProxyReplacement: true
directRoutingSkipUnreachable: true
nodePort:
  enabled: true # https://docs.cilium.io/en/latest/network/servicemesh/ingress/#prerequisites
ingressController:
  enabled: true
  loadbalancerMode: shared
  enableProxyProtocol: true
  hostNetwork:
    enabled: true # https://docs.cilium.io/en/latest/network/servicemesh/ingress/#gs-ingress-host-network-mode
    sharedListenerPort: 8080
  externalTrafficPolicy: Local
  service:
    externalTrafficPolicy: null
    type: ClusterIP
loadBalancer:
  l7:
    backend: envoy
Issue
The cluster was working fine and networking behaved as expected. During a scale-up to add another node, cilium suddenly started complaining that the cluster health was degraded:
cilium-dbg status
[...]
Cluster health: 5/6 reachable
[...]
A verbose status query showed that, while host connectivity to the node was fine, the health endpoint within the node's pod CIDR was not reachable:
cilium-dbg status --verbose
Name IP Node Endpoints
prod-xxx-agent-0 (localhost):
  Host connectivity to 10.0.1.10:
    ICMP to stack:   OK, RTT=124.186µs
    HTTP to agent:   OK, RTT=357.775µs
  Endpoint connectivity to 10.0.21.142:
    ICMP to stack:   OK, RTT=309.803µs
    HTTP to agent:   OK, RTT=310.313µs
prod-xxx-k3s-agent-1:
  Host connectivity to 10.0.1.11:
    ICMP to stack:   OK, RTT=2.2799ms
    HTTP to agent:   OK, RTT=852.765µs
  Endpoint connectivity to 10.0.22.3:
    ICMP to stack:   OK, RTT=2.359048ms
    HTTP to agent:   OK, RTT=980.736µs
prod-xxx-k3s-agent-2:
  Host connectivity to 10.0.1.12:
    ICMP to stack:   OK, RTT=3.352002ms
    HTTP to agent:   OK, RTT=933.567µs
  Endpoint connectivity to 10.0.25.241:
    ICMP to stack:   ERROR (exact message not recorded)
    HTTP to agent:   ERROR (exact message not recorded)
[...]
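(A generic way to narrow this down further, directly from one of the healthy nodes, would be to check which route the kernel picks for the failing health IP and whether it answers at all; 10.0.25.241 is the health IP reported above.)

# Run on e.g. prod-xxx-k3s-agent-0
ip route get 10.0.25.241   # which route/next hop is used for the unreachable health endpoint
ping -c 3 10.0.25.241      # plain reachability probe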
Looking at the log of the cloud controller, I can see there is a mismatch in the node's pod CIDR:
I0428 09:24:29.280171 1 route_controller.go:214] action for Node "prod-xxx-k3s-agent-0" with CIDR "10.0.21.0/24": "keep"
I0428 09:24:29.280211 1 route_controller.go:214] action for Node "prod-xxx-k3s-agent-1" with CIDR "10.0.22.0/24": "keep"
I0428 09:24:29.280222 1 route_controller.go:214] action for Node "prod-xxx-k3s-agent-2" with CIDR "10.0.19.0/24": "keep"
I0428 09:24:29.280232 1 route_controller.go:214] action for Node "prod-xxx-k3s-server-0" with CIDR "10.0.16.0/24": "keep"
I0428 09:24:29.280242 1 route_controller.go:214] action for Node "prod-xxx-k3s-server-1" with CIDR "10.0.17.0/24": "keep"
I0428 09:24:29.280251 1 route_controller.go:214] action for Node "prod-xxx-k3s-server-2" with CIDR "10.0.18.0/24": "keep"
So the culprit seemed to be the mismatch: the Hetzner CCM thinks node prod-xxx-k3s-agent-2 has the pod CIDR 10.0.19.0/24, while cilium expects it to be 10.0.25.0/24. This is confirmed by a look at the CiliumNode config:
apiVersion: cilium.io/v2
kind: CiliumNode
metadata:
  creationTimestamp: "2025-04-24T22:46:27Z"
  generation: 18
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    csi.hetzner.cloud/location: nbg1
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: prod-xxx-k3s-agent-2
    kubernetes.io/os: linux
  name: prod-xxx-k3s-agent-2
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: prod-xxx-k3s-agent-2
    uid: 8ac44d80-1509-4bd4-96a0-b89ae7bd37fc
  resourceVersion: "25412434"
  uid: a87adfb9-d50f-4213-85f7-3dcb8e33a305
spec:
  addresses:
  - ip: 10.0.1.12
    type: InternalIP
  - ip: 10.0.25.64
    type: CiliumInternalIP
  alibaba-cloud: {}
  azure: {}
  bootid: 152ec818-1c18-488f-9bcd-9c3b320b0d21
  encryption: {}
  eni: {}
  health:
    ipv4: 10.0.25.241
  ingress:
    ipv4: 10.0.25.17
  ipam:
    podCIDRs:
    - 10.0.25.0/24
    pools: {}
status:
  alibaba-cloud: {}
  azure: {}
  eni: {}
  ipam:
    operator-status: {}
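To see the mismatch for all nodes at a glance, the pod CIDR recorded on the Kubernetes Node objects (presumably what the route controller above acts on) can be compared with the CIDRs the cilium operator allocated on the CiliumNode objects, e.g.:

# Pod CIDRs as seen by Kubernetes / the CCM route controller
kubectl get nodes -o custom-columns='NODE:.metadata.name,PODCIDR:.spec.podCIDR'
# Pod CIDRs as allocated by the cilium operator
kubectl get ciliumnodes -o custom-columns='NODE:.metadata.name,PODCIDRS:.spec.ipam.podCIDRs[*]'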
I was able to mitigate the issue by manually editing the CiliumNode, but I am still hunting for the root cause. I would appreciate any hints on where to look for the bug or how this mismatch can happen.
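(For reference, the manual mitigation was nothing more sophisticated than editing the affected CiliumNode object directly; the mismatching pod CIDR shown above lives under spec.ipam.podCIDRs.)

# Edit the CiliumNode for the affected node (cluster-scoped resource)
kubectl edit ciliumnode prod-xxx-k3s-agent-2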