Description
opened on Aug 11, 2024
How to categorize this issue?
/area networking
/kind bug
What happened:
When Cilium runs as a kube-proxy replacement and the eBPF datapath is selected (to be introduced with #350), the lo
device is ignored when searching for host addresses: https://github.com/cilium/cilium/blob/9d631b91ad4d2c146d3decbfcfc39968764eb539/pkg/datapath/linux/devices.go#L32-L38
When running without a network overlay, requests from inside containers to https://kubernetes time out.
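To illustrate the kind of filtering the linked lines perform, here is a minimal sketch of excluding devices by name prefix. The prefix list and the helper name is_excluded are assumptions chosen for illustration, not the Cilium implementation; consult the linked devices.go lines for the actual list.

```shell
# Hypothetical prefix list, for illustration only.
EXCLUDED_PREFIXES="lo lxc docker"

# is_excluded DEVICE: succeeds if DEVICE starts with any excluded prefix.
is_excluded() {
  dev="$1"
  for prefix in $EXCLUDED_PREFIXES; do
    case "$dev" in
      "$prefix"*) return 0 ;;
    esac
  done
  return 1
}

# Device names taken from the outputs in this report.
for dev in lo eth0 lxc30002a5304eb; do
  if is_excluded "$dev"; then
    echo "$dev: skipped"
  else
    echo "$dev: considered"
  fi
done
```

With a filter of this shape, any address that lives only on lo never becomes a host address candidate, which matches the behaviour described above.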
This is currently not reproducible when running without overlay, because bpf-masquerade gets disabled in that case.
Cilium then falls back to the legacy host routing implementation instead of using the eBPF datapath:
$ kubectl -n kube-system logs ds/cilium
time="2024-08-07T14:01:05Z" level=info msg="BPF host routing requires enable-bpf-masquerade. Falling back to legacy host routing (enable-host-legacy-routing=true)." subsys=daemon
- tcpdump on a Cilium-managed node (100.83.126.209 is the service IP of kube-apiserver)
shoot--ondemand--test-worker-tyo9o-z2-6f799-cdmnr / # tcpdump -i any | grep 100.83.126.209
tcpdump: data link type LINUX_SLL2
dropped privs to pcap
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
00:16:59.303015 lxc30002a5304eb In IP 100.64.1.234.35474 > 100.83.126.209.https: Flags [S], seq 3195040413, win 65535, options [mss 8710,sackOK,TS val 3811945852 ecr 0,nop,wscale 9], length 0
00:16:59.303071 eth0 Out IP 100.64.1.234.35474 > 100.83.126.209.https: Flags [S], seq 3195040413, win 65535, options [mss 8710,sackOK,TS val 3811945852 ecr 0,nop,wscale 9], length 0
00:17:00.365963 lxc30002a5304eb In IP 100.64.1.234.35474 > 100.83.126.209.https: Flags [S], seq 3195040413, win 65535, options [mss 8710,sackOK,TS val 3811946915 ecr 0,nop,wscale 9], length 0
- cilium-dbg output
$ kubectl -n kube-system exec cilium-nbtgj -- cilium-dbg statedb node-addresses
Defaulted container "cilium-agent" out of: cilium-agent, disable-rp-filter (init), config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
Address NodePort Primary DeviceName
10.250.2.187 true true eth0
100.64.0.56 false true cilium_host
fe80::b407:4cff:fe39:f6fa false true cilium_host
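The table above lists eth0 and cilium_host but no lo entry. A small check along these lines could make that explicit; has_device is a hypothetical helper name, and the sketch assumes the four-column layout shown above with the device name in the last column.

```shell
# has_device DEVICE: reads a node-addresses table on stdin and succeeds
# if any row has DEVICE in the fourth (DeviceName) column.
has_device() {
  awk -v dev="$1" '$4 == dev { found = 1 } END { exit !found }'
}

# Sample input copied from the table in this report.
TABLE='Address NodePort Primary DeviceName
10.250.2.187 true true eth0
100.64.0.56 false true cilium_host
fe80::b407:4cff:fe39:f6fa false true cilium_host'

if printf '%s\n' "$TABLE" | has_device lo; then
  echo "lo address present"
else
  echo "no lo address in node-addresses"
fi
```

Against a live agent, the same helper could be fed from the cilium-dbg statedb node-addresses command shown above instead of the sample table.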
What you expected to happen:
Pods are able to access the kube-apiserver via service discovery
How to reproduce it (as minimally and precisely as possible):
Create a shoot without overlay and enable the kube-proxy replacement.
Either:
- Add enable-bpf-masquerade: true to the cilium-config ConfigMap in kube-system
or
- Install the Cilium extension from the branch of PR #350 (fix: enable-bpf-masquerade when snat values are not enabled)
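For the first option, one possible way to apply the change is a merge patch followed by an agent restart. This is a sketch under assumptions: enable_bpf_masquerade is a hypothetical helper name, the DaemonSet name cilium is taken from the logs command earlier in this report, and in a Gardener shoot the ConfigMap may be reconciled back to its managed state.

```shell
# JSON merge patch that sets the key named in the reproduction steps.
PATCH='{"data":{"enable-bpf-masquerade":"true"}}'

# enable_bpf_masquerade: hypothetical helper, not part of any tool.
# It is only defined here; call it against the cluster under test.
enable_bpf_masquerade() {
  kubectl -n kube-system patch configmap cilium-config --type merge -p "$PATCH"
  # Restart the agents so they pick up the new value.
  kubectl -n kube-system rollout restart ds/cilium
}
```

After calling enable_bpf_masquerade, the fallback message about legacy host routing should no longer appear in the agent logs if the setting took effect.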
Example shoot spec to reproduce:
spec:
  kubernetes:
    kubeProxy:
      enabled: false
  networking:
    type: cilium
    providerConfig:
      apiVersion: cilium.networking.extensions.gardener.cloud/v1alpha1
      kind: NetworkConfig
      hubble:
        enabled: true
      tunnel: disabled
      ipv4NativeRoutingCIDREnabled: true
      overlay:
        enabled: false
        createPodRoutes: true
Anything else we need to know?:
Environment:
- Gardener version (if relevant): 1.96
- Extension version: built from source using the base branch of PR #350 (fix: enable-bpf-masquerade when snat values are not enabled)
- Kubernetes version (use kubectl version): 1.29.16
- Cloud provider or hardware configuration: openstack
- Others: