
K8 Node Transport Interface Detection #6273

Open
rajnkamr opened this issue Apr 29, 2024 · 12 comments
Assignees
Labels
action/release-note Indicates a PR that should be included in release notes. kind/design Categorizes issue or PR as related to design.

Comments

rajnkamr (Contributor) commented Apr 29, 2024

Describe what you are trying to solve

A node can have multiple interfaces, and packets can be routed differently due to iptables rules. It is therefore important to determine the actual transport interface for Egress traffic on the node.
Describe the solution you have in mind

Suppose a Node has two network interfaces: eth0 connected to Network A and eth1 connected to Network B. Additionally, there is a custom iptables rule that marks packets with a specific packet mark when they are destined for a particular IP address range.

Here's a simplified representation of the scenario:

Network A: 192.168.1.0/24
Network B: 10.0.0.0/24
Custom iptables rule: Mark packets destined for IP addresses in the range 10.0.0.0/24 with a specific packet mark.
Now, let's say we want to determine the outgoing interface for a packet destined for an IP address in Network B (e.g., 10.0.0.1) using ip route get. The expected outcome would be to see the outgoing interface as eth1, as that's the interface connected to Network B.

However, due to the custom iptables rule that marks packets destined for Network B, the actual routing decision might be influenced by this rule. If the rule alters the packet's marking before it reaches the routing table lookup stage, ip route get might not accurately reflect the actual outgoing interface.

In this case, even though eth1 is the correct outgoing interface based on traditional routing table lookup, the presence of the custom iptables rule could lead to the packet being routed differently, potentially resulting in ip route get incorrectly identifying the outgoing interface as eth0.

Describe how your solution impacts user flows

N/A
Describe the main design/architecture of your solution

Use a Go library for packet processing to inspect packets and determine the outgoing interface.

Alternative solutions that you considered

N/A
Test plan

N/A
Additional context

#6099 #5832

@rajnkamr rajnkamr added the kind/design Categorizes issue or PR as related to design. label Apr 29, 2024
luolanzone (Contributor) commented Apr 30, 2024

I think we always use the primary interface for Egress IP, so not sure if this is a real issue or not. @tnqn should have more insights on this.

rajnkamr (Contributor, Author):

I think we always use the primary interface for Egress IP, so not sure if this is a real issue or not. @tnqn should have more insights on this.

@luolanzone, we had a discussion related to identifying the actual transport interface on a Node in #6099, where we wanted to include the Egress Node IP in Traceflow while tracing packets to a destination. Currently there is no way to identify whether an interface on a Node is the management interface, the actual traffic interface, or both.

antoninbas (Contributor):

I think we always use the primary interface for Egress IP, so not sure if this is a real issue or not. @tnqn should have more insights on this.

We don't know which interface it will be actually, users could configure whichever routing they want for external IPs.

That being said, we have to think carefully about whether we want to implement this feature. I suppose invoking ip route get is simple enough, but anything more complex does not seem worth it to me.

Atish-iaf (Contributor) commented May 9, 2024

There was a discussion about providing the transport interface name for all observations, not only Egress observations; the transport interface name is useful whenever a packet leaves the Node.
If we use ip route get, I think we can provide the transport interface name only for Egress-specific observations on the Egress Node, and not for any other observation, because:

  1. Inter-Node Pod-to-Pod
     The dst IP of the packet on the source Node is the dst Pod IP, and using this dst IP with ip route get will return antrea-gw0. But the packet doesn't actually go through this interface; it goes through the tunnel interface (encap mode).

  2. Remote Egress
     The dst IP of the packet on the source Node is an external IP, and using this dst IP with ip route get will return some Node interface (based on routing on that Node) other than the tunnel interface. But the packet actually goes through the tunnel interface of the source Node to the remote/Egress Node (encap mode).

So, the interface name obtained from ip route get in the above cases is not correct.

If we use ip route get for the transport interface name, we can do it only for:

  • Egress observations on the Egress Node.
  • Pod-to-external traffic (without Egress applied to the Pod).

antoninbas (Contributor) commented May 9, 2024

@tnqn originally I thought that there was no guarantee that Egress traffic would leave the Egress Node through the transport interface, but I am not so sure anymore. At least today, this is the only interface through which we advertise the IP, and it seems unlikely that using any other outgoing interface (e.g., if the default route uses a different outgoing interface) would be a valid scenario. Am I missing something? I imagine that this situation may be a bit different with BGP support though.

Edit:

At least today, this is the only interface through which we advertise the IP

Actually this is only for the initial (gratuitous) advertisement, or if arp_ignore > 0.
With arp_ignore == 0 (default), we will reply to ARP requests on any interface. So maybe what I describe above is still a valid scenario, even in L2 mode.

tnqn (Member) commented May 13, 2024

Actually this is only for the initial (gratuitous) advertisement, or if arp_ignore > 0.
With arp_ignore == 0 (default), we will reply to ARP requests on any interface. So maybe what I describe above is still a valid scenario, even in L2 mode.

Right, there are two cases. In most cases, where arp_ignore=0, ARP works well; in the other cases, users need to configure transportInterface to make it work. Besides, if a subnet has a VLAN, the outgoing interface will only be the transport interface.

Custom iptables rule: Mark packets destined for IP addresses in the range 10.0.0.0/24 with a specific packet mark.
Now, let's say we want to determine the outgoing interface for a packet destined for an IP address in Network B (e.g., 10.0.0.1) using ip route get. The expected outcome would be to see the outgoing interface as eth1, as that's the interface connected to Network B.

However, due to the custom iptables rule that marks packets destined for Network B, the actual routing decision might be influenced by this rule. If the rule alters the packet's marking before it reaches the routing table lookup stage, ip route get might not accurately reflect the actual outgoing interface.

It doesn't need to be so complicated. In the Antrea datapath, we only use policy routing for VLAN subnets, in which case the outgoing interface is always the configured "transportInterface"; otherwise the outgoing interface is determined by the default route table.

rajnkamr (Contributor, Author):

If we only use policy routing for VLAN subnets, then traffic destined for the VLAN subnets can be identified by the destination IP range (subnet) associated with the VLAN. Once the policy routing rules are configured and the transport interface is specified for VLAN subnet traffic, those rules take precedence over the default routing rules, so the outgoing interface is the configured one; otherwise, the default route is good enough and we can rely on it.
We also need to cover scenarios where multiple transport interfaces are configured, e.g., for multiple VLAN subnets.

antoninbas (Contributor):

It doesn't need to be so complicated. In Antrea datapath case, we only use policy routing for VLAN subnet, in which case the transport interface is always the "transportInterface", otherwise the transport interface is determined by the default route tables.

I think the concern is that users can use whichever custom routing policies they want on their Nodes for outgoing traffic, even though that's an unlikely scenario.

rajnkamr (Contributor, Author):

It doesn't need to be so complicated. In Antrea datapath case, we only use policy routing for VLAN subnet, in which case the transport interface is always the "transportInterface", otherwise the transport interface is determined by the default route tables.

I think the concern is that users can use whichever custom routing policies they want on their Nodes for outgoing traffic, even though that's an unlikely scenario.

For the first phase, we can go ahead without support for custom routing policies, and document that limitation.

@rajnkamr rajnkamr added the action/release-note Indicates a PR that should be included in release notes. label May 14, 2024
github-actions bot commented Aug 13, 2024

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 13, 2024
marcozov commented Sep 2, 2024

Hi everyone,

I think I have a very similar problem.
I have two subnets, one for the IPs of the nodes (let's call it network A, 10.111.27.0/24) and one for the egress IPs (let's call it network B, 10.15.0.0/16).
The interfaces look like this:

ip a
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:b7:c6:34 brd ff:ff:ff:ff:ff:ff
    altname enp11s0
    altname ens192
    inet 10.111.27.101/24 brd 10.111.27.255 scope global eth0
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
...
    inet 10.15.0.2/16 brd 10.15.255.255 scope global eth1
       valid_lft forever preferred_lft forever

So:

  • eth0, vlan 111, network A (10.111.27.0/24)
  • eth1, vlan 15, network B (10.15.0.0/16)

Therefore I have the following configuration:

---
apiVersion: crd.antrea.io/v1beta1
kind: ExternalIPPool
metadata:
  name: app-test-1-external-ip-pool
spec:
  ipRanges:
  - start: 10.15.0.20
    end: 10.15.0.20
  subnetInfo:
    gateway: 10.15.254.254
    prefixLength: 16
    vlan: 15
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: "true"

---
apiVersion: crd.antrea.io/v1beta1
kind: Egress
metadata:
  name: app-test-1-egress
spec:
  appliedTo:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: app-test-1
  externalIPPool: app-test-1-external-ip-pool

Basically, I'd like to assign the IP 10.15.0.20 (as source IP) to all the traffic that flows from the pods within the app-test-1 namespace to any resource outside of the Kubernetes cluster (although the nodes are in the 10.111.27.0/24 network).
This is what I configured with the ExternalIPPool resource (with the subnetInfo field).
However, the interface is created like this:

ip a
20: antrea-ext.15@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 00:50:56:b7:58:9b brd ff:ff:ff:ff:ff:ff
    inet 10.15.0.20/16 brd 10.15.255.255 scope global antrea-ext.15
       valid_lft forever preferred_lft forever

and it seems like it's still using the eth0 interface for egress traffic, although the ip rules and the route tables were changed:

ip rule
0:      from all lookup local
32765:  from all fwmark 0x1/0xff lookup 101
32766:  from all lookup main
32767:  from all lookup default
...
ip route show table all
default via 10.15.254.254 dev antrea-ext.15 table 101
10.15.0.0/16 dev antrea-ext.15 table 101 scope link
default via 10.111.20.254 dev eth0
10.111.27.0/24 dev eth0 proto kernel scope link src 10.111.27.101
10.15.0.0/16 dev eth1 proto kernel scope link src 10.15.0.2
10.15.0.0/16 dev antrea-ext.15 proto kernel scope link src 10.15.0.20
172.16.0.0/24 dev antrea-gw0 scope link src 172.16.0.1
...

but I think the virtual interface antrea-ext.15 is still using eth0, while it should use eth1 (otherwise it won't even find its default gateway, which is 10.15.254.254 and not 10.111.27.254).

I'm wondering whether this is the intended usage of this functionality (egress + subnetInfo parameter) or if I'm somehow trying to abuse this feature.
Clearly, when I put everything under the same subnet (nodes IPs and egress IPs) everything works like a charm.

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 3, 2024
antoninbas (Contributor):

@marcozov that discussion is much more relevant to your issue: #6547

the short answer is that at the moment if you want to use subnetInfo and a VLAN for Egress, you cannot use a separate interface. The current code will only create VLAN subinterfaces for the transport interface, which is defined as the interface used to forward inter-Node Pod traffic.

There has been quite a bit of confusion around correct usage (and limitations) of subnetInfo, and @tnqn mentioned earlier that he would think about how to improve things.

6 participants