
K8 Node Transport Interface Detection #6273

Open
rajnkamr opened this issue Apr 29, 2024 · 12 comments
Assignees
Labels
action/release-note Indicates a PR that should be included in release notes. kind/design Categorizes issue or PR as related to design.

Comments

rajnkamr (Contributor) commented Apr 29, 2024

Describe what you are trying to solve

A node can have multiple interfaces, and packets can be routed differently due to iptables rules. It is therefore important to determine the actual transport interface for Egress traffic on the node.
Describe the solution you have in mind

Suppose a Node has two network interfaces: eth0 connected to Network A and eth1 connected to Network B. Additionally, there is a custom iptables rule that marks packets with a specific packet mark when they are destined for a particular IP address range.

Here's a simplified representation of the scenario:

Network A: 192.168.1.0/24
Network B: 10.0.0.0/24
Custom iptables rule: Mark packets destined for IP addresses in the range 10.0.0.0/24 with a specific packet mark.
Now, let's say we want to determine the outgoing interface for a packet destined for an IP address in Network B (e.g., 10.0.0.1) using ip route get. The expected outcome would be to see the outgoing interface as eth1, as that's the interface connected to Network B.

However, due to the custom iptables rule that marks packets destined for Network B, the actual routing decision might be influenced by this rule. If the rule alters the packet's marking before it reaches the routing table lookup stage, ip route get might not accurately reflect the actual outgoing interface.

In this case, even though eth1 is the correct outgoing interface based on traditional routing table lookup, the presence of the custom iptables rule could lead to the packet being routed differently, potentially resulting in ip route get incorrectly identifying the outgoing interface as eth0.

Describe how your solution impacts user flows

N/A
Describe the main design/architecture of your solution

Use a Go library for packet processing to inspect packets and determine the outgoing interface.

Alternative solutions that you considered

N/A
Test plan

N/A
Additional context

#6099 #5832

@rajnkamr rajnkamr added the kind/design Categorizes issue or PR as related to design. label Apr 29, 2024
luolanzone (Contributor) commented Apr 30, 2024

I think we always use the primary interface for Egress IP, so not sure if this is a real issue or not. @tnqn should have more insights on this.

rajnkamr (Contributor, Author):

I think we always use the primary interface for Egress IP, so not sure if this is a real issue or not. @tnqn should have more insights on this.

@luolanzone, we had a discussion related to identifying the actual transport interface on a Node in #6099, where we wanted to include the Egress Node IP in Traceflow while tracing packets to a destination. Currently there is no way to identify whether an interface on a Node is the management interface, the actual traffic interface, or both.

antoninbas (Contributor):

I think we always use the primary interface for Egress IP, so not sure if this is a real issue or not. @tnqn should have more insights on this.

We don't know which interface it will be actually, users could configure whichever routing they want for external IPs.

That being said, we have to think carefully about whether we want to implement this feature. I suppose invoking ip route get is simple enough, but anything more complex does not seem worth it to me.

Atish-iaf (Contributor) commented May 9, 2024

There was a discussion about providing the transport interface name for all observations, not only Egress observations; the transport interface name is useful whenever a packet leaves the Node.
If we use ip route get, I think we can provide the transport interface name only for Egress-specific observations on the Egress Node, and not for any other observation, because:

  1. Inter-Node Pod-to-Pod
     The dst IP of the packet on the source Node is the dst Pod IP, and using this dst IP with ip route get will return antrea-gw0. But the packet doesn't actually go through this interface; it goes through the tunnel interface (encap mode).

  2. Remote Egress
     The dst IP of the packet on the source Node is an external IP, and using this dst IP with ip route get will return some Node interface (based on routing on that Node) other than the tunnel interface. But the packet actually goes through the tunnel interface of the source Node to the remote/Egress Node (encap mode).

So, the interface name obtained from ip route get in the above cases is not correct.

If we use ip route get for the transport interface name, we can do it only for:

  • Egress observations on the Egress Node.
  • Pod-to-external traffic (without Egress applied to the Pod).

antoninbas (Contributor) commented May 9, 2024

@tnqn originally I thought that there was no guarantee that Egress traffic would leave the Egress Node through the transport interface, but I am not so sure anymore. At least today, this is the only interface through which we advertise the IP, and it seems unlikely that using any other outgoing interface (e.g., if the default route uses a different outgoing interface) would be a valid scenario. Am I missing something? I imagine that this situation may be a bit different with BGP support though.

Edit:

At least today, this is the only interface through which we advertise the IP

Actually this is only for the initial (gratuitous) advertisement, or if arp_ignore > 0.
With arp_ignore == 0 (default), we will reply to ARP requests on any interface. So maybe what I describe above is still a valid scenario, even in L2 mode.

tnqn (Member) commented May 13, 2024

Actually this is only for the initial (gratuitous) advertisement, or if arp_ignore > 0.
With arp_ignore == 0 (default), we will reply to ARP requests on any interface. So maybe what I describe above is still a valid scenario, even in L2 mode.

Right, there are two cases. In most cases, where arp_ignore=0, ARP works well; in the other cases, users need to configure transportInterface to make it work. Besides, if a subnet has a VLAN, the outgoing interface will only be the transport interface.

Custom iptables rule: Mark packets destined for IP addresses in the range 10.0.0.0/24 with a specific packet mark.
Now, let's say we want to determine the outgoing interface for a packet destined for an IP address in Network B (e.g., 10.0.0.1) using ip route get. The expected outcome would be to see the outgoing interface as eth1, as that's the interface connected to Network B.

However, due to the custom iptables rule that marks packets destined for Network B, the actual routing decision might be influenced by this rule. If the rule alters the packet's marking before it reaches the routing table lookup stage, ip route get might not accurately reflect the actual outgoing interface.

It doesn't need to be so complicated. In the Antrea datapath, we only use policy routing for VLAN subnets, in which case the outgoing interface is always the configured "transportInterface"; otherwise the outgoing interface is determined by the default route table.

rajnkamr (Contributor, Author):

If we only use policy routing for VLAN subnets, then traffic destined for the VLAN subnets can be identified by the destination IP range (subnet) associated with the VLAN. Once the policy routing rules are configured and the transport interface is specified for VLAN subnet traffic, those rules take precedence over the default routing rules, so the outgoing interface is the configured one; otherwise, the default route is good enough and we can rely on it.
We also need to cover scenarios where multiple transport interfaces are configured, e.g., for multiple VLAN subnets.

antoninbas (Contributor):

It doesn't need to be so complicated. In Antrea datapath case, we only use policy routing for VLAN subnet, in which case the transport interface is always the "transportInterface", otherwise the transport interface is determined by the default route tables.

I think the concern is that users can use whichever custom routing policies they want on their Nodes for outgoing traffic, even though that's an unlikely scenario.

rajnkamr (Contributor, Author):

It doesn't need to be so complicated. In Antrea datapath case, we only use policy routing for VLAN subnet, in which case the transport interface is always the "transportInterface", otherwise the transport interface is determined by the default route tables.

I think the concern is that users can use whichever custom routing policies they want on their Nodes for outgoing traffic, even though that's an unlikely scenario.

For the first phase, we can go ahead without support for custom routing policies, and document that limitation.

@rajnkamr rajnkamr added the action/release-note Indicates a PR that should be included in release notes. label May 14, 2024
github-actions bot commented Aug 13, 2024

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 13, 2024
marcozov commented Sep 2, 2024

Hi everyone,

I think I have a very similar problem.
I have two subnets, one for the IPs of the nodes (let's call it network A, 10.111.27.0/24) and one for the egress IPs (let's call it network B, 10.15.0.0/16).
The interfaces look like this:

ip a
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:b7:c6:34 brd ff:ff:ff:ff:ff:ff
    altname enp11s0
    altname ens192
    inet 10.111.27.101/24 brd 10.111.27.255 scope global eth0
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
...
    inet 10.15.0.2/16 brd 10.15.255.255 scope global eth1
       valid_lft forever preferred_lft forever

So:

  • eth0, vlan 111, network A (10.111.27.0/24)
  • eth1, vlan 15, network B (10.15.0.0/16)

Therefore I have the following configuration:

---
apiVersion: crd.antrea.io/v1beta1
kind: ExternalIPPool
metadata:
  name: app-test-1-external-ip-pool
spec:
  ipRanges:
  - start: 10.15.0.20
    end: 10.15.0.20
  subnetInfo:
    gateway: 10.15.254.254
    prefixLength: 16
    vlan: 15
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: "true"

---
apiVersion: crd.antrea.io/v1beta1
kind: Egress
metadata:
  name: app-test-1-egress
spec:
  appliedTo:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: app-test-1
  externalIPPool: app-test-1-external-ip-pool

Basically, I'd like to assign the IP 10.15.0.20 (as source IP) to all the traffic that flows from the pods within the app-test-1 namespace to any resource outside of the Kubernetes cluster (although the nodes are in the 10.111.27.0/24 network).
This is what I configured with the ExternalIPPool resource (with the subnetInfo field).
However, the interface is created like this:

ip a
20: antrea-ext.15@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 00:50:56:b7:58:9b brd ff:ff:ff:ff:ff:ff
    inet 10.15.0.20/16 brd 10.15.255.255 scope global antrea-ext.15
       valid_lft forever preferred_lft forever

and it seems like it's still using the eth0 interface for egress traffic, although the ip rules and the route tables were changed:

ip rule
0:      from all lookup local
32765:  from all fwmark 0x1/0xff lookup 101
32766:  from all lookup main
32767:  from all lookup default
...
ip route show table all
default via 10.15.254.254 dev antrea-ext.15 table 101
10.15.0.0/16 dev antrea-ext.15 table 101 scope link
default via 10.111.20.254 dev eth0
10.111.27.0/24 dev eth0 proto kernel scope link src 10.111.27.101
10.15.0.0/16 dev eth1 proto kernel scope link src 10.15.0.2
10.15.0.0/16 dev antrea-ext.15 proto kernel scope link src 10.15.0.20
172.16.0.0/24 dev antrea-gw0 scope link src 172.16.0.1
...

but I think the virtual interface antrea-ext.15 is still using eth0, while it should use eth1 (otherwise it won't even find its default gateway, which is 10.15.254.254 and not 10.111.27.254).

I'm wondering whether this is the intended usage of this functionality (egress + subnetInfo parameter) or if I'm somehow trying to abuse this feature.
Clearly, when I put everything under the same subnet (nodes IPs and egress IPs) everything works like a charm.

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 3, 2024
antoninbas (Contributor):

@marcozov that discussion is much more relevant to your issue: #6547

the short answer is that at the moment if you want to use subnetInfo and a VLAN for Egress, you cannot use a separate interface. The current code will only create VLAN subinterfaces for the transport interface, which is defined as the interface used to forward inter-Node Pod traffic.

There has been quite a bit of confusion around correct usage (and limitations) of subnetInfo, and @tnqn mentioned earlier that he would think about how to improve things.

6 participants