TCP rejection can't work on Kind when the traffic mode is noEncap #2025

GraysonWu · 2021-04-02T02:29:06Z

Describe the problem/challenge you have

When running the E2E tests with AntreaPolicy enabled in noEncap mode, the test case TestAntreaPolicy/TestGroupNoK8sNP/Case=ACNPRejectIngress, which checks that Reject works for TCP traffic, always fails on Kind: some connections that should be rejected are observed as dropped. The same test does not fail on the local Vagrant testbed.

When manually testing Reject on TCP traffic in a local Kind cluster in noEncap mode with AntreaPolicy enabled:

For the intra-node case, everything works as expected.
For the inter-node case, the requesting client keeps retrying until it times out and never receives a reject response.

According to the investigation, on a Kind cluster we use the OVS netdev datapath, which requires a bridge to be set up on each Node. For inter-node traffic in noEncap mode, there are two conntrack lookups for the same connection. When the Reject action sends out a TCP RST packet, the first lookup destroys the conntrack entry. The second lookup then tags the packet as INVALID and drops it. Since the response packet is dropped by the kernel, the Pod never receives it, so it keeps retrying until it times out.
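
To make the sequence above concrete, here is a toy shell model of a single connection going through two conntrack lookups. It only illustrates the behavior described in this issue and is not real conntrack code: `ct_entry`, `ct_verdict` and `ct_pass` are made-up names, and the model assumes the entry is destroyed as soon as the first RST is seen.

```shell
# Toy model of the sequence described above; this is NOT real conntrack code.
# State for a single connection: "yes" while a conntrack entry exists.
ct_entry=no
ct_verdict=

# ct_pass FLAG: simulate one conntrack lookup for a packet with that TCP flag.
ct_pass() {
  if [ "$ct_entry" = no ]; then
    if [ "$1" = SYN ]; then
      # First packet of a connection creates the entry.
      ct_entry=yes
      ct_verdict=NEW
    else
      # No entry and not a SYN: the packet cannot be matched to a connection.
      ct_verdict=INVALID
    fi
    return 0
  fi
  # An entry exists, so the packet is valid for this lookup.
  ct_verdict=VALID
  # Assumption of this model: a RST destroys the entry right away.
  if [ "$1" = RST ]; then ct_entry=no; fi
}

ct_pass SYN; echo "SYN: $ct_verdict"
ct_pass RST; echo "RST, first lookup: $ct_verdict"
ct_pass RST; echo "RST, second lookup: $ct_verdict"
```

In this model the same RST is VALID on the first lookup and INVALID on the second, which matches the drop observed for inter-node traffic.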

So, for now, skip this test when the provider is Kind and the traffic mode is noEncap.

Thanks to @antoninbas for his help throughout the whole process.

Reference: https://ask.openstack.org/en/question/28300/iptables-invalid-rule-preventing-rst-packets-on-closed-ports-between-vms/


antoninbas · 2021-04-02T20:40:51Z

To be clear, it's a similar situation to https://ask.openstack.org/en/question/28300/iptables-invalid-rule-preventing-rst-packets-on-closed-ports-between-vms/, but not quite the same.

The TCP RST packet does go twice through conntrack for some reason, which seems related to the fact that we use an extra netdev bridge (br-phy) attached to a physical interface (eth0). I have confirmed this by adding the following iptables rule:

iptables -t raw -A PREROUTING -p tcp --tcp-flags RST RST -j ACCEPT

# check counters
Chain PREROUTING (policy ACCEPT 87 packets, 35539 bytes)
 pkts bytes target     prot opt in     out     source               destination
 403K  367M ANTREA-PREROUTING  all  --  any    any     anywhere             anywhere             /* Antrea: jump to Antrea prerouting rules */
    0     0 ACCEPT     tcp  --  any    any     anywhere             anywhere             tcp flags:RST/RST
# try a TCP connection to a Pod on another Node, which is rejected
# check counters again
Chain PREROUTING (policy ACCEPT 232 packets, 79185 bytes)
 pkts bytes target     prot opt in     out     source               destination
 403K  367M ANTREA-PREROUTING  all  --  any    any     anywhere             anywhere             /* Antrea: jump to Antrea prerouting rules */
    2    80 ACCEPT     tcp  --  any    any     anywhere             anywhere             tcp flags:RST/RST

Notice how that rule was hit twice despite the fact that there was a single TCP RST packet. So that packet must be going twice through PREROUTING and conntrack. The conntrack entry is destroyed the first time, which causes the packet to be dropped the second time because it is now invalid.
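
If someone wants to repeat this check without reading the counters by eye, a small helper along these lines could compute the difference. This is only a sketch: `rst_rule_pkts` is a hypothetical name, and it assumes the listing is produced without `-n` so that the match is printed as `tcp flags:RST/RST`, as in the output above.

```shell
# rst_rule_pkts: read `iptables -t raw -L PREROUTING -v` output on stdin and
# print the pkts counter (first column) of the first rule whose match is
# printed as "tcp flags:RST/RST". Hypothetical helper, not part of Antrea.
rst_rule_pkts() {
  awk '/tcp flags:RST\/RST/ { print $1; exit }'
}

# Intended usage on a Kind Node (needs root; -x prints exact counters):
#   before=$(iptables -t raw -L PREROUTING -v -x | rst_rule_pkts)
#   ... try the rejected TCP connection from a Pod on another Node ...
#   after=$(iptables -t raw -L PREROUTING -v -x | rst_rule_pkts)
#   echo "RST rule hits: $((after - before))"
```

A difference of 2 for a single rejected connection would be the same double traversal as shown above.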

I read something related in the OVS documentation:

Firewall Rules
On Linux, when a physical interface is in use by the userspace datapath, packets received on the interface still also pass into the kernel TCP/IP stack. This can cause surprising and incorrect behavior. You can use "iptables" to avoid this behavior, by using it to drop received packets. For example, to drop packets received on eth0:

iptables -A INPUT -i eth0 -j DROP
iptables -A FORWARD -i eth0 -j DROP

These rules don't help with our situation (the packet still goes through conntrack twice). However the following rule did help:

iptables -t raw -A PREROUTING -i eth0 -j DROP

After that the TCP reset can make its way back to the source Pod as expected.

@GraysonWu It may be interesting to add this rule as part of https://github.com/vmware-tanzu/antrea/blob/main/build/images/scripts/start_ovs_netdev. Then we should be able to run the test instead of skipping it. It may help with other networking issues in Kind, who knows... We can look into this after #2001 is merged, there is no rush.
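
For illustration, adding the rule from a startup script could look roughly like the sketch below. This is not the actual content of start_ovs_netdev: `ensure_raw_drop` and the `IPTABLES` override are hypothetical, and the only assumption about iptables itself is the standard `-C` (check) option, used so that the rule is not appended twice if the script runs again.

```shell
# ensure_raw_drop IFACE: append the raw/PREROUTING DROP rule for IFACE only
# if it is not already present. Hypothetical helper for illustration.
# IPTABLES can be set to another command (or a stub) for a dry run.
ensure_raw_drop() {
  ipt=${IPTABLES:-iptables}
  # -C returns non-zero when the rule does not exist yet.
  if ! "$ipt" -t raw -C PREROUTING -i "$1" -j DROP 2>/dev/null; then
    "$ipt" -t raw -A PREROUTING -i "$1" -j DROP
  fi
}

# Intended usage (needs root inside the Kind Node):
#   ensure_raw_drop eth0
```

The check-before-append step is just a design choice for the sketch; a plain `-A` as in the comment above works too if the script is only ever run once per Node.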

GraysonWu · 2021-04-02T20:46:15Z

Thanks @antoninbas for adding these helpful details. Yeah, we could try that later.

According to the OVS documentation:

On Linux, when a physical interface is in use by the userspace datapath, packets received on the interface still also pass into the kernel TCP/IP stack. This can cause surprising and incorrect behavior. You can use "iptables" to avoid this behavior, by using it to drop received packets.

The OVS documentation suggests dropping packets in the INPUT and FORWARD chains. However, this is not sufficient for some edge cases. For example, when receiving a TCP RST packet, the packet will clear the conntrack entry for the TCP connection before it can be dropped, which can cause the "second" TCP RST packet (the one processed by OVS userspace) to be marked as invalid when going through conntrack. So instead we drop the packet in PREROUTING:

iptables -t raw -A PREROUTING -i eth0 -j DROP

This rule is added to the start_ovs_netdev script. By adding this rule, we no longer need to skip TCP e2e tests for the Reject NetworkPolicy Action in Kind clusters.

It's possible that this is also going to help with various connectivity issues we observed with Antrea in Kind over time. For example, I believe we are also able to remove the hack which reduces the value of the tcp_retries2 sysctl parameter.

Fixes #2025

Signed-off-by: Antonin Bas <abas@vmware.com>

GraysonWu added the kind/feature label Apr 2, 2021

antoninbas added the kind/bug label and removed the kind/feature label Apr 2, 2021

jianjuns mentioned this issue Apr 5, 2021

Support tracing live traffic in Traceflow #2030

Closed

antoninbas mentioned this issue Apr 30, 2021

Drop eth0 packets in PREROUTING on Kind Nodes #2143

Merged

antoninbas closed this as completed in #2143 Jun 16, 2021
