TCP rejection can't work on Kind when the traffic mode is noEncap #2025

Closed
GraysonWu opened this issue Apr 2, 2021 · 2 comments · Fixed by #2143
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@GraysonWu

Describe the problem/challenge you have

When running the E2E tests, the test case TestAntreaPolicy/TestGroupNoK8sNP/Case=ACNPRejectIngress, which checks that the Reject action works for TCP traffic, always fails when AntreaPolicy is enabled in noEncap mode. Some connections that should be rejected are observed as dropped instead. The same test does not fail on the local Vagrant testbed.

When manually testing Reject on TCP traffic in a local Kind cluster in noEncap mode with AntreaPolicy enabled:

  • For the intra-node case, everything works as expected.
  • For the inter-node case, the requesting client keeps retrying until it times out, and never receives a reject response.

According to our investigation, Kind clusters use the OVS netdev datapath, which requires a bridge to be set up on each Node. For inter-node traffic in noEncap mode, there are two conntrack lookups for the same connection. When the Reject action sends out a TCP RST packet, the first lookup destroys the conntrack entry. The second lookup then tags the packet as INVALID and it is dropped. Since the kernel drops the response packet, the Pod never receives it and keeps retrying until it times out.

So, for now, we skip this test when the provider is Kind and the traffic mode is noEncap.
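
To make the sequence above concrete, here is a toy model of the two lookups. It is only a sketch of the behavior described in this issue, not of how the kernel implements conntrack; the names `conn_tracked`, `verdict` and `conntrack_lookup_rst` are made up for illustration.

```shell
#!/bin/sh
# Toy model only: one tracked TCP connection and a single RST packet that
# is looked up twice. Real conntrack is far more involved than this.

conn_tracked=1   # 1 = a conntrack entry exists for the connection
verdict=""       # set by conntrack_lookup_rst

conntrack_lookup_rst() {
    if [ "$conn_tracked" -eq 1 ]; then
        # First lookup: the RST matches the entry, then the entry is removed.
        conn_tracked=0
        verdict="VALID"
    else
        # Later lookup: no entry is left, and a RST cannot start a new one.
        verdict="INVALID"
    fi
}

conntrack_lookup_rst; first=$verdict
conntrack_lookup_rst; second=$verdict
echo "first lookup: $first, second lookup: $second"
```

The second verdict is the one that matters here: a packet tagged INVALID is dropped, so the client never sees the reset.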

Thanks to @antoninbas for his help throughout the whole process.

Reference: https://ask.openstack.org/en/question/28300/iptables-invalid-rule-preventing-rst-packets-on-closed-ports-between-vms/

@GraysonWu GraysonWu added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 2, 2021
@antoninbas antoninbas added kind/bug Categorizes issue or PR as related to a bug. and removed kind/feature Categorizes issue or PR as related to a new feature. labels Apr 2, 2021
@antoninbas

To be clear, it's a similar situation to https://ask.openstack.org/en/question/28300/iptables-invalid-rule-preventing-rst-packets-on-closed-ports-between-vms/, but not quite the same.

The TCP RST packet does go twice through conntrack for some reason, which seems related to the fact that we use an extra netdev bridge (br-phy) attached to a physical interface (eth0). I have confirmed this by adding the following iptables rule:

iptables -t raw -A PREROUTING -p tcp --tcp-flags RST RST -j ACCEPT

# check counters
Chain PREROUTING (policy ACCEPT 87 packets, 35539 bytes)
 pkts bytes target     prot opt in     out     source               destination
 403K  367M ANTREA-PREROUTING  all  --  any    any     anywhere             anywhere             /* Antrea: jump to Antrea prerouting rules */
    0     0 ACCEPT     tcp  --  any    any     anywhere             anywhere             tcp flags:RST/RST
# try a TCP connection to a Pod on another Node, which is rejected
# check counters again
Chain PREROUTING (policy ACCEPT 232 packets, 79185 bytes)
 pkts bytes target     prot opt in     out     source               destination
 403K  367M ANTREA-PREROUTING  all  --  any    any     anywhere             anywhere             /* Antrea: jump to Antrea prerouting rules */
    2    80 ACCEPT     tcp  --  any    any     anywhere             anywhere             tcp flags:RST/RST

Notice how that rule was hit twice despite the fact that there was a single TCP RST packet. So that packet must be going twice through PREROUTING and conntrack. The conntrack entry is destroyed the first time, which causes the packet to be dropped the second time because it is now invalid.
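
The before/after comparison can be scripted. The sketch below assumes the listing format shown above (non-numeric `-v` output, where the match is printed as `tcp flags:RST/RST`); `rst_rule_pkts` is a hypothetical helper, and the two listings are the ones pasted in this comment.

```shell
#!/bin/sh
# Print the packet counter of the first ACCEPT rule matching RST/RST in an
# iptables listing read from stdin (0 if no such rule is listed).
rst_rule_pkts() {
    awk '$3 == "ACCEPT" && /tcp flags:RST\/RST/ { print $1; found = 1; exit }
         END { if (!found) print 0 }'
}

before=$(rst_rule_pkts <<'EOF'
Chain PREROUTING (policy ACCEPT 87 packets, 35539 bytes)
 pkts bytes target     prot opt in     out     source               destination
 403K  367M ANTREA-PREROUTING  all  --  any    any     anywhere             anywhere             /* Antrea: jump to Antrea prerouting rules */
    0     0 ACCEPT     tcp  --  any    any     anywhere             anywhere             tcp flags:RST/RST
EOF
)

after=$(rst_rule_pkts <<'EOF'
Chain PREROUTING (policy ACCEPT 232 packets, 79185 bytes)
 pkts bytes target     prot opt in     out     source               destination
 403K  367M ANTREA-PREROUTING  all  --  any    any     anywhere             anywhere             /* Antrea: jump to Antrea prerouting rules */
    2    80 ACCEPT     tcp  --  any    any     anywhere             anywhere             tcp flags:RST/RST
EOF
)

delta=$((after - before))
echo "RST rule hits for one rejected connection: $delta"
```

On a live Node the listing would come from something like `iptables -t raw -L PREROUTING -v -x`; `-x` prints exact counters, which avoids abbreviated values such as `403K` once the numbers get large.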

I read something related in the OVS documentation:

Firewall Rules
On Linux, when a physical interface is in use by the userspace datapath, packets received on the interface still also pass into the kernel TCP/IP stack. This can cause surprising and incorrect behavior. You can use "iptables" to avoid this behavior, by using it to drop received packets. For example, to drop packets received on eth0:

iptables -A INPUT -i eth0 -j DROP
iptables -A FORWARD -i eth0 -j DROP

These rules don't help with our situation (the packet still goes through conntrack twice). However, the following rule did help:

iptables -t raw -A PREROUTING -i eth0 -j DROP

After that the TCP reset can make its way back to the source Pod as expected.
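
If this ends up in a startup script, it is probably worth making the rule idempotent so that restarts don't append it again. A minimal sketch, assuming a hypothetical helper `ensure_raw_prerouting_drop` and an `IPTABLES` variable that only exists to make the command overridable (neither is taken from the Antrea scripts):

```shell
#!/bin/sh
# Command to run; overridable so the function can be exercised without root.
IPTABLES=${IPTABLES:-iptables}

# Append the raw/PREROUTING DROP rule for the given interface only if it is
# not already present.
ensure_raw_prerouting_drop() {
    iface=$1
    # -C checks whether a matching rule exists (non-zero exit status if not).
    if ! "$IPTABLES" -t raw -C PREROUTING -i "$iface" -j DROP 2>/dev/null; then
        "$IPTABLES" -t raw -A PREROUTING -i "$iface" -j DROP
    fi
}
```

Usage would be `ensure_raw_prerouting_drop eth0`, run as root on the Node. Calling it a second time should leave the rule set unchanged.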

@GraysonWu It may be interesting to add this rule as part of https://github.com/vmware-tanzu/antrea/blob/main/build/images/scripts/start_ovs_netdev. Then we should be able to run the test instead of skipping it. It may help with other networking issues in Kind, who knows... We can look into this after #2001 is merged; there is no rush.

@GraysonWu

Thanks @antoninbas for adding these helpful details. Yeah, we could try that later.

antoninbas added a commit to antoninbas/antrea that referenced this issue Apr 30, 2021
According to the OVS documentation:
On Linux, when a physical interface is in use by the userspace datapath,
packets received on the interface still also pass into the kernel TCP/IP
stack. This can cause surprising and incorrect behavior. You can use
"iptables" to avoid this behavior, by using it to drop received packets.

The OVS documentation suggests dropping packets in the INPUT and FORWARD
chains. However, this is not sufficient for some edge cases. For
example, when receiving a TCP RST packet, the packet will clear the
conntrack entry for the TCP connection before it can be dropped, which
can cause the "second" TCP RST packet (the one processed by OVS
userspace) to be marked as invalid when going through conntrack.

So instead we drop the packet in PREROUTING:
iptables -t raw -A PREROUTING -i eth0 -j DROP
This rule is added to the start_ovs_netdev script.

By adding this rule, we no longer need to skip TCP e2e tests for the
Reject NetworkPolicy Action in Kind clusters.

It's possible that this is also going to help with various connectivity
issues we observed with Antrea in Kind over time. For example, I believe
we may also be able to remove the hack which reduces the value of the
tcp_retries2 sysctl parameter. I need to run tests to confirm.

Fixes antrea-io#2025
antoninbas added a commit to antoninbas/antrea that referenced this issue May 21, 2021
antoninbas added a commit to antoninbas/antrea that referenced this issue May 28, 2021
antoninbas added a commit to antoninbas/antrea that referenced this issue Jun 4, 2021
antoninbas added a commit that referenced this issue Jun 16, 2021
According to the OVS documentation:
On Linux, when a physical interface is in use by the userspace datapath,
packets received on the interface still also pass into the kernel TCP/IP
stack. This can cause surprising and incorrect behavior. You can use
"iptables" to avoid this behavior, by using it to drop received packets.

The OVS documentation suggests dropping packets in the INPUT and FORWARD
chains. However, this is not sufficient for some edge cases. For
example, when receiving a TCP RST packet, the packet will clear the
conntrack entry for the TCP connection before it can be dropped, which
can cause the "second" TCP RST packet (the one processed by OVS
userspace) to be marked as invalid when going through conntrack.

So instead we drop the packet in PREROUTING:
iptables -t raw -A PREROUTING -i eth0 -j DROP
This rule is added to the start_ovs_netdev script.

By adding this rule, we no longer need to skip TCP e2e tests for the
Reject NetworkPolicy Action in Kind clusters.

It's possible that this is also going to help with various connectivity
issues we observed with Antrea in Kind over time. For example, I believe
we are also able to remove the hack which reduces the value of the
tcp_retries2 sysctl parameter.

Fixes #2025

Signed-off-by: Antonin Bas <abas@vmware.com>