Antrea L7 NetworkPolicies broken with latest Suricata version (v6.0.11) #4921

Closed
antoninbas opened this issue Apr 29, 2023 · 11 comments · Fixed by #4968
Labels: area/network-policy, kind/bug, priority/critical-urgent

Comments

@antoninbas
Contributor

Describe the bug

This is a screenshot of the latest runs for the Kind Github workflow, on the main branch:

[image: recent Kind workflow runs on the main branch]

(source: https://github.com/antrea-io/antrea/actions/workflows/kind.yml?query=branch%3Amain)

The last 2 workflows have failed for the same reason:

2023-04-28T13:44:37.1046664Z === RUN   TestL7NetworkPolicy/HTTP
2023-04-28T13:44:43.1289764Z === RUN   TestL7NetworkPolicy/HTTP/Ingress
2023-04-28T13:44:43.1291080Z     l7networkpolicy_test.go:119: Creating ANP test-l7-http-allow-path-hostname
2023-04-28T13:44:43.1484306Z     l7networkpolicy_test.go:119: Creating ANP test-l7-http-allow-any-path
2023-04-28T13:44:50.3504900Z     l7networkpolicy_test.go:129: 
2023-04-28T13:44:50.3506113Z         	Error Trace:	/home/runner/work/antrea/antrea/test/e2e/l7networkpolicy_test.go:129
2023-04-28T13:44:50.3507144Z         	            				/home/runner/work/antrea/antrea/test/e2e/l7networkpolicy_test.go:230
2023-04-28T13:44:50.3507590Z         	Error:      	Received unexpected error:
2023-04-28T13:44:50.3508183Z         	            	timed out waiting for the condition
2023-04-28T13:44:50.3508623Z         	Test:       	TestL7NetworkPolicy/HTTP/Ingress
2023-04-28T13:44:55.6458529Z     l7networkpolicy_test.go:153: 
2023-04-28T13:44:55.6462089Z         	Error Trace:	/home/runner/work/antrea/antrea/test/e2e/l7networkpolicy_test.go:153
2023-04-28T13:44:55.6463103Z         	            				/home/runner/work/antrea/antrea/test/e2e/l7networkpolicy_test.go:230
2023-04-28T13:44:55.6463559Z         	Error:      	Received unexpected error:
2023-04-28T13:44:55.6464255Z         	            	timed out waiting for the condition
2023-04-28T13:44:55.6464698Z         	Test:       	TestL7NetworkPolicy/HTTP/Ingress
2023-04-28T13:45:02.9906418Z     l7networkpolicy_test.go:153: 
2023-04-28T13:45:02.9907496Z         	Error Trace:	/home/runner/work/antrea/antrea/test/e2e/l7networkpolicy_test.go:153
2023-04-28T13:45:02.9908662Z         	            				/home/runner/work/antrea/antrea/test/e2e/l7networkpolicy_test.go:238
2023-04-28T13:45:02.9909310Z         	Error:      	Received unexpected error:
2023-04-28T13:45:02.9909996Z         	            	timed out waiting for the condition
2023-04-28T13:45:02.9910514Z         	Test:       	TestL7NetworkPolicy/HTTP/Ingress
2023-04-28T13:45:04.9970094Z === RUN   TestL7NetworkPolicy/HTTP/Egress
2023-04-28T13:45:04.9973409Z     l7networkpolicy_test.go:119: Creating ANP test-l7-http-allow-path-hostname
2023-04-28T13:45:05.0099490Z     l7networkpolicy_test.go:119: Creating ANP test-l7-http-allow-any-path
2023-04-28T13:45:12.2442674Z     l7networkpolicy_test.go:129: 
2023-04-28T13:45:12.2443361Z         	Error Trace:	/home/runner/work/antrea/antrea/test/e2e/l7networkpolicy_test.go:129
2023-04-28T13:45:12.2444477Z         	            				/home/runner/work/antrea/antrea/test/e2e/l7networkpolicy_test.go:257
2023-04-28T13:45:12.2445456Z         	Error:      	Received unexpected error:
2023-04-28T13:45:12.2446038Z         	            	timed out waiting for the condition
2023-04-28T13:45:12.2446476Z         	Test:       	TestL7NetworkPolicy/HTTP/Egress
2023-04-28T13:45:17.5346412Z     l7networkpolicy_test.go:153: 
2023-04-28T13:45:17.5347069Z         	Error Trace:	/home/runner/work/antrea/antrea/test/e2e/l7networkpolicy_test.go:153
2023-04-28T13:45:17.5348057Z         	            				/home/runner/work/antrea/antrea/test/e2e/l7networkpolicy_test.go:257
2023-04-28T13:45:17.5348513Z         	Error:      	Received unexpected error:
2023-04-28T13:45:17.5349093Z         	            	timed out waiting for the condition
2023-04-28T13:45:17.5349528Z         	Test:       	TestL7NetworkPolicy/HTTP/Egress
2023-04-28T13:45:24.8929894Z     l7networkpolicy_test.go:153: 
2023-04-28T13:45:24.8932380Z         	Error Trace:	/home/runner/work/antrea/antrea/test/e2e/l7networkpolicy_test.go:153
2023-04-28T13:45:24.8933604Z         	            				/home/runner/work/antrea/antrea/test/e2e/l7networkpolicy_test.go:265
2023-04-28T13:45:24.8934149Z         	Error:      	Received unexpected error:
2023-04-28T13:45:24.8934796Z         	            	timed out waiting for the condition
2023-04-28T13:45:24.8935324Z         	Test:       	TestL7NetworkPolicy/HTTP/Egress
2023-04-28T13:45:25.0669498Z I0428 13:45:25.066642   18840 framework.go:2395] Sending SIGINT to 'antrea-agent-coverage'
2023-04-28T13:45:25.1354703Z I0428 13:45:25.135048   18840 framework.go:2401] Copying coverage files from Pod 'antrea-agent-nd7qc'
2023-04-28T13:45:25.3880147Z I0428 13:45:25.387744   18840 framework.go:2395] Sending SIGINT to 'antrea-agent-coverage'
2023-04-28T13:45:25.4697727Z I0428 13:45:25.469498   18840 framework.go:2401] Copying coverage files from Pod 'antrea-agent-s8mbh'
2023-04-28T13:45:25.7340618Z I0428 13:45:25.733749   18840 framework.go:2395] Sending SIGINT to 'antrea-agent-coverage'
2023-04-28T13:45:25.8049506Z I0428 13:45:25.804667   18840 framework.go:2401] Copying coverage files from Pod 'antrea-agent-z2lwm'
2023-04-28T13:45:38.0564737Z === CONT  TestL7NetworkPolicy
2023-04-28T13:45:38.0566476Z     fixtures.go:294: Exporting test logs to '/home/runner/work/antrea/antrea/log/TestL7NetworkPolicy/beforeTeardown.Apr28-13-45-38'
2023-04-28T13:45:40.9359527Z     fixtures.go:465: Deleting 'testl7networkpolicy-gb8vhyqf' K8s Namespace
2023-04-28T13:45:40.9440113Z I0428 13:45:40.943737   18840 framework.go:682] Deleting Namespace testl7networkpolicy-gb8vhyqf took 8.008208ms
2023-04-28T13:45:40.9440838Z --- FAIL: TestL7NetworkPolicy (79.39s)
2023-04-28T13:45:40.9441479Z     --- FAIL: TestL7NetworkPolicy/HTTP (47.79s)
2023-04-28T13:45:40.9442088Z         --- FAIL: TestL7NetworkPolicy/HTTP/Ingress (19.87s)
2023-04-28T13:45:40.9442987Z         --- FAIL: TestL7NetworkPolicy/HTTP/Egress (19.90s)

These are not flakes: The same test failures have happened for me 4 times in a row with recent PRs, and I have also been able to reproduce the failure locally.

To Reproduce
In a K8s cluster, deploy Antrea with the L7NetworkPolicy Feature Gate enabled. Then run the TestL7NetworkPolicy e2e tests. If you have a Kind cluster, the command is as follows:

go test -v -run=TestL7NetworkPolicy ./test/e2e/... -provider=kind

Versions:
Latest Antrea (main branch). Antrea v1.11.1 works fine.

Additional context
After investigating, I am pretty confident that this is because of a Suricata version update:

  • Antrea v1.11.1 uses Suricata 6.0.10
  • When building a new Antrea image, Suricata 6.0.11 is used

Why is this happening now?
The Suricata PPA (https://launchpad.net/~oisf/+archive/ubuntu/suricata-6.0) was updated 2 weeks ago, yet the failures started happening recently (last 24 hours). Based on my investigation, this is because the Docker build was using a cached version of Antrea base images. Some change in the build chain (I am not sure what) caused the cached images to become stale, causing the latest Suricata version to be installed.

antoninbas added the kind/bug label Apr 29, 2023
antoninbas added this to the Antrea v1.12 release milestone Apr 29, 2023
@antoninbas
Contributor Author

cc @tnqn @hongliangl

@hongliangl would you mind looking into this since you have worked with the Suricata team before, for a different bug?

antoninbas added the area/network-policy and priority/critical-urgent labels and removed the priority/important-soon label Apr 29, 2023
@hongliangl
Contributor

I discovered that in Suricata 6.0.11, the suricatasc tool hangs when executing commands that involve multiple tenants. The tool becomes unresponsive, which prevents L7 NetworkPolicies from syncing correctly in Antrea. I have raised an issue for this problem: https://redmine.openinfosecfoundation.org/issues/6027.
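
A hang in an external tool can be turned into an explicit error by bounding each invocation with a timeout. The Go sketch below is illustrative only and is not taken from Antrea's code: `runWithTimeout` and `isTimeout` are hypothetical helpers, and `sleep 5` merely stands in for a command that never returns.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// runWithTimeout runs an external command and kills it if it has not
// finished within the given timeout. Hypothetical helper, not Antrea code.
func runWithTimeout(timeout time.Duration, name string, args ...string) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	// CommandContext kills the process once ctx is done.
	out, err := exec.CommandContext(ctx, name, args...).CombinedOutput()
	if err != nil && errors.Is(ctx.Err(), context.DeadlineExceeded) {
		return out, fmt.Errorf("%s timed out after %s: %w", name, timeout, context.DeadlineExceeded)
	}
	return out, err
}

// isTimeout reports whether err was caused by the timeout expiring.
func isTimeout(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	// "sleep 5" is a stand-in for an unresponsive command.
	_, err := runWithTimeout(200*time.Millisecond, "sleep", "5")
	fmt.Println("timed out:", isTimeout(err))
}
```

With a wrapper like this, a caller would see a failure it can log and retry instead of blocking indefinitely.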

@hongliangl
Contributor

I believe we have two workarounds:

  • Change how L7 NetworkPolicies are synchronized. It appears that we can still configure multiple tenants in Suricata's YAML file, but this would require significant code changes: every time an L7 NetworkPolicy is added, deleted, or updated, the YAML file would have to be fully regenerated and the Suricata process reloaded.
  • Revert the antrea-base image.

@antoninbas
Contributor Author

It seems that the best thing to do for now would be to wait for a patch and for the 6.0.12 release. This should be considered a blocker for Antrea v1.12.0.

  1. Reverting the Antrea base image is not practical. Our base images are always being updated to pick up the latest software, and we reuse tags.
  2. Unfortunately, OISF doesn't keep old builds around. So we cannot use the PPA to install an older version (6.0.10).
  3. Installing Suricata from source would be possible but is quite complex and that would be a big change to our build (see https://github.com/jasonish/docker-suricata/blob/main/6.0/Dockerfile.amd64).

@hongliangl
Contributor

I think Suricata 6.0.x is typically released every two months according to its release history. However, version 6.0.11 was released just three weeks ago. This means that the next version (6.0.12) is still around five weeks away. Unfortunately, this will be a blocker for Antrea v1.12.0 regardless.

I have another idea - could we manually create a new Antrea base image (v1.12.0) based on the existing v1.11.0 image? This way, we could update all software components except for Suricata.

@antoninbas
Contributor Author

antrea/base-ubuntu:antrea-v1.11 hasn't been updated recently and still has the "correct" Suricata version:

$ docker run antrea/base-ubuntu:antrea-v1.11 suricata -V
This is Suricata version 6.0.10 RELEASE

However, this solution is a bit flimsy, and it means we may not be getting the latest software updates for other dependencies. We also have to make changes here and there to prevent the base image from being rebuilt.

Building from source may be the best option. At least we will have the flexibility to pick the exact Suricata version we want to use, if the same issue comes up in the future.

BTW, Ubuntu comes with Suricata 6.0.4 by default (main PPA, not OISF PPA). Would that version work for us?
Adding @tnqn to see what he thinks.

antoninbas added a commit to antoninbas/antrea that referenced this issue May 4, 2023
The latest release from Suricata could suffer from a bug, impacting
Antrea features. When installing from PPA, we don't have flexibility
when it comes to the Suricata version we ship with Antrea.

Instead, we can install Suricata from source as part of our Docker
build.

The advantages are:

* full control over the version we install.
* a smaller Antrea image, as we do not need to install all of Suricata's
  dependencies / enable all its features.

The disadvantages of building from source are:

* CVEs in Suricata itself won't be detected by scanners (CVEs in
  Suricata's dependencies will).
* While we have more customization options, we also have to manually
  keep track of Suricata's dependencies (and use correct build options).
* We have to remember to update the Suricata version we build & install.

For antrea-io#4921

This can also be treated as a temporary fix until Suricata 6.0.12 is
released, at which point we could revert this change.

Signed-off-by: Antonin Bas <abas@vmware.com>
@tnqn
Member

tnqn commented May 4, 2023

> Some change in the build chain (I am not sure what) caused the cached images to become stale, causing the latest Suricata version to be installed.

@antoninbas Sorry, I should have let you know that I manually updated the image on April 28th, and I should have validated the new version first.

> BTW, Ubuntu comes with Suricata 6.0.4 by default (main PPA, not OISF PPA). Would that version work for us?

We don't have a dependency on 6.0.11 except for the bugfix, so it should work for us. But does UBI have its own source that also uses a previous patch release?

@hongliangl I see the issue has been fixed and backported to 6.x: OISF/suricata@fe45258. Could you check with the Suricata team when 6.0.12 is expected to be released? If it's before our v1.12.0, perhaps we could use some workaround. I still have the previous base-ubuntu version, which contains Suricata 6.0.10 (it should continue using the cache if I force update the image).

@antoninbas
Contributor Author

> it should continue using the cache if I force update the image

If there is a change in the base ubuntu:22.04 image, it should invalidate any cached image. It seems that the ubuntu:22.04 tag was updated yesterday.

antoninbas added a commit to antoninbas/antrea that referenced this issue May 4, 2023
To avoid a known issue with Suricata 6.0.11.
The main Ubuntu PPA ships Suricata 6.0.4.

With this change, e2e tests for L7NetworkPolicy will stop failing.

We do not "fix" the UBI build at the moment, but it will be taken care
of before the Antrea v1.12 release.

For antrea-io#4921

Signed-off-by: Antonin Bas <abas@vmware.com>
tnqn pushed a commit that referenced this issue May 8, 2023
To avoid a known issue with Suricata 6.0.11.
The main Ubuntu PPA ships Suricata 6.0.4.

With this change, e2e tests for L7NetworkPolicy will stop failing.

We do not "fix" the UBI build at the moment, but it will be taken care
of before the Antrea v1.12 release.

For #4921

Signed-off-by: Antonin Bas <abas@vmware.com>
@antoninbas
Contributor Author

@xliuxu Suricata v6.0.12 has been released: https://suricata.io/2023/05/09/suricata-6-0-12-released/
Do you think you could validate it and revert #4933?

@xliuxu
Contributor

xliuxu commented May 11, 2023

@antoninbas Sure. I will test it.

@xliuxu
Contributor

xliuxu commented May 12, 2023

$ go test -v -run=TestL7NetworkPolicy ./test/e2e/... -provider=kind
2023/05/12 01:24:47 Test logs (if any) will be exported under the '/tmp/antrea-test-3124671037' directory
2023/05/12 01:24:48 Creating K8s ClientSet
2023/05/12 01:24:48 Collecting information about K8s cluster
2023/05/12 01:24:48 Pod IPv4 network: '10.244.0.0/16'
2023/05/12 01:24:48 Service IPv4 network: '10.96.0.0/16'
2023/05/12 01:24:48 Num nodes: 3
2023/05/12 01:24:48 Applying Antrea YAML
2023/05/12 01:24:49 Waiting for all Antrea DaemonSet Pods
2023/05/12 01:24:50 Checking CoreDNS deployment
=== RUN   TestL7NetworkPolicy
    fixtures.go:228: Creating 'testl7networkpolicy-9dx07sbx' K8s Namespace
2023/05/12 01:24:50 Applying Antrea YAML
2023/05/12 01:24:50 Waiting for all Antrea DaemonSet Pods
2023/05/12 01:24:51 Checking CoreDNS deployment
=== RUN   TestL7NetworkPolicy/HTTP
=== RUN   TestL7NetworkPolicy/HTTP/Ingress
    l7networkpolicy_test.go:119: Creating ANP test-l7-http-allow-path-hostname
    l7networkpolicy_test.go:119: Creating ANP test-l7-http-allow-any-path
=== RUN   TestL7NetworkPolicy/HTTP/Egress
    l7networkpolicy_test.go:119: Creating ANP test-l7-http-allow-path-hostname
    l7networkpolicy_test.go:119: Creating ANP test-l7-http-allow-any-path
=== CONT  TestL7NetworkPolicy
    fixtures.go:465: Deleting 'testl7networkpolicy-9dx07sbx' K8s Namespace
I0512 01:25:48.189564 3527767 framework.go:682] Deleting Namespace testl7networkpolicy-9dx07sbx took 3.788615ms
--- PASS: TestL7NetworkPolicy (57.99s)
    --- PASS: TestL7NetworkPolicy/HTTP (27.27s)
        --- PASS: TestL7NetworkPolicy/HTTP/Ingress (6.63s)
        --- PASS: TestL7NetworkPolicy/HTTP/Egress (5.60s)
PASS

$ kubectl -n kube-system exec antrea-agent-dzh4m -c antrea-agent -- suricata -V
This is Suricata version 6.0.12 RELEASE

It seems the issue has been fixed in 6.0.12. I will revert the previous PR.
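
The manual `suricata -V` check used in this thread could also be automated, for example as a guard in an image build or test setup. The following Go sketch is only an illustration under assumptions: `parseSuricataVersion`, `semver`, and `knownBad` are hypothetical names, and the only version listed as affected is 6.0.11, which is what this issue establishes.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// semver holds a three-part Suricata version such as 6.0.11.
type semver struct{ Major, Minor, Patch int }

// versionRE matches the banner printed by `suricata -V`, e.g.
// "This is Suricata version 6.0.10 RELEASE".
var versionRE = regexp.MustCompile(`Suricata version (\d+)\.(\d+)\.(\d+)`)

// knownBad lists versions affected by the suricatasc multi-tenant hang.
// Assumption: only 6.0.11 is listed, as reported in this issue.
var knownBad = map[semver]bool{
	semver{6, 0, 11}: true,
}

// parseSuricataVersion extracts the version from a `suricata -V` banner.
func parseSuricataVersion(banner string) (semver, error) {
	m := versionRE.FindStringSubmatch(banner)
	if m == nil {
		return semver{}, fmt.Errorf("no Suricata version found in %q", banner)
	}
	var parts [3]int
	for i := range parts {
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return semver{}, err
		}
		parts[i] = n
	}
	return semver{parts[0], parts[1], parts[2]}, nil
}

func main() {
	for _, banner := range []string{
		"This is Suricata version 6.0.10 RELEASE",
		"This is Suricata version 6.0.11 RELEASE",
		"This is Suricata version 6.0.12 RELEASE",
	} {
		v, err := parseSuricataVersion(banner)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%d.%d.%d affected=%v\n", v.Major, v.Minor, v.Patch, knownBad[v])
	}
}
```

A deny list is used rather than a minimum version because, per this thread, both the older 6.0.10 and the newer 6.0.12 work while 6.0.11 does not.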

ceclinux pushed a commit to ceclinux/antrea that referenced this issue Jun 5, 2023
…rea-io#4933)

To avoid a known issue with Suricata 6.0.11.
The main Ubuntu PPA ships Suricata 6.0.4.

With this change, e2e tests for L7NetworkPolicy will stop failing.

We do not "fix" the UBI build at the moment, but it will be taken care
of before the Antrea v1.12 release.

For antrea-io#4921

Signed-off-by: Antonin Bas <abas@vmware.com>