Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve kind e2e tests #3922

Merged
merged 1 commit into from
Jun 29, 2022
Merged

Improve kind e2e tests #3922

merged 1 commit into from
Jun 29, 2022

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Jun 21, 2022

There were 3 feature specific jobs running e2e tests with AntreaProxy
disabled, AntreaProxy and proxyAll enabled, and AntreaPolicy disabled
separately. Having a job for each feature is not extensible and
requires maintenance efforts to configure the job along with the
feature's lifecycle. There is a lot of redundancy between these jobs as
unrelated tests ran repeatedly in them.

There were other feature specific test cases enabling the features at
runtime, which doesn't cover the scenario when two features are enabled
at the same time.

This patch removes some feature specific jobs and creates one job
running tests with all features enabled, which can test all features'
interaction. The patch also makes necessary changes to kind test script
and manifest generating script to achieve it.

Signed-off-by: Quan Tian qtian@vmware.com

@tnqn tnqn force-pushed the kind-test branch 2 times, most recently from 4c972ac to 258b38b Compare June 21, 2022 17:56
@codecov-commenter
Copy link

codecov-commenter commented Jun 21, 2022

Codecov Report

Merging #3922 (e3b7e0e) into main (40a952b) will increase coverage by 1.09%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3922      +/-   ##
==========================================
+ Coverage   64.00%   65.10%   +1.09%     
==========================================
  Files         293      293              
  Lines       43300    43300              
==========================================
+ Hits        27716    28192     +476     
+ Misses      13340    12880     -460     
+ Partials     2244     2228      -16     
Flag Coverage Δ
e2e-tests 40.30% <ø> (?)
kind-e2e-tests 51.34% <ø> (+0.87%) ⬆️
unit-tests 44.56% <ø> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/cniserver/ipam/antrea_ipam.go 51.08% <0.00%> (-24.68%) ⬇️
pkg/agent/flowexporter/exporter/certificate.go 50.00% <0.00%> (-16.67%) ⬇️
pkg/ipam/poolallocator/allocator.go 47.22% <0.00%> (-7.23%) ⬇️
...agent/flowexporter/connections/deny_connections.go 80.45% <0.00%> (-3.45%) ⬇️
pkg/agent/cniserver/ipam/antrea_ipam_controller.go 79.45% <0.00%> (-1.37%) ⬇️
pkg/agent/flowexporter/exporter/exporter.go 77.97% <0.00%> (-0.75%) ⬇️
pkg/agent/openflow/client.go 68.57% <0.00%> (-0.55%) ⬇️
pkg/agent/openflow/pipeline.go 73.36% <0.00%> (-0.24%) ⬇️
pkg/agent/route/route_linux.go 48.05% <0.00%> (ø)
pkg/agent/ipassigner/ip_assigner_linux.go 47.82% <0.00%> (+0.54%) ⬆️
... and 34 more

@tnqn tnqn changed the title Improve kind test coverage Improve kind e2e tests Jun 22, 2022
@tnqn tnqn marked this pull request as ready for review June 22, 2022 11:26
@tnqn tnqn requested review from antoninbas and jianjuns June 22, 2022 11:26
@tnqn
Copy link
Member Author

tnqn commented Jun 22, 2022

/test-all

@tnqn
Copy link
Member Author

tnqn commented Jun 22, 2022

/test-all

antoninbas
antoninbas previously approved these changes Jun 22, 2022
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I much prefer this approach of not enabling features directly in e2e test cases.

mkdir test-e2e-encap-proxy-all-coverage
ANTREA_LOG_DIR=$PWD/log ANTREA_COV_DIR=$PWD/test-e2e-encap-proxy-all-coverage ./ci/kind/test-e2e-kind.sh --encap-mode encap --proxy-all --coverage --skip mode-irrelevant
mkdir test-e2e-encap-all-features-disabled-coverage
ANTREA_LOG_DIR=$PWD/log ANTREA_COV_DIR=$PWD/test-e2e-encap-all-features-disabled-coverage ./ci/kind/test-e2e-kind.sh --encap-mode encap --coverage --feature-gates AllAlpha=false,AllBeta=false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just test with default values - keep default enabled features enabled, and default disabled features disabled? Usually, no one will disable default enabled features right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"test-e2e-encap" tests the configuration you describe, it has been there from the begining.
If we don't have this job "all-features-disabled", there will be no job testing the code under the ! DefaultFeatureGate.Enabled(FEATURE) condition, which were previously covered by "test-e2e-encap-no-proxy", "test-e2e-encap-no-np". Is it safe that once a feature graduates to beta, we never run test with it being disabled?

Copy link
Contributor

@jianjuns jianjuns Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. In this case, do you think it makes more sense to test only no-proxy, assuming people may like to disable AntreaProxy, but still enable other default features?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianjuns do you agree with above or have other comments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the feature specific code under ! DefaultFeatureGate.Enabled(FEATURE) except AntreaProxy will not be covered by any e2e test, is this reliable? For example, if a PR makes a change to a flow in L3Forwarding with the assumption that the packet will be handled by some flows installed by Egress feature, no test would catch the problem. It's not normal that users disable features that are enabled by default, but should we cover them in test with best effort? this may also affect test coverage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added "all features disabled" not because it's an important combination, but a single job to cover the code path that all optional features are disabled and ensure core functions work properly without relying on any optional feature (When these features are disabled, no feature specific test case will run, the test cases are only the generic ones like Pod connectivey, K8s NetworkPolicy, etc.). I think it's not just adding 1 feature combination, but 0 coverage vs. full coverage on related code path, "all features enabled" covers the other side. And I think "all features disabled" covers "no-proxy" in some way from core functions's perspective, assuming optional features are not too coupled with each other, (and if they are coupled explicitly, disabling either of them is not a supported scenario and we won't even test it).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding configuration combination, I very agree we cannot cover all combinations, I think all tests are only testing the default combination with single factor mutated:
test-e2e-ipv6 and test-e2e-ipv6-only's factor is ip family
test-windows's factor is platform
test-e2e-noencap and test-e2e-hybrid's factor is encapulation mode
test-e2e-all-features-enabled and test-e2e-all-features-disabled is planed for the factor of feature

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the questions are:

  1. do we believe "AntreaProxy disabled" is a case used by considerable number of users
  2. if so, are these users would like other default features enabled
  3. if so, could this combination of "AntreaProxy disabled" + "other features enabled" have a good chance to be broken, if we do not test it

Do you mean you think the chance of 3) is low, if we test "AntreaProxy and all other features disabled"? I feel AntreaProxy flows can impact other features like ANP and Egress, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I only know one company disables it because of insufficient kernel version
  2. I guess no because the above reason doesn't apply to other features, however, just disabling AntreaProxy and keeping other configuration as default is not exactly their configuration. AFAIK, at least they change encapsulation
    mode and tunnel type: they use both noEncap mode and encap mode with vxlan tunnel. I'm not sure other configurations.
  3. It has a chance to be broken, but I imagine it shouldn't be too related to other features. For examples, ANP doesn't care whether it's kube proxy or antrea proxy, it cares the 5-tuple anyway, with one exception, it relies on AntreaProxy to implement toServices, but it's documented as unsupported and test didn't run when AntreaProxy is disabled. For Egress, if the traffic is desined to external address directly, how proxy works doesn't matter at all; if the traffic is destined to Service whose backend is external address, I guess it even doesn't work today when AntreaProxy is disabled because we match InPort when determining whether to SNAT.

I agree with your proposal after reconsidering the test from another perspective: every feature should finally graduate to GA or deprecated, however as long as the kernel version restriction exists or users have a reason to stick to kube-proxy, AntreaProxy will GA and be removed from feature gate and perhaps a configuration "antreaProxy.enable" should be added like --run-service-proxy of kube-router. And if we think it's an important scenario like noEncap, we could keep a dedicated job testing it being disabled and all features being kept as default. Before that happens, we configure the job via "AntreaProxy feature gate", but it's for a typical configuration like noEncap. From feature gate's perspective, we only test a job with their default setting and a job with all of them enabled, the former of which covers the most common configuration and the latter ensures a non-default feature will be tested.

Copy link
Contributor

@jianjuns jianjuns Jun 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I always wonder if we really need to test with default enabled feature gates disabled. The only reason to disable a feature gate should be either defects or maybe a feature does not work in particular envs. We should know if we really have the latter case (and ideally feature gate should not be for that).

I know a confusion is that we use feature gate for both gating and enablement configuration in some cases.

Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I found I did not publish my earlier reply.

mkdir test-e2e-encap-proxy-all-coverage
ANTREA_LOG_DIR=$PWD/log ANTREA_COV_DIR=$PWD/test-e2e-encap-proxy-all-coverage ./ci/kind/test-e2e-kind.sh --encap-mode encap --proxy-all --coverage --skip mode-irrelevant
mkdir test-e2e-encap-all-features-disabled-coverage
ANTREA_LOG_DIR=$PWD/log ANTREA_COV_DIR=$PWD/test-e2e-encap-all-features-disabled-coverage ./ci/kind/test-e2e-kind.sh --encap-mode encap --coverage --feature-gates AllAlpha=false,AllBeta=false
Copy link
Contributor

@jianjuns jianjuns Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. In this case, do you think it makes more sense to test only no-proxy, assuming people may like to disable AntreaProxy, but still enable other default features?

ANTREA_LOG_DIR=$PWD/log ANTREA_COV_DIR=$PWD/test-e2e-encap-no-proxy-coverage ./ci/kind/test-e2e-kind.sh --encap-mode encap --no-proxy --coverage --skip mode-irrelevant
mkdir test-e2e-encap-all-features-enabled-coverage
# Currently multicast tests require specific testbeds, exclude it for now.
ANTREA_LOG_DIR=$PWD/log ANTREA_COV_DIR=$PWD/test-e2e-encap-all-features-enabled-coverage ./ci/kind/test-e2e-kind.sh --encap-mode encap --coverage --feature-gates AllAlpha=true,AllBeta=true,Multicast=false --proxy-all
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take a look at this issue. 👀

There were 3 feature specific jobs running e2e tests with AntreaProxy
disabled, AntreaProxy and proxyAll enabled, and AntreaPolicy disabled
separately. Having a job for each feature is not extensible and
requires maintenance efforts to configure the job along with the
feature's lifecycle. There is a lot of redundancy between these jobs as
unrelated tests ran repeatedly in them.

There were other feature specific test cases enabling the features at
runtime, which doesn't cover the scenario when two features are enabled
at the same time.

This patch removes some feature specific jobs and creates one job
running tests with all features enabled, which can test all features'
interaction. The patch also makes necessary changes to kind test script
and manifest generating script to achieve it.

Signed-off-by: Quan Tian <qtian@vmware.com>
Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member Author

tnqn commented Jun 28, 2022

/test-all

@tnqn tnqn merged commit 0313479 into antrea-io:main Jun 29, 2022
@tnqn tnqn deleted the kind-test branch June 29, 2022 15:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants