Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the bug in Flow Exporter that cause intermittent errors in e2e tests #1959

Merged
merged 1 commit into from
Mar 17, 2021

Conversation

srikartati
Copy link
Member

@srikartati srikartati commented Mar 16, 2021

The bug is because IPFIX exporting process interface in Flow Exporter is assigned with typecasted nil (typecasted to corresponding structure ipfixExportingProcess). This code was modified in PR #1714 . Therefore, the code comparing with nil breaks in exporting process initialization code of Flow Exporter.
This PR fixes the bug along with another issue in fetching certificates that exacerbates this bug when the Flow Exporter cannot connect to the Flow Aggregator.

Fixes #1956

@codecov-io
Copy link

codecov-io commented Mar 16, 2021

Codecov Report

Merging #1959 (14fb9b5) into main (020946f) will decrease coverage by 14.77%.
The diff coverage is 43.54%.

❗ Current head 14fb9b5 differs from pull request most recent head e19983a. Consider uploading reports for the commit e19983a to get more accurate results
Impacted file tree graph

@@             Coverage Diff             @@
##             main    #1959       +/-   ##
===========================================
- Coverage   64.37%   49.60%   -14.78%     
===========================================
  Files         193      189        -4     
  Lines       16967    16109      -858     
===========================================
- Hits        10923     7991     -2932     
- Misses       4885     7056     +2171     
+ Partials     1159     1062       -97     
Flag Coverage Δ
e2e-tests 49.60% <43.54%> (?)
kind-e2e-tests ?
unit-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/agent.go 51.55% <0.00%> (+3.83%) ⬆️
pkg/agent/agent_linux.go 100.00% <ø> (ø)
pkg/agent/openflow/client_other.go 0.00% <0.00%> (ø)
pkg/agent/openflow/pipeline.go 66.10% <14.28%> (+0.65%) ⬆️
pkg/agent/flowexporter/exporter/exporter.go 70.48% <64.28%> (+2.41%) ⬆️
pkg/agent/openflow/client.go 62.57% <66.66%> (+2.39%) ⬆️
pkg/agent/openflow/pipeline_other.go 66.66% <66.66%> (ø)
pkg/controller/networkpolicy/crd_utils.go 0.00% <0.00%> (-90.22%) ⬇️
pkg/controller/networkpolicy/status_controller.go 0.00% <0.00%> (-88.82%) ⬇️
...g/controller/networkpolicy/clusternetworkpolicy.go 0.00% <0.00%> (-87.15%) ⬇️
... and 113 more

@srikartati
Copy link
Member Author

/test-e2e

@srikartati srikartati force-pushed the fix_e2e_test branch 2 times, most recently from 157b87d to fbc63ad Compare March 16, 2021 17:14
@srikartati
Copy link
Member Author

/test-e2e

@srikartati srikartati force-pushed the fix_e2e_test branch 3 times, most recently from 92c0032 to 869ab55 Compare March 16, 2021 20:15
@srikartati
Copy link
Member Author

/test-e2e

@srikartati srikartati force-pushed the fix_e2e_test branch 2 times, most recently from 6ef5f00 to ca54d93 Compare March 17, 2021 01:27
@srikartati
Copy link
Member Author

/test-e2e

@srikartati srikartati changed the title [Not for review]Debug errors in CI runs Fix Flow Aggregator e2e test error in CI Mar 17, 2021
@srikartati
Copy link
Member Author

/test-e2e

@srikartati srikartati force-pushed the fix_e2e_test branch 2 times, most recently from 4342b4d to e19983a Compare March 17, 2021 04:02
@srikartati
Copy link
Member Author

/test-e2e

2 similar comments
@srikartati
Copy link
Member Author

/test-e2e

@srikartati
Copy link
Member Author

/test-e2e

@srikartati
Copy link
Member Author

/test-all

@srikartati
Copy link
Member Author

/test-e2e

@srikartati
Copy link
Member Author

srikartati commented Mar 17, 2021

All e2e test runs are successful (total 8 runs) after this fix except for two times when the connection to harbor registry failed.

First run: https://jenkins.antrea-ci.rocks/job/antrea-e2e-for-pull-request/2203/
Failed runs due to harbor registry unreachability:https://jenkins.antrea-ci.rocks/job/antrea-e2e-for-pull-request/2205/
https://jenkins.antrea-ci.rocks/job/antrea-e2e-for-pull-request/2206/
Last run successful: https://jenkins.antrea-ci.rocks/job/antrea-e2e-for-pull-request/2214/

@@ -115,33 +118,6 @@ func prepareExporterInputArgs(collectorAddr, collectorProto string, enableTLSToF
if err != nil {
return expInput, err
}

if enableTLSToFlowAggregator {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the unused parameters be removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Missed it.

@@ -191,7 +169,9 @@ func (exp *flowExporter) Export() {
// There could be other errors while initializing flow exporter other than connecting to IPFIX collector,
// therefore closing the connection and resetting the process.
if exp.process != nil {
exp.process.CloseConnToCollector()
if !(reflect.ValueOf(exp.process).Kind() == reflect.Ptr && reflect.ValueOf(exp.process).IsNil()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be avoided by only setting exp.process when the funtion that constructs it doesn't return error, then we won't assign nil of a concrete type to it and can simply check nil when using exp.process.
For instance:

	process, err = ipfix.NewIPFIXExportingProcess(exp.exporterInput)
	if err != nil {
		return fmt.Errorf("error when starting exporter: %v", err)
	}
        exp.process = process

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. This also explains why this crash was not happening before with similar scenarios even though they are not that often. The code was like this before PR #1714 , so it was not an issue before. Thanks for the pointer. Fixed it.

CI e2e tests are failing during or after TestFlowAggregator
Errors are happening when flow aggregator service DNS resolution is
not successful. I was able to get the backtrace that leads to crashing
the agent.

Added fix when checking the interface for nil to resolve it.
@srikartati
Copy link
Member Author

/test-all

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I wish the commit title and message were more helpful and accurate.
They should probably focus on the nature of the bug being fixed instead of the e2e test failure which is a consequence of the bug.

@srikartati
Copy link
Member Author

/test-conformance

@srikartati srikartati changed the title Fix Flow Aggregator e2e test error in CI Fix the bug introduced in PR #1714 that cause intermittent errors in e2e tests Mar 17, 2021
@srikartati srikartati changed the title Fix the bug introduced in PR #1714 that cause intermittent errors in e2e tests Fix the bug in Flow Exporter that cause intermittent errors in e2e tests Mar 17, 2021
@srikartati
Copy link
Member Author

LGTM, but I wish the commit title and message were more helpful and accurate.
They should probably focus on the nature of the bug being fixed instead of the e2e test failure which is a consequence of the bug.

Thanks for the review. Added that. I originally created PR to debug the e2e tests in CI, so overlooked that part.

@srikartati srikartati merged commit a9adf5a into antrea-io:main Mar 17, 2021
@srikartati srikartati deleted the fix_e2e_test branch March 18, 2021 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

e2e cases failed quite often after testing TestFlowAggregator
5 participants