-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the bug in Flow Exporter that cause intermittent errors in e2e tests #1959
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1959 +/- ##
===========================================
- Coverage 64.37% 49.60% -14.78%
===========================================
Files 193 189 -4
Lines 16967 16109 -858
===========================================
- Hits 10923 7991 -2932
- Misses 4885 7056 +2171
+ Partials 1159 1062 -97
Flags with carried forward coverage won't be shown. Click here to find out more.
|
129cf4b
to
c1c140d
Compare
/test-e2e |
157b87d
to
fbc63ad
Compare
/test-e2e |
92c0032
to
869ab55
Compare
/test-e2e |
6ef5f00
to
ca54d93
Compare
/test-e2e |
ca54d93
to
bf78427
Compare
/test-e2e |
4342b4d
to
e19983a
Compare
/test-e2e |
2 similar comments
/test-e2e |
/test-e2e |
/test-all |
/test-e2e |
All e2e test runs are successful (total 8 runs) after this fix except for two times when the connection to harbor registry failed. First run: https://jenkins.antrea-ci.rocks/job/antrea-e2e-for-pull-request/2203/ |
@@ -115,33 +118,6 @@ func prepareExporterInputArgs(collectorAddr, collectorProto string, enableTLSToF | |||
if err != nil { | |||
return expInput, err | |||
} | |||
|
|||
if enableTLSToFlowAggregator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the unused parameters be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Missed it.
@@ -191,7 +169,9 @@ func (exp *flowExporter) Export() { | |||
// There could be other errors while initializing flow exporter other than connecting to IPFIX collector, | |||
// therefore closing the connection and resetting the process. | |||
if exp.process != nil { | |||
exp.process.CloseConnToCollector() | |||
if !(reflect.ValueOf(exp.process).Kind() == reflect.Ptr && reflect.ValueOf(exp.process).IsNil()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be avoided by only setting exp.process when the funtion that constructs it doesn't return error, then we won't assign nil of a concrete type to it and can simply check nil when using exp.process
.
For instance:
process, err = ipfix.NewIPFIXExportingProcess(exp.exporterInput)
if err != nil {
return fmt.Errorf("error when starting exporter: %v", err)
}
exp.process = process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. This also explains why this crash was not happening before with similar scenarios even though they are not that often. The code was like this before PR #1714 , so it was not an issue before. Thanks for the pointer. Fixed it.
CI e2e tests are failing during or after TestFlowAggregator Errors are happening when flow aggregator service DNS resolution is not successful. I was able to get the backtrace that leads to crashing the agent. Added fix when checking the interface for nil to resolve it.
e19983a
to
bd905ca
Compare
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I wish the commit title and message were more helpful and accurate.
They should probably focus on the nature of the bug being fixed instead of the e2e test failure which is a consequence of the bug.
/test-conformance |
Thanks for the review. Added that. I originally created PR to debug the e2e tests in CI, so overlooked that part. |
The bug is because IPFIX exporting process interface in Flow Exporter is assigned with typecasted nil (typecasted to corresponding structure
ipfixExportingProcess
). This code was modified in PR #1714 . Therefore, the code comparing withnil
breaks in exporting process initialization code of Flow Exporter.This PR fixes the bug along with another issue in fetching certificates that exacerbates this bug when the Flow Exporter cannot connect to the Flow Aggregator.
Fixes #1956