Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change flow export mechanism for Flow Aggregator #1949

Merged

Conversation

srikartati
Copy link
Member

@srikartati srikartati commented Mar 13, 2021

Introduced active and inactive flow timeout configuration knobs.
With active flow record timeout, every flow record is sent to the collector based on its own active expiry timeout value.
With inactive flow record timeout, a flow record is sent to the collector if no records are seen by the flow aggregator for the flow since it was last updated.
Followed the mechanism proposed in RFC: https://tools.ietf.org/html/rfc6183#section-5.3.1
Used priority queue approach to keep track of individual expiry values for records.

In addition, changing the Flow Exporter activeFlowTimeout to 30s from 60s, to keep it lower than the activeFlowTimeout at Flow Aggregator.

Fixes: #1637

@srikartati srikartati force-pushed the flow_aggregator_change_export_interval branch 3 times, most recently from de8c85e to 4d15b90 Compare March 24, 2021 19:10
@codecov-io
Copy link

codecov-io commented Mar 24, 2021

Codecov Report

Merging #1949 (67995f5) into main (9b8440d) will decrease coverage by 23.37%.
The diff coverage is 58.69%.

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #1949       +/-   ##
===========================================
- Coverage   65.39%   42.01%   -23.38%     
===========================================
  Files         197      125       -72     
  Lines       17217    15448     -1769     
===========================================
- Hits        11259     6491     -4768     
- Misses       4785     8415     +3630     
+ Partials     1173      542      -631     
Flag Coverage Δ
kind-e2e-tests ?
unit-tests 42.01% <58.69%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/controller/traceflow/controller.go 66.88% <ø> (-2.60%) ⬇️
pkg/flowaggregator/flowaggregator.go 37.95% <53.93%> (-19.12%) ⬇️
pkg/flowaggregator/priorityqueue.go 100.00% <100.00%> (ø)
pkg/ovs/openflow/ofctrl_packetout.go 71.49% <100.00%> (-17.29%) ⬇️
pkg/agent/agent_linux.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/ovs/openflow/default.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/agent/config/node_config.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/apis/controlplane/helper.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/agent/cniserver/pod_configuration_linux.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/apis/controlplane/v1beta1/register.go 0.00% <0.00%> (-92.86%) ⬇️
... and 160 more

@antrea-io antrea-io deleted a comment from lgtm-com bot Mar 24, 2021
@srikartati srikartati force-pushed the flow_aggregator_change_export_interval branch from 4d15b90 to 845cf6a Compare March 25, 2021 13:51
@antrea-io antrea-io deleted a comment from lgtm-com bot Mar 25, 2021
@antrea-io antrea-io deleted a comment from lgtm-com bot Mar 25, 2021
@srikartati srikartati changed the title [WIP] Change flow export mechanism for Flow Aggregator Change flow export mechanism for Flow Aggregator Mar 25, 2021
@srikartati srikartati force-pushed the flow_aggregator_change_export_interval branch from 845cf6a to 886c22f Compare March 25, 2021 14:10
@srikartati srikartati marked this pull request as draft March 25, 2021 15:17
@srikartati srikartati force-pushed the flow_aggregator_change_export_interval branch 2 times, most recently from 67995f5 to 02fe6a2 Compare March 26, 2021 21:26
@srikartati srikartati marked this pull request as ready for review March 26, 2021 21:37
@srikartati
Copy link
Member Author

/test-all

build/yamls/flow-aggregator.yml Outdated Show resolved Hide resolved
pkg/flowaggregator/flowaggregator.go Show resolved Hide resolved
pkg/flowaggregator/flowaggregator.go Outdated Show resolved Hide resolved
pkg/flowaggregator/flowaggregator.go Outdated Show resolved Hide resolved
pkg/flowaggregator/flowaggregator.go Outdated Show resolved Hide resolved
pkg/ipfix/ipfix_intermediate.go Outdated Show resolved Hide resolved
Copy link
Member Author

@srikartati srikartati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments @zyiou
e2e tests are failing. Still working on fixing them. Moving the PR to draft state.

build/yamls/flow-aggregator.yml Outdated Show resolved Hide resolved
pkg/flowaggregator/flowaggregator.go Outdated Show resolved Hide resolved
pkg/flowaggregator/flowaggregator.go Outdated Show resolved Hide resolved
pkg/ipfix/ipfix_intermediate.go Outdated Show resolved Hide resolved
@srikartati srikartati marked this pull request as draft March 31, 2021 19:19
srikartati added a commit to srikartati/go-ipfix that referenced this pull request Apr 17, 2021
Reverting this commit 7c0a0ae.
Reason is the PR(antrea-io/antrea#1949) that depends on this commit
is not yet ready. And to move the tag of go-ipfix in Antrea for other
PRs this commit should be present in the latest release tags. Therefore,
reverting this.
srikartati added a commit to vmware/go-ipfix that referenced this pull request Apr 19, 2021
Reverting this commit 7c0a0ae.
Reason is the PR(antrea-io/antrea#1949) that depends on this commit
is not yet ready. And to move the tag of go-ipfix in Antrea for other
PRs this commit should be present in the latest release tags. Therefore,
reverting this.
@srikartati srikartati force-pushed the flow_aggregator_change_export_interval branch from 02fe6a2 to 5c3396a Compare May 3, 2021 23:33
@srikartati srikartati marked this pull request as ready for review May 3, 2021 23:35
@codecov-commenter
Copy link

codecov-commenter commented May 4, 2021

Codecov Report

Merging #1949 (36b301c) into main (a9c7744) will increase coverage by 3.79%.
The diff coverage is 74.41%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1949      +/-   ##
==========================================
+ Coverage   61.30%   65.09%   +3.79%     
==========================================
  Files         273      273              
  Lines       20646    20644       -2     
==========================================
+ Hits        12656    13438     +782     
+ Misses       6688     5823     -865     
- Partials     1302     1383      +81     
Flag Coverage Δ
e2e-tests 56.26% <74.41%> (?)
kind-e2e-tests 51.99% <74.41%> (-0.32%) ⬇️
unit-tests 41.01% <5.12%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/flowexporter/exporter/exporter.go 77.55% <60.00%> (+2.31%) ⬆️
pkg/flowaggregator/flowaggregator.go 62.88% <73.52%> (+4.80%) ⬆️
pkg/ipfix/ipfix_intermediate.go 83.33% <100.00%> (+3.33%) ⬆️
pkg/apiserver/certificate/cacert_controller.go 58.27% <0.00%> (-7.95%) ⬇️
...gent/controller/networkpolicy/status_controller.go 72.60% <0.00%> (-5.48%) ⬇️
pkg/agent/cniserver/ipam/ipam_service.go 63.15% <0.00%> (-5.27%) ⬇️
pkg/agent/cniserver/ipam/ipam_delegator.go 48.83% <0.00%> (-4.66%) ⬇️
pkg/agent/cniserver/pod_configuration.go 54.68% <0.00%> (-2.74%) ⬇️
pkg/agent/cniserver/server.go 68.91% <0.00%> (-1.93%) ⬇️
pkg/apiserver/apiserver.go 89.51% <0.00%> (-1.62%) ⬇️
... and 48 more

@srikartati srikartati force-pushed the flow_aggregator_change_export_interval branch 2 times, most recently from 64bf933 to f4bbf2a Compare May 5, 2021 23:48
@srikartati
Copy link
Member Author

/test-all

@srikartati
Copy link
Member Author

/test-e2e

}

func (fa *flowAggregator) flowRecordExpiryCheck(stopCh <-chan struct{}) {
expireTicker := time.NewTicker(fa.activeFlowRecordTimeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a ticker is the best construct anymore? We essentially call fa.aggregationProcess.GetExpiryFromExpirePriorityQueue and set the ticker to the returned value, but we will only use the ticker for a single tick... Maybe use a timer instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the timer makes more sense and it's simpler too. Modified.

pkg/flowaggregator/flowaggregator.go Outdated Show resolved Hide resolved
pkg/flowaggregator/flowaggregator.go Outdated Show resolved Hide resolved
@@ -609,7 +609,8 @@ func (data *TestData) mutateFlowAggregatorConfigMap(ipfixCollector string) error
}
flowAggregatorConf, _ := configMap.Data[flowAggregatorConfName]
flowAggregatorConf = strings.Replace(flowAggregatorConf, "#externalFlowCollectorAddr: \"\"", fmt.Sprintf("externalFlowCollectorAddr: \"%s\"", ipfixCollector), 1)
flowAggregatorConf = strings.Replace(flowAggregatorConf, "#flowExportInterval: 60s", "flowExportInterval: 5s", 1)
flowAggregatorConf = strings.Replace(flowAggregatorConf, "#activeFlowRecordTimeout: 60s", "activeFlowRecordTimeout: 4s", 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this change for? Maybe add a comment about this choice of values

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment.

@srikartati
Copy link
Member Author

/test-all
/test-ipv6-e2e
/test-ipv6-only-e2e

Copy link
Contributor

@zyiou zyiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the change.
What is left is to wait a v0.5.2 tag with klog fix. I don't know why it's not shown here.

// Initializing exporting process fails, will retry in next exportInterval
klog.Errorf("Error when initializing exporting process: %v, will retry in %s", err, fa.activeFlowRecordTimeout)
// Initializing exporting process fails, will retry in next cycle.
expireTimer = time.NewTimer(fa.activeFlowRecordTimeout)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can use expireTimer.Reset to avoid creating new timers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was not aware of the method.. Reset makes sense here.

return err
}

klog.V(4).Infof("Data set sent successfully: %d Bytes sent.", sentBytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the trailing period to keep same log style

@srikartati srikartati force-pushed the flow_aggregator_change_export_interval branch 3 times, most recently from c08f7af to 05e584b Compare May 18, 2021 18:00
antoninbas
antoninbas previously approved these changes May 18, 2021
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Introduce active and inactive flow timeout configuration
knobs.
With active flow record timeout, every flow record is sent to the
collector based on its own active expiry timeout value.
With inactive flow record timeout, a flow record is sent to the
collector if no records are seen by the flow aggregator for the flow,
since it was last updated in the flow aggregator map.

Used priority queue approach to keep track of individual expiry values
for records.

In addition, we changed the default active flow timeout at the flow
exporter to 30s.
@srikartati
Copy link
Member Author

/test-ipv6-e2e
/test-ipv6-only-e2e
/test-e2e

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@srikartati
Copy link
Member Author

/test-all

@srikartati
Copy link
Member Author

/test-ipv6-e2e
/test-ipv6-only-e2e

@srikartati
Copy link
Member Author

IPv6 jenkins e2e test seem to be broken. There are v6 specific changes. Merging for now.

@srikartati srikartati merged commit 335e6bb into antrea-io:main May 19, 2021
@srikartati srikartati deleted the flow_aggregator_change_export_interval branch May 19, 2021 19:10
@zyiou zyiou added area/flow-visibility Issues or PRs related to flow visibility support in Antrea area/flow-visibility/aggregator Issues or PRs related to Flow Aggregator labels Jun 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/flow-visibility/aggregator Issues or PRs related to Flow Aggregator area/flow-visibility Issues or PRs related to flow visibility support in Antrea
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exporting flows over a time interval in Flow Exporter and Flow Aggregator
7 participants