Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add network policy info to IPFIX flow records as part of Flow Exporter #1268

Merged
merged 1 commit into from
Nov 12, 2020

Conversation

srikartati
Copy link
Member

@srikartati srikartati commented Sep 18, 2020

Network policy info (name and namespace), both ingress and egress
policies are added.

Async rule cache in idAllocator is configured with asyncDeleteInterval that is maximum of either 5s or configured flowPollInterval in Antrea Agent.

@antrea-bot
Copy link
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-whole-conformance: to trigger all conformance tests on linux.
  • /skip-whole-conformance: to skip all conformance tests on linux.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-windows-networkpolicy: to trigger windows networkpolicy tests.
  • /skip-windows-networkpolicy: to skip windows networkpolicy tests.
  • /test-hw-offload: to trigger ovs hardware offload test.
  • /skip-hw-offload: to skip ovs hardware offload test.
  • /test-all: to trigger all tests (except whole conformance).
  • /skip-all: to skip all tests (except whole conformance).

@codecov-commenter
Copy link

codecov-commenter commented Sep 18, 2020

Codecov Report

Merging #1268 into master will decrease coverage by 10.20%.
The diff coverage is 9.09%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #1268       +/-   ##
===========================================
- Coverage   54.98%   44.78%   -10.21%     
===========================================
  Files         110       73       -37     
  Lines       10573     5564     -5009     
===========================================
- Hits         5814     2492     -3322     
+ Misses       4185     2791     -1394     
+ Partials      574      281      -293     
Flag Coverage Δ
#integration-tests 44.78% <9.09%> (-0.18%) ⬇️
#unit-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../agent/flowexporter/connections/conntrack_linux.go 0.00% <0.00%> (-8.20%) ⬇️
pkg/agent/flowexporter/connections/connections.go 35.89% <10.00%> (-33.18%) ⬇️
pkg/agent/proxy/service.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/ovs/ovsconfig/default.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/agent/util/iptables/lock.go 0.00% <0.00%> (-81.82%) ⬇️
pkg/agent/proxy/endpoints.go 0.00% <0.00%> (-81.43%) ⬇️
pkg/cni/client.go 0.00% <0.00%> (-75.52%) ⬇️
...kg/agent/flowexporter/connections/conntrack_ovs.go 0.00% <0.00%> (-74.51%) ⬇️
pkg/agent/flowexporter/connections/conntrack.go 0.00% <0.00%> (-68.43%) ⬇️
pkg/agent/querier/querier.go 0.00% <0.00%> (-66.67%) ⬇️
... and 54 more

@codecov-io
Copy link

codecov-io commented Oct 31, 2020

Codecov Report

Merging #1268 (e8c6dda) into master (9d3d10b) will decrease coverage by 4.39%.
The diff coverage is 49.65%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1268      +/-   ##
==========================================
- Coverage   63.31%   58.92%   -4.40%     
==========================================
  Files         170      179       +9     
  Lines       14250    15022     +772     
==========================================
- Hits         9023     8852     -171     
- Misses       4292     5199     +907     
- Partials      935      971      +36     
Flag Coverage Δ
e2e-tests 21.89% <10.95%> (?)
kind-e2e-tests 45.76% <45.22%> (-9.63%) ⬇️
unit-tests 40.96% <17.02%> (-0.32%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cmd/antrea-agent/agent.go 0.00% <0.00%> (ø)
.../agent/flowexporter/connections/conntrack_linux.go 28.37% <0.00%> (-0.79%) ⬇️
pkg/agent/stats/collector.go 92.04% <ø> (-5.69%) ⬇️
pkg/ovs/openflow/ofctrl_builder.go 59.48% <0.00%> (-2.69%) ⬇️
pkg/ovs/openflow/ofctrl_group.go 48.57% <0.00%> (-4.56%) ⬇️
pkg/ovs/openflow/ofctrl_action.go 53.36% <5.71%> (-8.00%) ⬇️
pkg/agent/agent.go 48.32% <33.33%> (-0.39%) ⬇️
pkg/agent/proxy/types/types.go 60.00% <33.33%> (-24.62%) ⬇️
pkg/agent/controller/networkpolicy/reconciler.go 69.39% <50.00%> (-1.93%) ⬇️
pkg/agent/openflow/pipeline.go 63.07% <53.96%> (-6.70%) ⬇️
... and 55 more

@srikartati srikartati changed the title [WIP]Add network policy info to IPFIX flow records as part of Flow Exporter Add network policy info to IPFIX flow records as part of Flow Exporter Nov 2, 2020
@srikartati srikartati added this to the Antrea v0.11.0 release milestone Nov 2, 2020
Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR LGTM. I know you need to wait for the other one merged first.

@@ -145,6 +155,39 @@ func (cs *ConnectionStore) addOrUpdateConn(conn *flowexporter.Connection) {
}
}
}

// Add network policy name and namespace from Connection label
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

network policy -> NetworkPolicy

ingressOfID := binary.LittleEndian.Uint32(conn.Labels[:4])
egressOfID := binary.LittleEndian.Uint32(conn.Labels[4:8])
// TODO: There's a chance that the ingressOfID is released and reused by another
// NetworkPolicy rule could be modified in-between by the time the conntrack
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess you need to rephrase the sentence.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this. This is not applicable after PR #1411 is merged, so removed it. I will be changing asyncDeleteInterval to pollInterval of Flow Exporter.
Rephrased the below comment.

// metrics. We probably need a different solution.
if ingressOfID != 0 {
policy := cs.ofClient.GetPolicyFromConjunction(ingressOfID)
// Same as above, this may be because the NetworkPolicy is removed right after the metrics are fetched.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is "metrics" or "flow record"?

@antrea-bot
Copy link
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-all-features-conformance: to trigger conformance tests with all alpha features enabled.
  • /skip-all-features-conformance: to skip conformance tests with all alpha features enabled.
  • /test-whole-conformance: to trigger all conformance tests on linux.
  • /skip-whole-conformance: to skip all conformance tests on linux.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-windows-networkpolicy: to trigger windows networkpolicy tests.
  • /skip-windows-networkpolicy: to skip windows networkpolicy tests.
  • /test-hw-offload: to trigger ovs hardware offload test.
  • /skip-hw-offload: to skip ovs hardware offload test.
  • /test-all: to trigger all tests (except whole conformance).
  • /skip-all: to skip all tests (except whole conformance).

@srikartati
Copy link
Member Author

srikartati commented Nov 10, 2020

This PR LGTM. I know you need to wait for the other one merged first.

I updated the patch with modifications after asyncRuleCache PR. Could you please review the patch again?

jianjuns
jianjuns previously approved these changes Nov 11, 2020
@@ -145,6 +156,36 @@ func (cs *ConnectionStore) addOrUpdateConn(conn *flowexporter.Connection) {
}
}
}

// Add NetworkPolicy name and namespace from Connection label
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: namespace -> Namespace
Add "." at EOL.

@srikartati
Copy link
Member Author

Thanks for the review. Will wait for IPv6 PR to merge: #1518
And then do the rebase.

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple nits, otherwise LGTM. Is there no plan to test that feature?

Comment on lines 67 to 82
func newIDAllocator(flowPollInterval time.Duration, allocatedIDs ...uint32) *idAllocator {
allocator := &idAllocator{
availableSet: make(map[uint32]struct{}),
asyncRuleCache: cache.NewStore(asyncRuleCacheKeyFunc),
deleteQueue: workqueue.NewNamedDelayingQueue("async_delete_networkpolicyrule"),
}

// Set the asyncDeleteInterval based on flowPollInterval value.
if asyncDeleteInterval < flowPollInterval {
asyncDeleteInterval = flowPollInterval
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like flowPollInterval does not really belong in this code. Maybe the parameter should be named asyncDeleteInterval instead and the current asyncDeleteInterval variable should be renamed to minAsyncDeleteInterval

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was not comfortable too. This is a good suggestion. Changed other parts as well to keep flow poll interval separate.

@@ -145,6 +156,35 @@ func (cs *ConnectionStore) addOrUpdateConn(conn *flowexporter.Connection) {
}
}
}

// Add NetworkPolicy Name and Namespace from the connection label.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Add NetworkPolicy Name and Namespace from the connection label.
// Retrieve NetworkPolicy Name and Namespace by using the ID stored in the connection label (for both ingress and egress directions)

Copy link
Contributor

@zyiou zyiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested exporter with multiple ingress and egress network policy using elk flow collector. LGTM.

Copy link
Member Author

@srikartati srikartati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more unit tests and enhanced e2e test to test network policy support in Flow Exporter.

Comment on lines 67 to 82
func newIDAllocator(flowPollInterval time.Duration, allocatedIDs ...uint32) *idAllocator {
allocator := &idAllocator{
availableSet: make(map[uint32]struct{}),
asyncRuleCache: cache.NewStore(asyncRuleCacheKeyFunc),
deleteQueue: workqueue.NewNamedDelayingQueue("async_delete_networkpolicyrule"),
}

// Set the asyncDeleteInterval based on flowPollInterval value.
if asyncDeleteInterval < flowPollInterval {
asyncDeleteInterval = flowPollInterval
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was not comfortable too. This is a good suggestion. Changed other parts as well to keep flow poll interval separate.

@srikartati srikartati force-pushed the flowexp_np branch 4 times, most recently from b9afb1e to 3e80300 Compare November 11, 2020 20:10
@@ -145,13 +145,19 @@ func run(o *Options) error {
// notifying NetworkPolicyController to reconcile rules related to the
// updated Pods.
podUpdates := make(chan v1beta2.PodReference, 100)
// We set flow poll interval as the time interval for rule deletion in the async
// rule cache, which is implemented as the part of idAllocator. This is to preserve
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/as the part of/as part of

@@ -145,13 +145,19 @@ func run(o *Options) error {
// notifying NetworkPolicyController to reconcile rules related to the
// updated Pods.
podUpdates := make(chan v1beta2.PodReference, 100)
// We set flow poll interval as the time interval for rule deletion in the async
// rule cache, which is implemented as the part of idAllocator. This is to preserve
// the rule info for mapping in the flow records of Flow Exporter even after
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rule info for populating NetworkPolicy fields in the Flow Exporter even after rule deletion

@@ -29,7 +29,8 @@ import (
)

var (
asyncDeleteInterval = time.Second * 5
minAsyncDeleteInterval = time.Second * 5
asyncDeleteInterval = time.Second * 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that should really be a parameter of newIDAllocator

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think in the future we may want to investigate breaking up TestFlowExporter into several test cases

t.Fatalf("Records with PodAIP and PodBIP does not have Pod Namespace")
}
// Check if records have both ingress and egress network policies.
if !strings.Contains(record, "octetDeltaCount: 0") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a weird condition, could you add a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@@ -118,13 +128,6 @@ func (cs *ConnectionStore) addOrUpdateConn(conn *flowexporter.Connection) {
conn.DestinationPodName = dIface.ContainerInterfaceConfig.PodName
conn.DestinationPodNamespace = dIface.ContainerInterfaceConfig.PodNamespace
}
// Do not export flow records of connections whose destination is local Pod and source is remote Pod.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so now that we generate 2 "copies", is there any confusion when we consume the flow records in ELK?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's possible because of two copies of flows. Need to work with @zyiou to look into that.

I was in two minds whether to enable flow records from the destination node or disable because of flow collector implications. Need some more time. I am ok with continuing the old format of not to export to avoid confusion and take this up in a following PR that can be included in a patch release. What do you think?

Copy link
Contributor

@antoninbas antoninbas Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't export from the destination Node, then we will never have ingress network policy information unless the source & destination nodes are the same. However, unless I am mistaken this is already the case today for other fields (e.g. destination Pod name)

That wouldn't really be a candidate for a patch release, based on the impact on record consumers and the fact that this (FlowExporter) is an Alpha feature. But we could resolve this for 0.12.

I'll leave it up to you. Let's chat offline about the impact on the consumers of flow records, and especially on our reference ELK stack.

@@ -51,6 +52,56 @@ func TestFlowExporter(t *testing.T) {
if err != nil {
t.Fatalf("Error when getting the perftest server Pod's IP: %v", err)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's worth adding a comment somewhere, that because both source & destination Pods run on the same Node, we do not receive duplicate records from 2 different Nodes and records include "complete" information (e.g. both ingress & egress policy information)

Copy link
Member Author

@srikartati srikartati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an issue to enhance e2e tests Issue #1438. It is a good idea to break this up. Thanks.

@@ -118,13 +128,6 @@ func (cs *ConnectionStore) addOrUpdateConn(conn *flowexporter.Connection) {
conn.DestinationPodName = dIface.ContainerInterfaceConfig.PodName
conn.DestinationPodNamespace = dIface.ContainerInterfaceConfig.PodNamespace
}
// Do not export flow records of connections whose destination is local Pod and source is remote Pod.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's possible because of two copies of flows. Need to work with @zyiou to look into that.

I was in two minds whether to enable flow records from the destination node or disable because of flow collector implications. Need some more time. I am ok with continuing the old format of not to export to avoid confusion and take this up in a following PR that can be included in a patch release. What do you think?

t.Fatalf("Records with PodAIP and PodBIP does not have Pod Namespace")
}
// Check if records have both ingress and egress network policies.
if !strings.Contains(record, "octetDeltaCount: 0") {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@srikartati srikartati force-pushed the flowexp_np branch 2 times, most recently from 11e1cc4 to 360d0b5 Compare November 12, 2020 04:43
antoninbas
antoninbas previously approved these changes Nov 12, 2020
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@srikartati
Copy link
Member Author

/test-all

@srikartati
Copy link
Member Author

/test-conformance
/test-e2e
/test-networkpolicy

1 similar comment
@srikartati
Copy link
Member Author

/test-conformance
/test-e2e
/test-networkpolicy

@srikartati
Copy link
Member Author

/test-conformance

Network policy info (name and namespace), both ingress and egress
policies are added.

As part of this commit, we support exporting both flow records, for
the case of inter-Node flows. One from the source Node, where the
flow originates from and other from the destination Node of the flow,
where the destination Pod resides. The reason is that they contain
different NetworkPolicy info.
@srikartati
Copy link
Member Author

/test-all

@srikartati
Copy link
Member Author

As there are no IPv6 related changes in this patch. Merging it without testing ipv6 tests.

@srikartati srikartati merged commit 8a3ee6b into antrea-io:master Nov 12, 2020
@srikartati srikartati deleted the flowexp_np branch November 12, 2020 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants