Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix network policy issue in userspace datapath of Flow Exporter #2194

Merged
merged 1 commit into from
May 22, 2021

Conversation

dreamtalen
Copy link
Contributor

In this commit, we add the parsing code of labels field for dump-conntrack command output in OVS userspace datapath situation.
This will fix the missing network policy information issue in flow records in OVS userspace datapath, like in Kind cluster.

}
hexval, err := hex.DecodeString(labelStr)
if err != nil {
return nil, fmt.Errorf("conversion of labelStr %s to []byte failed", labelStr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"labelStr"? I don't think using the name of a variable in a log message makes too much sense. Maybe replace it with "label string".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, addressed.

}
hexval, err := hex.DecodeString(labelStr)
if err != nil {
return nil, fmt.Errorf("conversion of labelStr %s to []byte failed", labelStr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not add the error to this message like we typically do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, also added the error to other messages in this function.

Comment on lines +260 to +264
for i := 0; i < len(hexval)/2; i++ {
hexval[i], hexval[len(hexval)-i-1] = hexval[len(hexval)-i-1], hexval[i]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment explaining why the slice needs to be reversed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, the main reason is because the kernel side's label output is little endian, and we would like to keep the logics in connections.go unchanged:

if len(conn.Labels) != 0 {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good comment to add. BTW I am not a big fan that we assume that we get little-endian labels from the kernel / netlink. CTA_LABELS just uses the "native" endianness for the system, which happens to be little-endian for us. If we were to test Antrea on a big endian CPU, this assumption would break and we would get wrong labels. Not that we test or claim support for such CPUs (although I think ARM CPUs can be used in big-endian mode).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I could open an issue to track this unappropriate assumption.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me. You may want to check with @srikartati as well, I didn't research this in depth

Copy link
Member

@srikartati srikartati May 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point, @antoninbas.
Most of the modern CPUs based on x86 architecture are little-endian like Intel, AMD, etc. As Antonin suggested, there are definitely big-endian CPUs like ARM, PowerPC, etc. I see that ARM is bi-endian with default as little-endian through google search.
In any case, we should detect the endianness of the system to cover all scenarios rather than having separate implementation for all architectures--this should be simple. We can do this when initializing conntrack dumper. IMO this does not warrant an issue. We can have a comment in the conntrack_linux.go code or fix it in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Srikar, I added a comment in conntrack_linux.go.

Comment on lines +253 to +256
if len(labelStr) < 16 {
labelStr = strings.Repeat("0", 16-len(labelStr)) + labelStr
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a comment explaining why this is needed would be good here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, added

@codecov-commenter
Copy link

codecov-commenter commented May 20, 2021

Codecov Report

Merging #2194 (3306145) into main (335e6bb) will increase coverage by 4.04%.
The diff coverage is 42.10%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2194      +/-   ##
==========================================
+ Coverage   61.09%   65.14%   +4.04%     
==========================================
  Files         273      273              
  Lines       20644    20672      +28     
==========================================
+ Hits        12613    13466     +853     
+ Misses       6713     5816     -897     
- Partials     1318     1390      +72     
Flag Coverage Δ
e2e-tests 56.17% <0.00%> (?)
kind-e2e-tests 52.13% <42.10%> (+0.08%) ⬆️
unit-tests 41.20% <42.10%> (+0.17%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/flowexporter/connections/connections.go 76.05% <ø> (ø)
.../agent/flowexporter/connections/conntrack_linux.go 86.73% <ø> (+25.51%) ⬆️
...kg/agent/flowexporter/connections/conntrack_ovs.go 77.19% <42.10%> (+0.17%) ⬆️
...gent/controller/networkpolicy/status_controller.go 72.60% <0.00%> (-2.74%) ⬇️
pkg/controller/traceflow/controller.go 71.68% <0.00%> (ø)
...ntroller/networkpolicy/networkpolicy_controller.go 84.72% <0.00%> (+0.16%) ⬆️
pkg/ovs/openflow/ofctrl_bridge.go 50.36% <0.00%> (+0.36%) ⬆️
pkg/agent/openflow/network_policy.go 76.41% <0.00%> (+0.59%) ⬆️
pkg/agent/cniserver/pod_configuration.go 54.68% <0.00%> (+0.78%) ⬆️
pkg/controller/egress/store/egressgroup.go 1.38% <0.00%> (+1.38%) ⬆️
... and 43 more

}
conn.Timeout = uint32(val)
case strings.Contains(fs, "labels"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we cover this scenario in the unit test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added

@@ -41,6 +41,7 @@ type connTrackSystem struct {
connTrack NetFilterConnTrack
}

// TODO: detect the endianness of the system when initializing conntrack dumper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should move the comment to

ingressOfID := binary.LittleEndian.Uint32(conn.Labels[:4])
egressOfID := binary.LittleEndian.Uint32(conn.Labels[4:8])
and provide more context. It's really out of place here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srikartati I assumed the reason Srikar let me add a comment in the conntrack_linux.go is to keep the code parsing label as little Endian unchanged in connections.go?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is my intention. I think Antonin feels directly using the little-endian format lacks context.
Therefore, you could add a comment there in connections.go saying that we always expect labels from conntrack dumper to be added in little-endian format to make it clear. Please add the reason that the labels are stored in the same format in kernel datapath on most platforms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Srikar and Antonin, the comments have been updated.

antoninbas
antoninbas previously approved these changes May 21, 2021
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new comment LGTM

@@ -163,7 +164,7 @@ func flowStringToAntreaConnection(flow string, zoneFilter uint16) (*flowexporter
fields := strings.Split(fs, "=")
val, err := strconv.ParseUint(fields[len(fields)-1], 10, 16)
if err != nil {
return nil, fmt.Errorf("conversion of sport %s to int failed", fields[len(fields)-1])
return nil, fmt.Errorf("conversion of sport %s to int failed with error: %v", fields[len(fields)-1], err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: typically we don't add an explicit "with error" when we wrap errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, addressed.

Copy link
Member

@srikartati srikartati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just have one comment--otherwise LGTM.

// We always expect labels from conntrack dumper to be added in little-endian format right now
// In kernel datapath, the labels uses the "native" endianness for the system, which are little-endian
// on most of the modern CPUs based on x86 architecture like Intel, AMD, etc.
// TODO: detect the endianness of the system when initializing conntrack dumper to handle situations on big-endian platforms.
Copy link
Member

@srikartati srikartati May 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment and the context look good and clear. The TODO part should be in conntrack_linux.go, where this has to be fixed IMO. In addition, TODO should mention connection labels requiring little endian format.
Sorry, maybe I was not clear in my last comment--I meant to add those extra comments here but keep the TODO there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, addressed.

In this commit, we add the parsing code of labels field for
dump-conntrack command output in OVS userspace datapath situation.
This will fix the missing network policy information issue in flow
records in OVS userspace datapath, like in Kind cluster.

Signed-off-by: Yongming Ding <dyongming@vmware.com>
Copy link
Member

@srikartati srikartati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dreamtalen
Copy link
Contributor Author

/test-all

1 similar comment
@dreamtalen
Copy link
Contributor Author

/test-all

@srikartati srikartati merged commit 8c3adad into antrea-io:main May 22, 2021
@zyiou zyiou added area/flow-visibility Issues or PRs related to flow visibility support in Antrea area/flow-visibility/exporter Issues or PRs related to the Flow Exporter functions in the Agent labels Jun 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/flow-visibility/exporter Issues or PRs related to the Flow Exporter functions in the Agent area/flow-visibility Issues or PRs related to flow visibility support in Antrea
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants