Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix libOpenflow crash for some Traceflow requests #1883

Merged
merged 1 commit into from
Feb 18, 2021

Conversation

antoninbas
Copy link
Contributor

libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since #1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes #1878

libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since antrea-io#1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes antrea-io#1878
@antoninbas
Copy link
Contributor Author

The libOpenflow patch: antoninbas/libOpenflow@32f2e57

@gran-vmv @wenyingd do you know why so many switch cases are not handled correctly in https://github.com/contiv/libOpenflow/blob/01db743640b1c89bc14629581dc94e8250f782c4/openflow13/match.go#L219

@codecov-io
Copy link

codecov-io commented Feb 18, 2021

Codecov Report

❗ No coverage uploaded for pull request base (main@e2591af). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1883   +/-   ##
=======================================
  Coverage        ?   61.73%           
=======================================
  Files           ?      199           
  Lines           ?    17223           
  Branches        ?        0           
=======================================
  Hits            ?    10633           
  Misses          ?     5491           
  Partials        ?     1099           
Flag Coverage Δ
e2e-tests 26.30% <0.00%> (?)
kind-e2e-tests 49.73% <0.00%> (?)
unit-tests 41.02% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

antoninbas added a commit to antoninbas/antrea that referenced this pull request Feb 18, 2021
PR antrea-io#1883 fixes a panic in libOpenflow triggered when OVS receives reply
traffic for a Traceflow request with a valid dataplane tag as the ToS
field and the Linux packet mark set. However, it should be noted that
reply packets for Traceflow requests are generally meaningless and
should be ignored. In encapMode, The Traceflow implementation should
also not timeout when a Traceflow request leaves the overlay: as soon as
the request is forwarded through the gateway port, we should consider
the request complete, and ignore any potential reply packet. So we
include the following changes:

* add a new "ForwardedOutOfOverlay" Traceflow action when a request is
  forwarded out of the network managed by Antrea in encapMode. The
  Controller can then mark the request as "succeeded". In theory,
  something similar could be done for other traffic modes, but it would
  be much more complex.
* add support for Traceflow requests for which the destination is the
  gateway's IP, by reporting a "Delivered" action.
* add an OVS flow in charge of dropping reply traffic for Traceflow
  requests (using the conntrack state to match this traffic), thus
  ensuring it is not set to the Agent. In our testing, this is
  especially useful when the destination IP is the local Node's IP, as
  the IP ToS field seems to be preseved in that case, causing the reply
  packet to be treated as a Traceflow request.

We add end-to-end tests for both cases (external destination IP and
Antrea gateway destination IP).

See antrea-io#1878
Copy link
Contributor

@gran-vmv gran-vmv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
We need to discuss with @wenyingd for the libOpenflow cases and redirect libOpenflow in go.mod to original repo.

@antoninbas
Copy link
Contributor Author

/test-all

@antoninbas antoninbas merged commit 2343cbd into antrea-io:main Feb 18, 2021
@antoninbas antoninbas deleted the fix-libOpenflow-crash branch February 18, 2021 22:24
antoninbas added a commit to antoninbas/antrea that referenced this pull request Feb 25, 2021
PR antrea-io#1883 fixes a panic in libOpenflow triggered when OVS receives reply
traffic for a Traceflow request with a valid dataplane tag as the ToS
field and the Linux packet mark set. However, it should be noted that
reply packets for Traceflow requests are generally meaningless and
should be ignored. In encapMode, The Traceflow implementation should
also not timeout when a Traceflow request leaves the overlay: as soon as
the request is forwarded through the gateway port, we should consider
the request complete, and ignore any potential reply packet. So we
include the following changes:

* add a new "ForwardedOutOfOverlay" Traceflow action when a request is
  forwarded out of the network managed by Antrea in encapMode. The
  Controller can then mark the request as "succeeded". In theory,
  something similar could be done for other traffic modes, but it would
  be much more complex.
* add support for Traceflow requests for which the destination is the
  gateway's IP, by reporting a "Delivered" action.
* add an OVS flow in charge of dropping reply traffic for Traceflow
  requests (using the conntrack state to match this traffic), thus
  ensuring it is not set to the Agent. In our testing, this is
  especially useful when the destination IP is the local Node's IP, as
  the IP ToS field seems to be preseved in that case, causing the reply
  packet to be treated as a Traceflow request.

We add end-to-end tests for both cases (external destination IP and
Antrea gateway destination IP).

See antrea-io#1878
antoninbas added a commit to antoninbas/antrea that referenced this pull request Feb 26, 2021
libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since antrea-io#1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes antrea-io#1878
antoninbas added a commit that referenced this pull request Feb 26, 2021
PR #1883 fixes a panic in libOpenflow triggered when OVS receives reply
traffic for a Traceflow request with a valid dataplane tag as the ToS
field and the Linux packet mark set. However, it should be noted that
reply packets for Traceflow requests are generally meaningless and
should be ignored. In encapMode, The Traceflow implementation should
also not timeout when a Traceflow request leaves the overlay: as soon as
the request is forwarded through the gateway port, we should consider
the request complete, and ignore any potential reply packet. So we
include the following changes:

* add a new "ForwardedOutOfOverlay" Traceflow action when a request is
  forwarded out of the network managed by Antrea in encapMode. The
  Controller can then mark the request as "succeeded". In theory,
  something similar could be done for other traffic modes, but it would
  be much more complex.
* add support for Traceflow requests for which the destination is the
  gateway's IP, by reporting a "Delivered" action.
* add an OVS flow in charge of dropping reply traffic for Traceflow
  requests (using the conntrack state to match this traffic), thus
  ensuring it is not set to the Agent. In our testing, this is
  especially useful when the destination IP is the local Node's IP, as
  the IP ToS field seems to be preserved in that case, causing the reply
  packet to be treated as a Traceflow request.

We add end-to-end tests for both cases (external destination IP and
Antrea gateway destination IP).

See #1878
antoninbas added a commit to antoninbas/antrea that referenced this pull request Feb 26, 2021
libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since antrea-io#1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes antrea-io#1878
antoninbas added a commit that referenced this pull request Feb 26, 2021
libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since #1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes #1878
antoninbas added a commit that referenced this pull request Feb 26, 2021
libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since #1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes #1878
antoninbas added a commit to antoninbas/antrea that referenced this pull request Mar 11, 2021
libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since antrea-io#1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes antrea-io#1878
antoninbas added a commit that referenced this pull request Mar 12, 2021
libOpenflow panics if a packet-in message is sent by OVS with a
NXM_NX_PKT_MARK match field. Since #1816, it is a possible situation:
Traceflow requests for the Node IP can lead to reply traffic with the
packet mark set, which are sent to the Antrea Agent as a PacketIn
message. To resolve this issue, we first switch temporarily to a patched
libOpenflow version without this issue. When the patch is ported in
upstream libOpenflow, we can remove the replace directive from go.mod.

Fixes #1878
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segment fault in libOpenflow for Traceflow to an external IP
5 participants