Duplicate Node Results for Live Traceflow #4714

Closed
antoninbas opened this issue Mar 16, 2023 · 0 comments · Fixed by #4715
Labels: area/ops/traceflow (Issues or PRs related to the Traceflow feature), kind/bug (Categorizes issue or PR as related to a bug)

@antoninbas (Contributor)

Describe the bug
While running a Live Traceflow, I got the following result:

apiVersion: crd.antrea.io/v1alpha1
kind: Traceflow
metadata:
  creationTimestamp: "2023-03-15T23:29:38Z"
  generation: 1
  labels:
    ui.antrea.io: ""
  name: 0705f0f7-bf12-409e-b5af-60ca50397356
  resourceVersion: "131846"
  uid: 6da11b4e-116d-4992-9825-967d1380a605
spec:
  destination:
    namespace: kube-system
    pod: antrea-ui-9d78f97bc-52lss
  droppedOnly: false
  liveTraffic: true
  packet:
    ipHeader:
      protocol: 6
    transportHeader:
      tcp:
        dstPort: 3000
        flags: 2
        srcPort: 0
  source: {}
  timeout: 30
status:
  capturedPacket:
    dstIP: 10.10.1.24
    ipHeader:
      flags: 2
      protocol: 6
      ttl: 62
    length: 64
    srcIP: 10.10.0.1
    transportHeader:
      tcp:
        dstPort: 3000
        flags: 2
        srcPort: 62012
  phase: Succeeded
  results:
  - node: k8s-node-worker-1
    observations:
    - action: Received
      component: Forwarding
    - action: Delivered
      component: Forwarding
      componentInfo: Output
    timestamp: 1678922981
  - node: k8s-node-worker-1
    observations:
    - action: Received
      component: Forwarding
    - action: Delivered
      component: Forwarding
      componentInfo: Output
    timestamp: 1678922981
  startTime: "2023-03-15T23:29:38Z"

The Traceflow Status is not correct because the Node Result is duplicated.
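
A check for this symptom can be sketched in Go as follows. This is a minimal illustration only: `NodeResult` is a simplified, hypothetical stand-in for the Node Result entries shown above, not the Antrea CRD type, and `duplicateNodes` is a made-up helper name.

```go
package main

import "fmt"

// NodeResult keeps only the fields needed for this check. It is a
// hypothetical, simplified type and not the Antrea API type.
type NodeResult struct {
	Node      string
	Timestamp int64
}

// duplicateNodes returns the names of Nodes that appear more than once in
// results, each reported once, in the order the first repeat is seen.
func duplicateNodes(results []NodeResult) []string {
	seen := make(map[string]int)
	var dups []string
	for _, r := range results {
		seen[r.Node]++
		if seen[r.Node] == 2 {
			dups = append(dups, r.Node)
		}
	}
	return dups
}

func main() {
	// Values copied from the Status reported above.
	results := []NodeResult{
		{Node: "k8s-node-worker-1", Timestamp: 1678922981},
		{Node: "k8s-node-worker-1", Timestamp: 1678922981},
	}
	fmt.Println(duplicateNodes(results)) // prints [k8s-node-worker-1]
}
```

For a correct Live Traceflow Status, the returned slice would be empty.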

To Reproduce

To reproduce, generate 2 connections in the cluster that both match the Live Traceflow filter. An easy way to do that is to use an ICMP Traceflow and run 2 ping commands. However, the 2 connections must start "at the same time", typically within ~1ms of each other; otherwise the Antrea Agent Traceflow Controller will have time to uninstall the Traceflow OVS flows after the first packet:

if tfState.liveTraffic && firstPacket {
	// Uninstall the OVS flows after receiving the first packet, to
	// avoid capturing too many matched packets.
	c.ofClient.UninstallTraceflowFlows(tag)
}
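
One possible way to start the 2 pings as close together as possible is a small Go helper that releases both from a shared start signal. This is only a sketch: `runTogether` is a hypothetical helper, the destination address is taken from the command line, and the `-c 1` flag assumes a Linux or macOS `ping`. Process startup and scheduling jitter mean the ~1ms window is not guaranteed, so several attempts may be needed.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"sync"
)

// runTogether runs each task in its own goroutine. All goroutines wait on a
// shared channel and are released together when it is closed. The returned
// slice holds one error per task, in the order the tasks were passed.
func runTogether(tasks ...func() error) []error {
	errs := make([]error, len(tasks))
	start := make(chan struct{})
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task func() error) {
			defer wg.Done()
			<-start // wait for the shared start signal
			errs[i] = task()
		}(i, task)
	}
	close(start) // release all goroutines at once
	wg.Wait()
	return errs
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: twoping <destination IP>")
		return
	}
	dst := os.Args[1]
	// Send a single ICMP echo request to dst.
	ping := func() error {
		return exec.Command("ping", "-c", "1", dst).Run()
	}
	for i, err := range runTogether(ping, ping) {
		fmt.Printf("ping %d: err=%v\n", i, err)
	}
}
```

Run it while the ICMP Live Traceflow is active, passing the destination Pod IP as the only argument.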

Expected
Status should include a single, non-duplicated, Node Result:

status:
  capturedPacket:
    dstIP: 10.10.1.24
    ipHeader:
      flags: 2
      protocol: 6
      ttl: 62
    length: 64
    srcIP: 10.10.0.1
    transportHeader:
      tcp:
        dstPort: 3000
        flags: 2
        srcPort: 62012
  phase: Succeeded
  results:
  - node: k8s-node-worker-1
    observations:
    - action: Received
      component: Forwarding
    - action: Delivered
      component: Forwarding
      componentInfo: Output
    timestamp: 1678922981

Actual behavior
The Status is incorrect, as shown above.

Versions:
Antrea v1.10 and main branch

antoninbas added the kind/bug and area/ops/traceflow labels Mar 16, 2023
antoninbas self-assigned this Mar 16, 2023
antoninbas added a commit to antoninbas/antrea that referenced this issue Mar 16, 2023
After receiving a Packet In for a live Traceflow, the controller
uninstalls the corresponding OVS flows to prevent additional Packet Ins
(from different connections which also match the Live Traceflow
filters).

However, if 2 connections happen within a very short time window (< 1ms
in my testbed) and both match the Live Traceflow filters, it is still
possible for 2 Packet In messages to be received. To avoid duplicate
Node Results in the Live Traceflow Status, we need to ignore all Packet
In messages received after the first one.

Fixes antrea-io#4714

Signed-off-by: Antonin Bas <abas@vmware.com>
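
The idea of ignoring every Packet In after the first one can be illustrated with a minimal Go sketch. It is not the code from the actual fix: `packetInGate`, `acceptFirst`, and the `uint8` tag type are hypothetical names and choices made for this example, and the 2 goroutines stand in for 2 Packet In messages arriving almost simultaneously.

```go
package main

import (
	"fmt"
	"sync"
)

// packetInGate remembers, per Traceflow tag, whether a Packet In has already
// been accepted. Hypothetical helper, not an Antrea type.
type packetInGate struct {
	mu   sync.Mutex
	seen map[uint8]bool
}

func newPacketInGate() *packetInGate {
	return &packetInGate{seen: make(map[uint8]bool)}
}

// acceptFirst returns true only for the first call with a given tag; every
// later call with the same tag returns false.
func (g *packetInGate) acceptFirst(tag uint8) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.seen[tag] {
		return false
	}
	g.seen[tag] = true
	return true
}

func main() {
	gate := newPacketInGate()
	var (
		wg      sync.WaitGroup
		resMu   sync.Mutex
		results []string
	)
	// Two Packet In messages for the same tag, handled concurrently.
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if !gate.acceptFirst(1) {
				return // not the first Packet In for this tag: ignore it
			}
			resMu.Lock()
			results = append(results, "k8s-node-worker-1")
			resMu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println(len(results)) // prints 1
}
```

Because the check and the update happen under the same lock, only one of the concurrent handlers can record a Node Result, regardless of how quickly the OVS flows are uninstalled.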
antoninbas added a commit to antoninbas/antrea that referenced this issue Mar 17, 2023
tnqn pushed a commit that referenced this issue Mar 17, 2023
jainpulkit22 pushed a commit to urharshitha/antrea that referenced this issue Apr 28, 2023