[CSIT-1936] TRex occasionally sees link down in E8xx (dpdk) tests #4018

@vvalderrv

Description

The full set of affected combinations is unknown, as most tests pass, but one affected combination is 2n-icx with e810cq and the dpdk plugin: [0].

Perhaps related to CSIT-1932.

[0] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2310-2n-icx/31/log.html.gz#s1-s1-s1-s5-s10-t1-k2-k9-k9-k10-k1-k1-k1-k11

Assignee

Unassigned

Reporter

Vratko Polak

Comments

  • vrpolak (Thu, 14 Nov 2024 13:30:04 +0000): One test [10] has this symptom:

RuntimeError: Timeout, interfaces not up: ['HundredGigabitEthernet2a/0/0']

I assume this happens when the link is already down during the initial check, before the first TRex trial, as the teardown shows the original symptom.

[10] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2410-2n-spr/29/log.html.gz#s1-s1-s1-s6-s5-t2-k2-k6-k3-k2-k1

  • vrpolak (Mon, 11 Nov 2024 12:54:35 +0000): Summary of console log last lines for quick identification.

MRR STL:

8) Resolving variable '${bandwidth * 1e9}' failed: TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'

MRR ASTF:

No traffic forwarded

NDRPDR STL:

8) Resolving variable '${bandwidth * 1e9}' failed: TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'

NDRPDR ASTF:

No lower bound: TrimmedStat(float_load=9001.0, int_load=0, target_to_stat={TargetSpec(loss_ratio=0.0, exceed_ratio=0.5, discrete_width=DiscreteWidth(float_width=0.004999765787156439, int_width=1), trial_duration=1.0, duration_sum=21.0): TargetStat(good_long=0.0, bad_long=15.856502872891724, good_short=0.0, bad_short=0.0)})

Only STL tests print the "link is DOWN" message to the console log.
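The signatures above could be matched mechanically when triaging console logs. The following is only a minimal sketch: the `SIGNATURES` table, the labels, and the `classify` helper are hypothetical names made up for illustration, not part of CSIT. The patterns are taken from the messages quoted in this thread. The `bandwidth_error_text` helper only shows that the quoted TypeError text is what Python produces when `None` is multiplied by a float; that the `bandwidth` variable is `None` because of the down link is an assumption.

```python
import re

# Hypothetical signature table; patterns copied from messages quoted in this issue.
SIGNATURES = [
    ("STL (MRR or NDRPDR)",
     re.compile(r"Resolving variable '\$\{bandwidth \* 1e9\}' failed: TypeError")),
    ("MRR ASTF", re.compile(r"No traffic forwarded")),
    ("NDRPDR ASTF", re.compile(r"No lower bound: TrimmedStat\(")),
    ("initial link check", re.compile(r"Timeout, interfaces not up")),
]


def classify(last_line):
    """Return the label of the first matching signature, or None if nothing matches."""
    for label, pattern in SIGNATURES:
        if pattern.search(last_line):
            return label
    return None


def bandwidth_error_text():
    """Reproduce the TypeError text seen in STL tests (assumes bandwidth is None)."""
    bandwidth = None  # assumption: no value was obtained for the link
    try:
        bandwidth * 1e9
    except TypeError as exc:
        return str(exc)
    return None
```

For example, `classify("No traffic forwarded")` returns the "MRR ASTF" label. A first-match list keeps the order explicit in case a later signature overlaps an earlier one.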

  • vrpolak (Mon, 11 Nov 2024 12:05:26 +0000): For ASTF, those tests in TG-TG suites always fail if the preceding STL tests showed link down; the symptom is still all packets unsent [9].

[9] https://logs.fd.io/vex-yul-rot-jenkins-1/csit-trex-perf-ndrpdr-weekly-master-2n-spr/84/log.html.gz#s1-s1-s1-s1-s3-t1-k2-k3-k14

  • vrpolak (Tue, 5 Nov 2024 14:04:02 +0000): Confirming this still affects l3fwd tests [8]. Testpmd seems to show the CSIT-1972 symptom instead, which perhaps has the same cause but is different enough to be tracked separately, as there TRex does not report unsent packets.

[8] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-dpdk-perf-report-iterative-2410-3nb-spr/8/log.html.gz#s1-s1-s1-s2-t9-k2-k5-k14

  • vrpolak (Wed, 16 Oct 2024 12:13:33 +0000): > And this [3] (ASTF failures with all packets in one direction unsent) is probably the same issue but with TRex in ASTF mode, and it affected all test cases until first STL one, which reset the link.

A similar situation happened, this time in trex tests [7]. I still believe it is this issue, just with ASTF, which needs an STL test to recover. We now have evidence it can happen on a TG-TG link (no SUT needed), confirming this is a pure TRex bug.

[7] https://logs.fd.io/vex-yul-rot-jenkins-1/csit-trex-perf-ndrpdr-weekly-master-2n-spr/81/log.html.gz#s1-s1-s1-s1-s3-t1-k2-k3-k14

  • vrpolak (Fri, 26 Jul 2024 14:52:28 +0000): I also see one run where this affects some L2 tests on 3n-snr+dpdk+e822cq [6].

[6] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-3n-snr/57/log.html.gz#s1-s1-s1-s5-s3-t1-k2-k9-k14

  • vrpolak (Mon, 17 Jun 2024 12:01:38 +0000): Also ip6: [5].

[5] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-2n-spr/56/log.html.gz#s1-s1-s1-s4-s4-t1-k2-k9-k14

  • vrpolak (Mon, 17 Jun 2024 11:16:41 +0000): Still happens quite frequently in L2 tests, and rarely in ip4+e810cq+dpdk tests [4].

[4] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-2n-icx/71/log.html.gz#s1-s1-s1-s2-s27-t1-k2-k9-k14

  • vrpolak (Mon, 29 Jan 2024 12:52:26 +0000): And this [3] (ASTF failures with all packets in one direction unsent) is probably the same issue but with TRex in ASTF mode, and it affected all test cases until first STL one, which reset the link.

[3] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-ndrpdr-weekly-master-2n-spr/45/log.html.gz#s1-s1-s1-s2-s24-t1-k2-k12-k14

  • vrpolak (Tue, 14 Nov 2023 15:52:56 +0000): Interestingly, this is also seen affecting pure DPDK tests [2].

Perhaps this and CSIT-1904 have the same cause, just acting on different endpoints?

[2] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-dpdk-perf-report-coverage-2310-2n-spr/1/log.html.gz#s1-s1-s1-s2-t1-k2-k5-k14

  • vrpolak (Wed, 8 Nov 2023 14:52:17 +0000): Rarely, this can also happen in ip4 tests (still seen only on e810cq with the dpdk plugin): [1].

[1] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2310-3n-icx/37/log.html.gz#s1-s1-s1-s2-s3-t2-k2-k9-k14

Original issue: https://jira.fd.io/browse/CSIT-1936
