Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

netdev CI testing #6666

Open
wants to merge 1,523 commits into
base: bpf-next_base
Choose a base branch
from
Open

Conversation

kuba-moo
Copy link
Contributor

Reusable PR for hooking netdev CI to BPF testing.

Jitendra Vegiraju and others added 27 commits October 19, 2024 23:01
The BCM8958x uses early adopter version of DWC_xgmac version 4.00a for
Ethernet MAC. The DW25GMAC introduced in this version adds new DMA
architecture called Hyper-DMA (HDMA) for virtualization scalability.
This is realized by decoupling physical DMA channels(PDMA) from potentially
large number of virtual DMA channels (VDMA). The VDMAs are software
abstractions that map to PDMAs for frame transmission and reception.

Define new macros DW25GMAC_CORE_4_00 and DW25GMAC_ID to identify DW25GMAC
device.
To support the new HDMA architecture, a new instance of stmmac_dma_ops
dw25gmac400_dma_ops is added.
To support the current needs, a simple one-to-one mapping of dw25gmac's
logical VDMA (channel) to TC to PDMAs is used.
Most of the other dma operation functions in existing dwxgamc2_dma.c file
are reused where applicable.
Added setup function for DW25GMAC's stmmac_hwif_entry in stmmac core.

Signed-off-by: Jitendra Vegiraju <jitendra.vegiraju@broadcom.com>
Signed-off-by: NipaLocal <nipa@local>
Integrate dw25gmac support into stmmac hardware interface handling.
Added a new entry to the stmmac_hw table in hwif.c.
Since BCM8958x is an early adopter device, the synopsis_id reported in HW
is 0x32 and device_id is DWXGMAC_ID. Provide override support by giving
preference to snps_id, dev_id values when initialized to non-zero
values by glue driver.

Signed-off-by: Jitendra Vegiraju <jitendra.vegiraju@broadcom.com>
Signed-off-by: NipaLocal <nipa@local>
Add PCI ethernet driver support for Broadcom BCM8958x SoC devices used
in automotive applications.

This SoC device has PCIe ethernet MAC attached to an integrated ethernet
switch using XGMII interface. The PCIe ethernet controller is presented to
the Linux host as PCI network device.

The following block diagram gives an overview of the application.
             +=================================+
             |       Host CPU/Linux            |
             +=================================+
                        || PCIe
                        ||
        +==========================================+
        |           +--------------+               |
        |           | PCIE Endpoint|               |
        |           | Ethernet     |               |
        |           | Controller   |               |
        |           |   DMA        |               |
        |           +--------------+               |
        |           |   MAC        |   BCM8958X    |
        |           +--------------+   SoC         |
        |               || XGMII                   |
        |               ||                         |
        |           +--------------+               |
        |           | Ethernet     |               |
        |           | switch       |               |
        |           +--------------+               |
        |             || || || ||                  |
        +==========================================+
                      || || || || More external interfaces

The MAC IP block on BCM8958x is based on Synopsis XGMAC 4.00a core. This
driver uses common dwxgmac2 code where applicable.
Driver functionality specific to this MAC is implemented in dw25gmac.c.

Management of integrated ethernet switch on this SoC is not handled by
the PCIe interface.
Since BCM8958x is an early adopter device, override the hardware reported
synopsis versions with actual DW25MAC versions that support hdma.

This SoC device has PCIe ethernet MAC directly attached to an integrated
ethernet switch using XGMII interface. Since device tree support is not
available on this platform, a software node is created to enable
fixed-link support using phylink driver.

Signed-off-by: Jitendra Vegiraju <jitendra.vegiraju@broadcom.com>
Signed-off-by: NipaLocal <nipa@local>
Add PCI driver for BCM8958x to the linux build system and
update MAINTAINERS file.

Signed-off-by: Jitendra Vegiraju <jitendra.vegiraju@broadcom.com>
Signed-off-by: NipaLocal <nipa@local>
Add Fibocom FG132 0x0112 composition:

T:  Bus=03 Lev=02 Prnt=06 Port=01 Cnt=02 Dev#= 10 Spd=12   MxCh= 0
D:  Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=2cb7 ProdID=0112 Rev= 5.15
S:  Manufacturer=Fibocom Wireless Inc.
S:  Product=Fibocom Module
S:  SerialNumber=xxxxxxxx
C:* #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E:  Ad=82(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
E:  Ad=81(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=01(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=83(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
E:  Ad=84(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=03(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E:  Ad=86(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=04(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms

Signed-off-by: Reinhard Speyerer <rspmn@arcor.de>
Signed-off-by: NipaLocal <nipa@local>
Currently we disable PCS ANE when the link speed is 2.5Gbps.
mac_link_up callback internally calls the fix_mac_speed which internally
calls stmmac_pcs_ctrl_ane to disable the ANE for 2.5Gbps.

We observed that the CPU utilization is pretty high. That is because
we saw that the PCS interrupt status line for Link and AN always remain
asserted. Since we are disabling the PCS ANE for 2.5Gbps it makes sense
to also disable the PCS link status and AN complete in the interrupt
enable register.

Interrupt storm Issue:-
[   25.465754][    C2] stmmac_pcs: Link Down
[   25.469888][    C2] stmmac_pcs: Link Down
[   25.474030][    C2] stmmac_pcs: Link Down
[   25.478164][    C2] stmmac_pcs: Link Down
[   25.482305][    C2] stmmac_pcs: Link Down
[   25.486441][    C2] stmmac_pcs: Link Down
[   25.486635][    C4] watchdog0: pretimeout event
[   25.490585][    C2] stmmac_pcs: Link Down
[   25.499341][    C2] stmmac_pcs: Link Down
[   25.503484][    C2] stmmac_pcs: Link Down
[   25.507619][    C2] stmmac_pcs: Link Down
[   25.511760][    C2] stmmac_pcs: Link Down
[   25.515897][    C2] stmmac_pcs: Link Down
[   25.520038][    C2] stmmac_pcs: Link Down
[   25.524174][    C2] stmmac_pcs: Link Down
[   25.528316][    C2] stmmac_pcs: Link Down
[   25.532451][    C2] stmmac_pcs: Link Down
[   25.536591][    C2] stmmac_pcs: Link Down
[   25.540724][    C2] stmmac_pcs: Link Down
[   25.544866][    C2] stmmac_pcs: Link Down

Once we disabled PCS ANE and Link Status interrupt issue
disappears.

Fixes: a818bd1 ("net: stmmac: dwmac-qcom-ethqos: Add support for 2.5G SGMII")
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Signed-off-by: NipaLocal <nipa@local>
DualPI2 provides L4S-type low latency & loss to traffic that uses a
scalable congestion controller (e.g. TCP-Prague, DCTCP) without
degrading the performance of 'classic' traffic (e.g. Reno,
Cubic etc.). It is intended to be the reference implementation of the
IETF's DualQ Coupled AQM.

The qdisc provides two queues called low latency and classic. It
classifies packets based on the ECN field in the IP headers. By
default it directs non-ECN and ECT(0) into the classic queue and
ECT(1) and CE into the low latency queue, as per the IETF spec.

Each queue runs its own AQM:
* The classic AQM is called PI2, which is similar to the PIE AQM but
  more responsive and simpler. Classic traffic requires a decent
  target queue (default 15ms for Internet deployment) to fully
  utilize the link and to avoid high drop rates.
* The low latency AQM is, by default, a very shallow ECN marking
  threshold (1ms) similar to that used for DCTCP.

The DualQ isolates the low queuing delay of the Low Latency queue
from the larger delay of the 'Classic' queue. However, from a
bandwidth perspective, flows in either queue will share out the link
capacity as if there was just a single queue. This bandwidth pooling
effect is achieved by coupling together the drop and ECN-marking
probabilities of the two AQMs.

The PI2 AQM has two main parameters in addition to its target delay.
All the defaults are suitable for any Internet setting, but it can
be reconfigured for a Data Centre setting. The integral gain factor
alpha is used to slowly correct any persistent standing queue error
from the target delay, while the proportional gain factor beta is
used to quickly compensate for queue changes (growth or shrinkage).
Either alpha and beta are given as a parameter, or they can be
calculated by tc from alternative typical and maximum RTT parameters.

Internally, the output of a linear Proportional Integral (PI)
controller is used for both queues. This output is squared to
calculate the drop or ECN-marking probability of the classic queue.
This counterbalances the square-root rate equation of Reno/Cubic,
which is the trick that balances flow rates across the queues. For
the ECN-marking probability of the low latency queue, the output of
the base AQM is multiplied by a coupling factor. This determines the
balance between the flow rates in each queue. The default setting
makes the flow rates roughly equal, which should be generally
applicable.

If DUALPI2 AQM has detected overload (due to excessive non-responsive
traffic in either queue), it will switch to signaling congestion
solely using drop, irrespective of the ECN field. Alternatively, it
can be configured to limit the drop probability and let the queue
grow and eventually overflow (like tail-drop).

Additional details can be found in the draft:
  https://datatracker.ietf.org/doc/html/rfc9332

Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Co-developed-by: Olga Albisser <olga@albisser.org>
Signed-off-by: Olga Albisser <olga@albisser.org>
Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Co-developed-by: Henrik Steen <henrist@henrist.net>
Signed-off-by: Henrik Steen <henrist@henrist.net>
Signed-off-by: Bob Briscoe <research@bobbriscoe.net>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: NipaLocal <nipa@local>
- Move tcp_count_delivered() earlier and split tcp_count_delivered_ce()
  out of it
- Move tcp_in_ack_event() later
- While at it, remove the inline from tcp_in_ack_event() and let
  the compiler to decide

Accurate ECN's heuristics does not know if there is going
to be ACE field based CE counter increase or not until after
rtx queue has been processed. Only then the number of ACKed
bytes/pkts is available. As CE or not affects presence of
FLAG_ECE, that information for tcp_in_ack_event is not yet
available in the old location of the call to tcp_in_ack_event().

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: NipaLocal <nipa@local>
Whenever timestamp advances, it declares progress which
can be used by the other parts of the stack to decide that
the ACK is the most recent one seen so far.

AccECN will use this flag when deciding whether to use the
ACK to update AccECN state or not.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: NipaLocal <nipa@local>
Use BIT() macro for TCP flags field and TCP congestion control
flags that will be used by the congestion control algorithm.

No functional changes.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Reviewed-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: NipaLocal <nipa@local>
With AccECN, there's one additional TCP flag to be used (AE)
and ACE field that overloads the definition of AE, CWR, and
ECE flags. As tcp_flags was previously only 1 byte, the
byte-order stuff needs to be added to it's handling.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: NipaLocal <nipa@local>
Prepare for AccECN that needs to have access here on IP ECN
field value which is only available after INET_ECN_xmit().

No functional changes.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: NipaLocal <nipa@local>
Rename tcp_ecn_check_ce to tcp_data_ecn_check as it is
called only for data segments, not for ACKs (with AccECN,
also ACKs may get ECN bits).

The extra "layer" in tcp_ecn_check_ce() function just
checks for ECN being enabled, that can be moved into
tcp_ecn_field_check rather than having the __ variant.

No functional changes.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: NipaLocal <nipa@local>
Create helpers for TCP ECN modes. No functional changes.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: NipaLocal <nipa@local>
Handling the CWR flag differs between RFC 3168 ECN and AccECN.
With RFC 3168 ECN aware TSO (NETIF_F_TSO_ECN) CWR flag is cleared
starting from 2nd segment which is incompatible how AccECN handles
the CWR flag. Such super-segments are indicated by SKB_GSO_TCP_ECN.
With AccECN, CWR flag (or more accurately, the ACE field that also
includes ECE & AE flags) changes only when new packet(s) with CE
mark arrives so the flag should not be changed within a super-skb.
The new skb/feature flags are necessary to prevent such TSO engines
corrupting AccECN ACE counters by clearing the CWR flag (if the
CWR handling feature cannot be turned off).

If NIC is completely unaware of RFC3168 ECN (doesn't support
NETIF_F_TSO_ECN) or its TSO engine can be set to not touch CWR flag
despite supporting also NETIF_F_TSO_ECN, TSO could be safely used
with AccECN on such NIC. This should be evaluated per NIC basis
(not done in this patch series for any NICs).

For the cases, where TSO cannot keep its hands off the CWR flag,
a GSO fallback is provided by this patch.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: NipaLocal <nipa@local>
There are important differences in how the CWR field behaves
in RFC3168 and AccECN. With AccECN, CWR flag is part of the
ACE counter and its changes are important so adjust the flags
changed mask accordingly.

Also, if CWR is there, set the Accurate ECN GSO flag to avoid
corrupting CWR flag somewhere.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: NipaLocal <nipa@local>
AE flag needs to be preserved for AccECN.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: NipaLocal <nipa@local>
AccECN connection's last ACK cannot retain ECT(1) as the bits
are always cleared causing the packet to switch into another
service queue.

This effectively adds a finer-grained filtering for ECN bits
so that acceptable TW ACKs can retain the bits.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: NipaLocal <nipa@local>
Accurate ECN needs to send custom flags to handle IP-ECN
field reflection during handshake.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: NipaLocal <nipa@local>
The following patch will use tcp_ecn_mode_accecn(),
TCP_ACCECN_CEP_INIT_OFFSET, TCP_ACCECN_CEP_ACE_MASK in
__tcp_fast_path_on() to make new flag for AccECN.

No functional changes.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chai-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: NipaLocal <nipa@local>
Add SYSCTL_FIVE for new AccECN feedback modes of net.ipv4.tcp_ecn.

Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: NipaLocal <nipa@local>
KMSAN detects uninitialized memory stored to memory by
bpf_clone_redirect(). Adding a check to the transmission path to find
malformed headers prevents this issue. Specifically, we check if the length
of the data stored in skb is less than the minimum device header length.
If so, drop the packet since the skb cannot contain a valid device header.
Also check if mac_header_len(skb) is outside the range provided of valid
device header lengths.

Testing this patch with syzbot removes the bug.

Fixes: 8826498 ("Merge tag 'sched_ext-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext")
Reported-by: syzbot+346474e3bf0b26bd3090@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=346474e3bf0b26bd3090
Signed-off-by: Daniel Yang <danielyangkang@gmail.com>
Signed-off-by: NipaLocal <nipa@local>
'ports_fwnode' is initialized via device_get_named_child_node(), which
requires a call to fwnode_handle_put() when the variable is no longer
required to avoid leaking memory.

Add the missing fwnode_handle_put() after 'ports_fwnode' has been used
and is no longer required.

Fixes: 94a2a84 ("net: dsa: mv88e6xxx: Support LED control")
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Signed-off-by: NipaLocal <nipa@local>
tc_actions.sh keeps hanging the forwarding tests.

sdf@: tdc & tdc-dbg started intermittenly failing around Sep 25th

Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: NipaLocal <nipa@local>
The buffers can easily get large and fail allocations. One of the
networking tests running in a VM tries to run perf which occasionally
ends with:

  perf: page allocation failure: order:6, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0

Even tho free memory is available:

  free:97464 free_pcp:321 free_cma:0

Switch to kvzalloc() to make large allocations less likely to fail.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: NipaLocal <nipa@local>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.