Commit 7b7e05d
veth: enable dev_watchdog for detecting stalled TXQs
The changes introduced in commit dc82a33 ("veth: apply qdisc
backpressure on full ptr_ring to reduce TX drops") have been found to cause
a race condition in production environments.
Under specific circumstances, observed exclusively on ARM64 (aarch64)
systems with Ampere Altra Max CPUs, a transmit queue (TXQ) can become
permanently stalled. This happens when the race condition leads to the TXQ
entering the QUEUE_STATE_DRV_XOFF state without a corresponding queue wake-up,
preventing the attached qdisc from dequeueing packets and causing the
network link to halt.
As a first step towards resolving this issue, this patch introduces a
failsafe mechanism. It enables the net device watchdog by setting a timeout
value and implements the .ndo_tx_timeout callback.
If a TXQ stalls, the watchdog will trigger the veth_tx_timeout() function,
which logs a warning and calls netif_tx_wake_queue() to unstall the queue
and allow traffic to resume.
The log message will look like this:

  veth42: NETDEV WATCHDOG: CPU: 34: transmit queue 0 timed out 5393 ms
  veth42: veth backpressure stalled(n:1) TXQ(0) re-enable

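The recovery path described above can be sketched as a small user-space model. This is not the code from the patch: the `model_*` types and functions are hypothetical stand-ins for the kernel's net device, TXQ, watchdog and `.ndo_tx_timeout` callback, times are plain milliseconds instead of jiffies, and the `n:` value is assumed to be a per-queue count of watchdog timeouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* User-space model only: these types mimic, but are not, the kernel's. */
struct model_txq {
	bool drv_xoff;                /* stands in for QUEUE_STATE_DRV_XOFF */
	unsigned long trans_start_ms; /* time of the last transmit, in ms */
	unsigned long timeout_count;  /* assumed meaning of "n:" in the log */
};

struct model_dev {
	const char *name;
	unsigned long watchdog_timeo_ms; /* 0 means the watchdog is disabled */
	unsigned int num_txq;
	struct model_txq txq[4];
};

/* Models the timeout callback: log a warning, then wake the queue. */
static void model_tx_timeout(struct model_dev *dev, unsigned int txqueue)
{
	struct model_txq *txq = &dev->txq[txqueue];

	printf("%s: veth backpressure stalled(n:%lu) TXQ(%u) re-enable\n",
	       dev->name, txq->timeout_count, txqueue);
	/* Clearing the flag is the effect netif_tx_wake_queue() has here. */
	txq->drv_xoff = false;
}

/*
 * Models one watchdog pass: every stopped queue whose last transmit is
 * older than the timeout is handed to the timeout callback.
 * Returns the number of queues that were re-enabled.
 */
static unsigned int model_watchdog(struct model_dev *dev, unsigned long now_ms)
{
	unsigned int i, recovered = 0;

	assert(dev != NULL);
	if (dev->watchdog_timeo_ms == 0)
		return 0; /* no timeout configured, so no failsafe */

	for (i = 0; i < dev->num_txq; i++) {
		struct model_txq *txq = &dev->txq[i];
		unsigned long stalled_ms = now_ms - txq->trans_start_ms;

		if (!txq->drv_xoff || stalled_ms <= dev->watchdog_timeo_ms)
			continue;

		txq->timeout_count++;
		printf("%s: NETDEV WATCHDOG: transmit queue %u timed out %lu ms\n",
		       dev->name, i, stalled_ms);
		model_tx_timeout(dev, i);
		recovered++;
	}
	return recovered;
}
```

In this model, a queue left with `drv_xoff` set and no wake-up stays stopped until a `model_watchdog()` pass runs after the configured timeout has elapsed. A device with `watchdog_timeo_ms` left at 0 never recovers, which is why a failsafe of this kind needs both the timeout value and the callback.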
This provides a necessary recovery mechanism while the underlying race
condition is investigated further. Subsequent patches will address the root
cause and add more robust state handling.
Fixes: dc82a33 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: NipaLocal <nipa@local>

1 parent abe6f6d commit 7b7e05d
1 file changed, +15 -1 lines changed

| Original file lines | Diff lines | Change |
|---|---|---|
| 959-966 | 959-968 | 1 line removed, 3 lines added |
| 1373-1378 | 1375-1390 | 10 lines added |
| 1711-1716 | 1723-1729 | 1 line added |
| 1749-1754 | 1762-1768 | 1 line added |