Commit c9a5d4a
veth: more robust handling of race to avoid txq getting stuck
Commit dc82a33 ("veth: apply qdisc backpressure on full ptr_ring to
reduce TX drops") introduced a race condition that can lead to a permanently
stalled TXQ. This was observed in production on ARM64 systems (Ampere Altra
Max).
The race occurs in veth_xmit(). The producer observes a full ptr_ring and
stops the queue (netif_tx_stop_queue()). The subsequent conditional logic,
intended to re-wake the queue if the consumer had just emptied it (if
(__ptr_ring_empty(...)) netif_tx_wake_queue()), can fail. This leads to a
"lost wakeup" where the TXQ remains stopped (QUEUE_STATE_DRV_XOFF) and
traffic halts.
This failure is caused by an incorrect use of the __ptr_ring_empty() API
from the producer side. As noted in kernel comments, this check is not
guaranteed to be correct if a consumer is operating on another CPU. The
empty test is based on ptr_ring->consumer_head, making it reliable only for
the consumer. Using this check from the producer side is fundamentally racy.
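The lost wakeup described above can be replayed with a small user-space toy model. This is only a sketch under stated assumptions: it is single-threaded, every name in it is hypothetical (none are kernel APIs), and the unreliable producer-side view of the ring is modelled simply as a boolean passed in by the caller rather than by real cross-CPU memory ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state; every name here is hypothetical and not a kernel API. */
struct lost_wakeup_state {
	int ring_len;        /* entries currently queued */
	int ring_size;       /* ring capacity */
	bool txq_stopped;    /* stands in for QUEUE_STATE_DRV_XOFF */
	bool napi_scheduled; /* true while a consumer poll is pending */
};

/* Consumer poll: drain the ring, wake the TXQ only if it is already
 * stopped at that moment, then go idle. */
void lost_consumer_poll(struct lost_wakeup_state *s)
{
	s->ring_len = 0;
	if (s->txq_stopped)
		s->txq_stopped = false;
	s->napi_scheduled = false;
}

/* Producer slow path, old logic: stop the queue, then re-wake it only
 * if the producer-side empty check reports an empty ring. */
void lost_producer_stop(struct lost_wakeup_state *s, bool empty_check_says_empty)
{
	s->txq_stopped = true;
	if (empty_check_says_empty)
		s->txq_stopped = false;
}

/* Replays one interleaving: the producer has seen a full ring, the
 * consumer drains and goes idle, and only then does the producer stop
 * the queue. Returns true if the TXQ ends up stopped with no poll
 * pending, i.e. nothing is left that would ever wake it. */
bool lost_wakeup_replay(bool empty_check_says_empty)
{
	struct lost_wakeup_state s = {
		.ring_len = 4, .ring_size = 4,
		.txq_stopped = false, .napi_scheduled = true,
	};

	/* Precondition: the ring is full, which sends the producer down
	 * the slow path. */
	assert(s.ring_len == s.ring_size);

	lost_consumer_poll(&s);
	lost_producer_stop(&s, empty_check_says_empty);

	return s.txq_stopped && !s.napi_scheduled;
}
```

In this model, lost_wakeup_replay(false) reports a stuck queue, while lost_wakeup_replay(true) does not: the outcome depends entirely on an empty check that the producer cannot rely on.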
This patch fixes the race by adopting the more robust logic from an earlier
version (V4) of the patchset, which always flushed the peer:
(1) In veth_xmit(), the racy conditional wake-up logic and its memory barrier
are removed. Instead, after stopping the queue, we unconditionally call
__veth_xdp_flush(rq). This guarantees that the NAPI consumer is scheduled,
making it solely responsible for re-waking the TXQ.
(2) On the consumer side, the logic for waking the peer TXQ is moved out of
veth_xdp_rcv() and placed at the end of the veth_poll() function. This
placement is part of fixing the race, as the netif_tx_queue_stopped() check
must occur after rx_notify_masked is potentially set to false during NAPI
completion.
This handles the race where veth_poll() consumes all packets and completes
NAPI before veth_xmit() on the producer side has called netif_tx_stop_queue().
In this state, the producer's __veth_xdp_flush(rq) call will see
rx_notify_masked is false and reschedule NAPI. This new NAPI poll, even if it
processes no packets, is now guaranteed to run the netif_tx_queue_stopped()
check, see the stopped queue, and wake it up, allowing veth_xmit() to proceed.
(3) Finally, the NAPI completion check in veth_poll() is updated. If NAPI is
about to complete (napi_complete_done), it now also checks if the peer TXQ
is stopped. If the ring is empty but the peer TXQ is stopped, NAPI will
reschedule itself. This prevents a new race where the producer stops the
queue just as the consumer is finishing its poll, ensuring the wakeup is not
missed.
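The interplay of steps (1) to (3) can be illustrated with a user-space toy model. This is a sketch, not the driver code: it is single-threaded, all names are hypothetical, the NAPI budget and memory barriers are not modelled, and the consumer always drains the whole ring, so the real "ring not empty" reschedule condition never triggers here. The flush_after_stop parameter exists only to contrast the outcome with and without the unconditional flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state; every name here is hypothetical and not a kernel API. */
struct fix_state {
	int ring_len;          /* entries currently queued */
	int ring_size;         /* ring capacity */
	bool txq_stopped;      /* peer TXQ is stopped */
	bool rx_notify_masked; /* true while a poll is scheduled or running */
	bool napi_pending;     /* a poll is queued to run */
};

/* Models the flush described in step (1): schedule a poll unless one
 * is already scheduled. */
void fix_flush(struct fix_state *s)
{
	if (!s->rx_notify_masked) {
		s->rx_notify_masked = true;
		s->napi_pending = true;
	}
}

/* Models one consumer poll as described in steps (2) and (3). */
void fix_poll(struct fix_state *s)
{
	s->napi_pending = false;
	s->ring_len = 0;              /* consume everything */
	s->rx_notify_masked = false;  /* completion unmasks notifications */

	/* Step (3): a stopped peer TXQ forces a reschedule. */
	if (s->txq_stopped) {
		s->rx_notify_masked = true;
		s->napi_pending = true;
	}

	/* Step (2): the wake check runs at the very end of the poll. */
	if (s->txq_stopped)
		s->txq_stopped = false;
}

/* Replays the interleaving where the consumer completes before the
 * producer stops the queue. Returns true if the TXQ ends up awake. */
bool fix_replay(bool flush_after_stop)
{
	struct fix_state s = {
		.ring_len = 4, .ring_size = 4, .txq_stopped = false,
		.rx_notify_masked = true, .napi_pending = true,
	};
	int polls = 0;

	/* Precondition: the producer has observed a full ring. */
	assert(s.ring_len == s.ring_size);

	fix_poll(&s);              /* consumer drains and completes first */
	s.txq_stopped = true;      /* producer stops the queue late */
	if (flush_after_stop)
		fix_flush(&s);     /* step (1): unconditional flush */

	/* Run whatever polls are pending; the bound is a safety net. */
	while (s.napi_pending && polls < 8) {
		fix_poll(&s);
		polls++;
	}
	return !s.txq_stopped;
}
```

With the flush, the rescheduled poll finds no packets but still sees the stopped queue and wakes it, so fix_replay(true) ends with the queue awake. Without it, fix_replay(false) leaves the queue stopped with no poll pending.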
Fixes: dc82a33 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: NipaLocal <nipa@local>
1 parent 7b7e05d commit c9a5d4a
1 file changed: 22 additions, 21 deletions