Skip to content

Commit 9496f2a

Browse files
jacob-kelleranguy11
authored andcommitted
ice: fix Rx page leak on multi-buffer frames
The ice_put_rx_mbuf() function handles calling ice_put_rx_buf() for each buffer in the current frame. This function was introduced as part of handling multi-buffer XDP support in the ice driver. It works by iterating over the buffers from first_desc up to 1 plus the total number of fragments in the frame, cached from before the XDP program was executed. If the hardware posts a descriptor with a size of 0, the logic used in ice_put_rx_mbuf() breaks. Such descriptors get skipped and don't get added as fragments in ice_add_xdp_frag. Since the buffer isn't counted as a fragment, we do not iterate over it in ice_put_rx_mbuf(), and thus we don't call ice_put_rx_buf(). Because we don't call ice_put_rx_buf(), we don't attempt to re-use the page or free it. This leaves a stale page in the ring, as we don't increment next_to_alloc. The ice_reuse_rx_page() assumes that the next_to_alloc has been incremented properly, and that it always points to a buffer with a NULL page. Since this function doesn't check, it will happily recycle a page over the top of the next_to_alloc buffer, losing track of the old page. Note that this leak only occurs for multi-buffer frames. The ice_put_rx_mbuf() function always handles at least one buffer, so a single-buffer frame will always get handled correctly. It is not clear precisely why the hardware hands us descriptors with a size of 0 sometimes, but it happens somewhat regularly with "jumbo frames" used by 9K MTU. To fix ice_put_rx_mbuf(), we need to make sure to call ice_put_rx_buf() on all buffers between first_desc and next_to_clean. Borrow the logic of a similar function in i40e used for this same purpose. Use the same logic also in ice_get_pgcnts(). Instead of iterating over just the number of fragments, use a loop which iterates until the current index reaches to the next_to_clean element just past the current frame. Check the current number of fragments (post XDP program). For all buffers up 1 more than the number of fragments, we'll update the pagecnt_bias. For any buffers past this, pagecnt_bias is left as-is. This ensures that fragments released by the XDP program, as well as any buffers with zero-size won't have their pagecnt_bias updated incorrectly. Unlike i40e, the ice_put_rx_mbuf() function does call ice_put_rx_buf() on the last buffer of the frame indicating end of packet. The xdp_xmit value only needs to be updated if an XDP program is run, and only once per packet. Drop the xdp_xmit pointer argument from ice_put_rx_mbuf(). Instead, set xdp_xmit in the ice_clean_rx_irq() function directly. This avoids needing to pass the argument and avoids an extra bit-wise OR for each buffer in the frame. Move the increment of the ntc local variable to ensure its updated *before* all calls to ice_get_pgcnts() or ice_put_rx_mbuf(), as the loop logic requires the index of the element just after the current frame. This has the advantage that we also no longer need to track or cache the number of fragments in the rx_ring, which saves a few bytes in the ring. Cc: Christoph Petrausch <christoph.petrausch@deepl.com> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Closes: https://lore.kernel.org/netdev/CAK8fFZ4hY6GUJNENz3wY9jaYLZXGfpr7dnZxzGMYoE44caRbgw@mail.gmail.com/ Fixes: 743bbd9 ("ice: put Rx buffers after being done with current frame") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
1 parent 467d3a9 commit 9496f2a

File tree

2 files changed

+33
-49
lines changed

2 files changed

+33
-49
lines changed

drivers/net/ethernet/intel/ice/ice_txrx.c

Lines changed: 33 additions & 48 deletions
Original file line numberDiff line numberDiff line change
@@ -894,10 +894,6 @@ ice_add_xdp_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
894894
__skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, rx_buf->page,
895895
rx_buf->page_offset, size);
896896
sinfo->xdp_frags_size += size;
897-
/* remember frag count before XDP prog execution; bpf_xdp_adjust_tail()
898-
* can pop off frags but driver has to handle it on its own
899-
*/
900-
rx_ring->nr_frags = sinfo->nr_frags;
901897

902898
if (page_is_pfmemalloc(rx_buf->page))
903899
xdp_buff_set_frag_pfmemalloc(xdp);
@@ -968,20 +964,20 @@ ice_get_rx_buf(struct ice_rx_ring *rx_ring, const unsigned int size,
968964
/**
969965
* ice_get_pgcnts - grab page_count() for gathered fragments
970966
* @rx_ring: Rx descriptor ring to store the page counts on
967+
* @ntc: the next to clean element (not included in this frame!)
971968
*
972969
* This function is intended to be called right before running XDP
973970
* program so that the page recycling mechanism will be able to take
974971
* a correct decision regarding underlying pages; this is done in such
975972
* way as XDP program can change the refcount of page
976973
*/
977-
static void ice_get_pgcnts(struct ice_rx_ring *rx_ring)
974+
static void ice_get_pgcnts(struct ice_rx_ring *rx_ring, unsigned int ntc)
978975
{
979-
u32 nr_frags = rx_ring->nr_frags + 1;
980976
u32 idx = rx_ring->first_desc;
981977
struct ice_rx_buf *rx_buf;
982978
u32 cnt = rx_ring->count;
983979

984-
for (int i = 0; i < nr_frags; i++) {
980+
while (idx != ntc) {
985981
rx_buf = &rx_ring->rx_buf[idx];
986982
rx_buf->pgcnt = page_count(rx_buf->page);
987983

@@ -1154,62 +1150,48 @@ ice_put_rx_buf(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf)
11541150
}
11551151

11561152
/**
1157-
* ice_put_rx_mbuf - ice_put_rx_buf() caller, for all frame frags
1153+
* ice_put_rx_mbuf - ice_put_rx_buf() caller, for all buffers in frame
11581154
* @rx_ring: Rx ring with all the auxiliary data
11591155
* @xdp: XDP buffer carrying linear + frags part
1160-
* @xdp_xmit: XDP_TX/XDP_REDIRECT verdict storage
1161-
* @ntc: a current next_to_clean value to be stored at rx_ring
1156+
* @ntc: the next to clean element (not included in this frame!)
11621157
* @verdict: return code from XDP program execution
11631158
*
1164-
* Walk through gathered fragments and satisfy internal page
1165-
* recycle mechanism; we take here an action related to verdict
1166-
* returned by XDP program;
1159+
* Called after XDP program is completed, or on error with verdict set to
1160+
* ICE_XDP_CONSUMED.
1161+
*
1162+
* Walk through buffers from first_desc to the end of the frame, releasing
1163+
* buffers and satisfying internal page recycle mechanism. The action depends
1164+
* on verdict from XDP program.
11671165
*/
11681166
static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
1169-
u32 *xdp_xmit, u32 ntc, u32 verdict)
1167+
u32 ntc, u32 verdict)
11701168
{
1171-
u32 nr_frags = rx_ring->nr_frags + 1;
1169+
u32 nr_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
11721170
u32 idx = rx_ring->first_desc;
11731171
u32 cnt = rx_ring->count;
1174-
u32 post_xdp_frags = 1;
11751172
struct ice_rx_buf *buf;
1176-
int i;
1177-
1178-
if (unlikely(xdp_buff_has_frags(xdp)))
1179-
post_xdp_frags += xdp_get_shared_info_from_buff(xdp)->nr_frags;
1173+
int i = 0;
11801174

1181-
for (i = 0; i < post_xdp_frags; i++) {
1175+
while (idx != ntc) {
11821176
buf = &rx_ring->rx_buf[idx];
1177+
if (++idx == cnt)
1178+
idx = 0;
11831179

1184-
if (verdict & (ICE_XDP_TX | ICE_XDP_REDIR)) {
1180+
/* An XDP program could release fragments from the end of the
1181+
* buffer. For these, we need to keep the pagecnt_bias as-is.
1182+
* To do this, only adjust pagecnt_bias for fragments up to
1183+
* the total remaining after the XDP program has run.
1184+
*/
1185+
if (verdict != ICE_XDP_CONSUMED)
11851186
ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
1186-
*xdp_xmit |= verdict;
1187-
} else if (verdict & ICE_XDP_CONSUMED) {
1187+
else if (i++ <= nr_frags)
11881188
buf->pagecnt_bias++;
1189-
} else if (verdict == ICE_XDP_PASS) {
1190-
ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
1191-
}
11921189

11931190
ice_put_rx_buf(rx_ring, buf);
1194-
1195-
if (++idx == cnt)
1196-
idx = 0;
1197-
}
1198-
/* handle buffers that represented frags released by XDP prog;
1199-
* for these we keep pagecnt_bias as-is; refcount from struct page
1200-
* has been decremented within XDP prog and we do not have to increase
1201-
* the biased refcnt
1202-
*/
1203-
for (; i < nr_frags; i++) {
1204-
buf = &rx_ring->rx_buf[idx];
1205-
ice_put_rx_buf(rx_ring, buf);
1206-
if (++idx == cnt)
1207-
idx = 0;
12081191
}
12091192

12101193
xdp->data = NULL;
12111194
rx_ring->first_desc = ntc;
1212-
rx_ring->nr_frags = 0;
12131195
}
12141196

12151197
/**
@@ -1317,6 +1299,10 @@ static int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
13171299
/* retrieve a buffer from the ring */
13181300
rx_buf = ice_get_rx_buf(rx_ring, size, ntc);
13191301

1302+
/* Increment ntc before calls to ice_put_rx_mbuf() */
1303+
if (++ntc == cnt)
1304+
ntc = 0;
1305+
13201306
if (!xdp->data) {
13211307
void *hard_start;
13221308

@@ -1325,24 +1311,23 @@ static int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
13251311
xdp_prepare_buff(xdp, hard_start, offset, size, !!offset);
13261312
xdp_buff_clear_frags_flag(xdp);
13271313
} else if (ice_add_xdp_frag(rx_ring, xdp, rx_buf, size)) {
1328-
ice_put_rx_mbuf(rx_ring, xdp, NULL, ntc, ICE_XDP_CONSUMED);
1314+
ice_put_rx_mbuf(rx_ring, xdp, ntc, ICE_XDP_CONSUMED);
13291315
break;
13301316
}
1331-
if (++ntc == cnt)
1332-
ntc = 0;
13331317

13341318
/* skip if it is NOP desc */
13351319
if (ice_is_non_eop(rx_ring, rx_desc))
13361320
continue;
13371321

1338-
ice_get_pgcnts(rx_ring);
1322+
ice_get_pgcnts(rx_ring, ntc);
13391323
xdp_verdict = ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_desc);
13401324
if (xdp_verdict == ICE_XDP_PASS)
13411325
goto construct_skb;
13421326
total_rx_bytes += xdp_get_buff_len(xdp);
13431327
total_rx_pkts++;
13441328

1345-
ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc, xdp_verdict);
1329+
ice_put_rx_mbuf(rx_ring, xdp, ntc, xdp_verdict);
1330+
xdp_xmit |= xdp_verdict & (ICE_XDP_TX | ICE_XDP_REDIR);
13461331

13471332
continue;
13481333
construct_skb:
@@ -1355,7 +1340,7 @@ static int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
13551340
rx_ring->ring_stats->rx_stats.alloc_page_failed++;
13561341
xdp_verdict = ICE_XDP_CONSUMED;
13571342
}
1358-
ice_put_rx_mbuf(rx_ring, xdp, &xdp_xmit, ntc, xdp_verdict);
1343+
ice_put_rx_mbuf(rx_ring, xdp, ntc, xdp_verdict);
13591344

13601345
if (!skb)
13611346
break;

drivers/net/ethernet/intel/ice/ice_txrx.h

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -358,7 +358,6 @@ struct ice_rx_ring {
358358
struct ice_tx_ring *xdp_ring;
359359
struct ice_rx_ring *next; /* pointer to next ring in q_vector */
360360
struct xsk_buff_pool *xsk_pool;
361-
u32 nr_frags;
362361
u16 max_frame;
363362
u16 rx_buf_len;
364363
dma_addr_t dma; /* physical address of ring */

0 commit comments

Comments
 (0)