prov/efa: Fix efa_rx_pkts_held counter drift#12247
Open
gjamesli2126 wants to merge 3 commits into
Open
Conversation
sunkuamzn
reviewed
May 15, 2026
Contributor
sunkuamzn
left a comment
There was a problem hiding this comment.
Please check CI failures
dda8ff8 to
1f201cd
Compare
sunkuamzn
reviewed
May 18, 2026
shijin-aws
reviewed
May 20, 2026
shijin-aws
reviewed
May 20, 2026
1f201cd to
731dd31
Compare
shijin-aws
reviewed
May 21, 2026
731dd31 to
548d5fe
Compare
shijin-aws
previously approved these changes
May 21, 2026
Contributor
|
bot:aws:retest |
2 similar comments
Contributor
Author
|
bot:aws:retest |
Contributor
Author
|
bot:aws:retest |
sunkuamzn
reviewed
May 22, 2026
548d5fe to
8724916
Compare
shijin-aws
previously approved these changes
May 22, 2026
jiaxiyan
reviewed
May 22, 2026
Replace open-coded efa_rx_pkts_held++/-- with EFA_RDM_PKE_HELD_BY_PROGRESS: - mark_held() sets the flag and does counter++. - In release_rx(), when it sees the flag, does counter--. - Every hold pairs with exactly one release through a single funnel. Fixes two missing counter++ cases the old scheme missed: - efa_rdm_pke_get_unexp() with rx_copy_unexp=0: rx-pool pkt is parked in the unexp queue (no clone to the unexp pool). - efa_rdm_pke_get_ooo_pke() with rx_copy_ooo=0: rx-pool pkt is parked in the recv window (no clone to the OOO pool). The rx-pool pkt is now occupying a waiting-room slot, so the counter must reflect that. The old code didn't. Signed-off-by: James Li <jameslli@amazon.com>
In efa_rdm_pke_handle_rma_read_completion's local-read branch, NULL txe->local_read_pkt_entry after efa_rdm_pke_handle_data_copied() takes ownership. Also NULL-init the field in txe_construct. Signed-off-by: James Li <jameslli@amazon.com>
The happy-path test simulated post-completion cleanup manually, hiding the behavior of efa_rdm_pke_handle_rma_read_completion. Drive efa_rdm_pke_handle_rma_completion instead, so a regression in the handler would actually fail the test. In the unhappy path, mirror the caller's release of the held rx pkt to keep counter assertions balanced. Signed-off-by: James Li <jameslli@amazon.com>
8724916 to
fb899a2
Compare
sunkuamzn
approved these changes
May 26, 2026
Contributor
Author
|
bot:aws:retest |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Original Issue:
efa_rdm_ep->efa_rx_pkts_held drifted with
FI_EFA_RX_COPY_UNEXP=0/FI_EFA_RX_COPY_OOO=0.Fixed with: Track efa_rx_pkts_held with a per-pkt flag
held++/--throughEFA_RDM_PKE_HELD_BY_PROGRESSflag.held++inpke_get_unexp()(rx_copy_unexp=0).held++inpke_get_ooo_pke()(rx_copy_ooo=0).Fixed with: Release held rx pkt on local-read teardown
Local-read txe holds pkt only via
txe->local_read_pkt_entry; if torn down before read completes, pkt was orphaned.txe_release(): release pkt if non-NULL.txe_construct(): NULL-init.pke_handle_rma_read_completion()success path: NULL the field to avoid double-free.Validation
Debug build on p5en with
FI_PROVIDER=efa FI_EFA_RX_COPY_UNEXP=0 FI_EFA_RX_COPY_OOO=0.Oracle:
assert(held + posted + to_post == rx_pool_size)atefa_rdm_ep_utils.c:963.fi_unexpected_msgfi_rdm_tagged_bwfi_rdm_tagged_bw -D cudaFix double-free in local-read unit test
txe_releasenow frees the held pkt on error path, so the test must not release it again.txe->local_read_pkt_entryafter manual release; teardown'stxe_releasewould otherwise double-free.txe_releasealready handles it).efa_rx_pkts_held/efa_rx_pkts_to_postdeltas + held flag.Out of scope/TODO: queued blocking copy mid-batch leak
flush_queued_blocking_copy_to_hmem()early-return leaks pkts of other still-alive rxes.Naive release double-frees with rxe error path.