Skip to content

btl/base: push operation->hdr to am_rdma_respond for queued operation #10463

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 28, 2022

Conversation

wzamazon
Copy link
Contributor

Currently, when calling am_rdma_respond() for a queued
operation, amd_rdma_retry_operation() pass NULL for the hdr argument.

The idea is that hdr is only used for allocating operation->descriptor.
A queued operation should already have a descriptor, therefore does
not need hdr.

This missed the possibility that the allocation of descriptor
in am_rdma_respond() can fail, which will lead to the operation
to be queued without a descriptor.

This patch make retry_operation() to pass operation->hdr to
am_rdma_repsond() to address the issue.

Signed-off-by: Wei Zhang wzam@amazon.com

@wzamazon
Copy link
Contributor Author

@bwbarrett @hjelmn Can you review this PR? this fixed some segfault in mtt ibm/onesided tests.

@wzamazon
Copy link
Contributor Author

Can I get a review for this PR? Thanks!

Copy link
Member

@bwbarrett bwbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change makes sense. But I think you should also add an assert in the if (NULL == send_descriptor) section of am_rdma_respond asserting hdr is not NULL.

@wzamazon wzamazon force-pushed the btl_base_fix_queue_op branch from d057959 to 4575ac4 Compare June 25, 2022 19:23
Currently, when calling am_rdma_respond() for a queued
operation, amd_rdma_retry_operation() pass NULL for the hdr argument.

The idea is that hdr is only used for allocating operation->descriptor.
A queued operation should already have a descriptor, therefore does
not need hdr.

This missed the possibility that the allocation of descriptor
in am_rdma_respond() can fail, which will lead to the operation
to be queued without a descriptor.

This patch make retry_operation() to pass operation->hdr to
am_rdma_repsond() to address the issue.

It also added an assertion in am_rdma_repsond() about hdr must
not be NULL before hdr is being used.

Signed-off-by: Wei Zhang <wzam@amazon.com>
@wzamazon wzamazon force-pushed the btl_base_fix_queue_op branch from 4575ac4 to 1758e3d Compare June 25, 2022 19:53
@wzamazon
Copy link
Contributor Author

@bwbarrett

Thank you for the review! I added the assertion in new revision.

@wzamazon wzamazon requested a review from bwbarrett June 25, 2022 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants