[v5.0.x] btl/base: push operation->hdr to am_rdma_respond for queued operation #10529

Merged: 1 commit, Jun 30, 2022

wzamazon
Contributor

Currently, when calling am_rdma_respond() for a queued
operation, am_rdma_retry_operation() passes NULL for the hdr argument.

The idea is that hdr is only used for allocating operation->descriptor.
A queued operation should already have a descriptor and therefore does
not need hdr.

This misses the possibility that the allocation of the descriptor
in am_rdma_respond() can fail, which leads to the operation
being queued without a descriptor.

This patch makes retry_operation() pass operation->hdr to
am_rdma_respond() to address the issue.

It also adds an assertion in am_rdma_respond() that hdr is
not NULL before hdr is used.

Signed-off-by: Wei Zhang <wzam@amazon.com>
(cherry picked from commit 1758e3d)
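
The failure mode described in the commit message can be illustrated with a small standalone sketch. This is not the Open MPI code: every `sketch_` type, field, and function below is a hypothetical stand-in, and the header is assumed to be stored inside the operation so that the retry path can pass it along. The simulated allocator fails once, so the first respond call queues the operation without a descriptor and the retry has to allocate one from the saved header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real types; names and layout are assumptions. */
typedef struct { int type; } sketch_hdr_t;
typedef struct { int payload; } sketch_descriptor_t;

typedef struct {
    sketch_hdr_t hdr;                /* header saved with the operation */
    sketch_descriptor_t *descriptor; /* NULL until an allocation succeeds */
    bool queued;                     /* true while waiting for a retry */
} sketch_operation_t;

/* Simulated allocator: fails while this counter is positive. */
static int sketch_alloc_failures = 0;

static sketch_descriptor_t *sketch_alloc_descriptor(const sketch_hdr_t *hdr)
{
    if (sketch_alloc_failures > 0) {
        --sketch_alloc_failures;
        return NULL;
    }
    sketch_descriptor_t *desc = malloc(sizeof(*desc));
    if (NULL != desc) {
        desc->payload = hdr->type;
    }
    return desc;
}

/* Returns 0 on success, -1 if the operation had to be queued for retry. */
static int sketch_respond(sketch_operation_t *op, const sketch_hdr_t *hdr)
{
    if (NULL == op->descriptor) {
        /* hdr is needed only here, so check it right before it is used */
        assert(NULL != hdr);
        op->descriptor = sketch_alloc_descriptor(hdr);
        if (NULL == op->descriptor) {
            op->queued = true; /* queued WITHOUT a descriptor */
            return -1;
        }
    }
    op->queued = false;
    return 0;
}

/* Retry path: pass the saved header instead of NULL, because the queued
 * operation may still lack a descriptor. */
static int sketch_retry_operation(sketch_operation_t *op)
{
    return sketch_respond(op, &op->hdr);
}

/* Returns 1 if the retry recovers from a failed first allocation. */
static int sketch_demo(void)
{
    sketch_operation_t op = { .hdr = { .type = 7 }, .descriptor = NULL, .queued = false };
    sketch_alloc_failures = 1;
    int first = sketch_respond(&op, &op.hdr);
    int second = sketch_retry_operation(&op);
    int ok = (-1 == first) && (0 == second) && (NULL != op.descriptor)
             && (7 == op.descriptor->payload) && !op.queued;
    free(op.descriptor);
    return ok;
}
```

In this sketch, passing NULL from the retry path would trip the assertion whenever the first allocation failed, which is the case the patch is meant to cover.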

@wzamazon
Contributor Author

backport #10463 to v5.0.x branch

@wckzhang wckzhang requested a review from awlauria June 29, 2022 15:20
@wckzhang
Contributor

prepend with v5.0.x?

@wzamazon wzamazon changed the title from "btl/base: push operation->hdr to am_rdma_respond for queued operation" to "[v5.0.x] btl/base: push operation->hdr to am_rdma_respond for queued operation" Jun 29, 2022
@wzamazon
Contributor Author

> prepend with v5.0.x?

I added v5.0.x to the PR title but not to the commit message, because previous backport commits do not have it.

@awlauria
Contributor

> > prepend with v5.0.x?
>
> I added v5.0.x to the PR title but not to the commit message, because previous backport commits do not have it.

Thanks - we actually prefer it this way. It makes diffing the release branch with main easier.

@wzamazon
Contributor Author

@awlauria

Is there anything else needed to get it merged?

@awlauria awlauria merged commit 89c274f into open-mpi:v5.0.x Jun 30, 2022