Skip to content

Conversation

@AboorvaDevarajan
Copy link
Member

@AboorvaDevarajan AboorvaDevarajan commented Aug 8, 2022

  • Use the start of the result buffer pointer to unpack the data. (Fixes a data integrity issue)
  • Fix memory leak in the non-contig get_accumulate code path.

Signed-off-by: Aboorva Devarajan abodevar@in.ibm.com

Copy link
Contributor

@devreal devreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix 👍 One detail though

/* result buffer is not necessarily contiguous. use the opal datatype engine to
* copy the data over in this case */
struct iovec iov = {.iov_base = result, .iov_len = request->len};
struct iovec iov = {.iov_base = result_start, .iov_len = request->len};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If result != NULL then result_start is NULL (per https://github.com/open-mpi/ompi/pull/10634/files#diff-e4bd30eae59fa8aabf9ee87dcb523aa60a292e7c09d1e84e014c1ae7e13e1e3cR281). result_start should always point to the start of the result buffer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, updated the PR with the changes, there was also a memory leak here, https://github.com/open-mpi/ompi/pull/10634/files#diff-e4bd30eae59fa8aabf9ee87dcb523aa60a292e7c09d1e84e014c1ae7e13e1e3cR311 which is fixed.

@jsquyres
Copy link
Member

jsquyres commented Aug 8, 2022

When this PR is complete, will it need to be cherry-picked back to v4.1.x and v4.0.x, too?

Use start of the result buffer to unpack the data.

Signed-off-by: Aboorva Devarajan <abodevar@in.ibm.com>
@AboorvaDevarajan
Copy link
Member Author

When this PR is complete, will it need to be cherry-picked back to v4.1.x and v4.0.x, too?

Jeff, this test passes in v4.0.x (with osc rdma) and the issue seems to be introduced in v4.1.x so we can back-port this fix to v4.1.x and v5.0.x branches once merged.

@devreal devreal merged commit b9e4c41 into open-mpi:main Aug 11, 2022
@devreal
Copy link
Contributor

devreal commented Aug 11, 2022

@AboorvaDevarajan Can you please backport this fix to the release branches?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants