-
Notifications
You must be signed in to change notification settings - Fork 931
osc/rdma: fix bug in MPI_OP_NO_OP support #6301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osc/rdma: fix bug in MPI_OP_NO_OP support #6301
Conversation
|
:bot:retest:ompi: |
|
@hjelmn I pushed a trivial fix to your commit. Feel free to squash it into yours. |
|
bot:lanl:retest |
|
bot:ompi:retest |
7af907b to
cad89f8
Compare
cad89f8 to
1aeb91b
Compare
|
@ggouaillardet Cleaned up things a bit and handled some other problems found when attempting to use ARMCI on Crays. Please test this version. |
|
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/81a4fcac04cdd747f4f07bc8dae20c13 |
|
If nothing else it would be good to get the bug fix in for 5.0. |
|
wow. completely forgot about this one. will take a look today |
|
Re-ping. 5.0 branching is targeted for April 30th. If you want this in for 5.0, please target to get it in master by then. Thanks! |
|
Update - Since this is a bug fix, we don't need it by April 30th to make the 5.0 cut, but the sooner the better. Thanks. |
|
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/899d8d91e2ad598b59d3fcbfe3241f88 |
|
Will actually finish this up today. |
1aeb91b to
f0f3bab
Compare
|
Should be good to go. Will run some tests shortly to verify. |
|
Can one of the admins verify this patch? |
|
bot:aws:retest |
|
@hjelmn tells me that this is still relevant / just fell off the radar. It should probably go to releases as well (which I interpret to mean v4.0.x and v4.1.x). It passed a retest 3 days ago, so it's probably good to go. |
|
@bosilca @ggouaillardet ...any other one-sided people: can you review? I can do a functional test on this, but can't really review the code. |
|
@zhngaj can you review this one? |
|
Sure, please give me an hour or two if possible. |
f0f3bab to
ec71780
Compare
|
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/25baea9bad1c453c6e22190672ef709d |
This commit fixes a number of bugs in the handling of derived datatypes when using MPI__Accumulate, MP_Get_accumulate, and MPI_Fetch_and_op. The following bugs are fixed: - Incorrect results when using MPI_OP_NO_OP with a 0 origin count. osc/rdma was not ignoring the source address, count, and datatype in the MPI_OP_NO_OP case as is required by the standard. - Correctly handle result_buffer=MPI_BOTTOM in ompi_osc_rdma_gacc_local(). - Test result_datatype in order to figure out whether a get is performed. References open-mpi#6275 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
|
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/8ba5f583c0c4816cbe59b61c7dfef83c |
|
The IBM CI (XL) build failed! Please review the log, linked below. Gist: https://gist.github.com/eb9eb190c9d2bdaa4a206158f22ee3b6 |
|
@zhngaj Ready for review again. Rebased and all good to go. |
|
Looks like a build issue from the rebase fixing. |
This commit rearranges the accumulate code so that network AMOs can be used in a larger number of potential situations. This commit adds a new MCA variable: osc_rdma_network_max_amo. This variable controls the maximum datatype count where AMOs will be used. The old default for this support was count == 1. The new default is count == 32. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a resource leak when using network atomics. There was a leak when the underlying BTL did not support the attempted atomic. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
ec71780 to
9b2ed1e
Compare
|
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/a405129f92ef05223da08d53104f10ef |
|
The IBM CI (XL) build failed! Please review the log, linked below. Gist: https://gist.github.com/75905cbb59d1f238d71618f85a690e9e |
|
@jjhursey Odd failures. I see only a couple of warnings (i should fix) but no code errors. |
|
I'm looking at the CI failures now. They seem infrastructure related. |
|
bot:ibm:retest |
|
IBM Ci is looking good now. Sorry for the noise. |
|
@zhngaj Nathan says that this is rebased, still necessary, and ready for review. Can you review in the near future? Thanks! |
Sure, will review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit fixes a bug which can lead to incorrect results when using
MPI_OP_NO_OP. The component was not ignoring the source address,
count, and datatype as it should have.
This commit also cleans up some unused/unnecessary code.
References #6275
Signed-off-by: Nathan Hjelm hjelmn@lanl.gov