-
Notifications
You must be signed in to change notification settings - Fork 931
Fix osc/rdma gcc warnings + possible race #8475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ab0b49f to
2bb2c16
Compare
2bb2c16 to
fc6a66d
Compare
|
See #8469. |
|
Please review #8469 and let me know if there are still issues. The warnings fixes look ok to me. |
|
@hjelmn thanks - didn't see your pr. Merged, I'll rebase this PR if there's anything left. |
fc6a66d to
ac93b4f
Compare
|
@hjelmn I rebased with your pr + another fix - I found that this test was failing with it looks like a timing gap where rank A hasn't quite finished exposing the memory while B is looking for it. I attempted to patch it, and the test now runs clean for me at 40+ ranks with --oversubscribe. Can you take a look? |
|
@awlauria Well. For dynamic memory windows they must use an external synchronization method. MPI_Barrier for example. Will look at the test. |
|
Hmm, though in that test the attach should be complete before the pointer is put in the list. Hmmm. I may need to root cause this. |
a07c2da to
9d7af02
Compare
|
@hjelmn thanks - I took another look. I think I found some spots where the exclusive lock could use some tightening around some shared variables, specifically when caching the region count. The test is running pretty clean for me now at 100 procs per run on one node - but I'll keep it running for a while to see how it goes. |
817b8bd to
f97cb12
Compare
|
@hjelmn I took out the 'retry code' and ran these commits over-night. All passed over 10k runs. |
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
There were some instances where the exclusive lock needed some tightening around the region structure. Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
f97cb12 to
7856a25
Compare
|
Do we need this on v4.1 as well? |
|
If @hjelmn agrees with the change, then yeah I think it should go to the 4.x.x branches. |
No description provided.