Commit b05ad74

markalle authored and jjhursey committed
osc/pt2pt: Fix hang with Put and Win_lock_all
* When using `MPI_Put` with `MPI_Win_lock_all`, a hang is possible: the `put` waits for `eager_send_active` to become `true`, but under `MPI_Win_lock_all` that variable might not be reset, depending on other incoming events (e.g., `post` or ACKs of lock requests).

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit eec1d5b)
1 parent 3c142b7 commit b05ad74

ompi/mca/osc/pt2pt/osc_pt2pt_comm.c

Lines changed: 21 additions & 2 deletions
@@ -15,6 +15,7 @@
  * Copyright (c) 2015 Research Organization for Information Science
  * and Technology (RIST). All rights reserved.
  * Copyright (c) 2016 FUJITSU LIMITED. All rights reserved.
+ * Copyright (c) 2016 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -336,7 +337,16 @@ static inline int ompi_osc_pt2pt_put_w_req (const void *origin_addr, int origin_
 
     if (is_long_msg) {
         /* wait for eager sends to be active before starting a long put */
-        ompi_osc_pt2pt_sync_wait_expected (pt2pt_sync);
+        if (pt2pt_sync->type == OMPI_OSC_PT2PT_SYNC_TYPE_LOCK) {
+            OPAL_THREAD_LOCK(&pt2pt_sync->lock);
+            ompi_osc_pt2pt_peer_t *peer = ompi_osc_pt2pt_peer_lookup (module, target);
+            while (!(peer->flags & OMPI_OSC_PT2PT_PEER_FLAG_EAGER)) {
+                opal_condition_wait(&pt2pt_sync->cond, &pt2pt_sync->lock);
+            }
+            OPAL_THREAD_UNLOCK(&pt2pt_sync->lock);
+        } else {
+            ompi_osc_pt2pt_sync_wait_expected (pt2pt_sync);
+        }
     }
 
     OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output,
@@ -495,7 +505,16 @@ ompi_osc_pt2pt_accumulate_w_req (const void *origin_addr, int origin_count,
 
     if (is_long_msg) {
         /* wait for synchronization before posting a long message */
-        ompi_osc_pt2pt_sync_wait_expected (pt2pt_sync);
+        if (pt2pt_sync->type == OMPI_OSC_PT2PT_SYNC_TYPE_LOCK) {
+            OPAL_THREAD_LOCK(&pt2pt_sync->lock);
+            ompi_osc_pt2pt_peer_t *peer = ompi_osc_pt2pt_peer_lookup (module, target);
+            while (!(peer->flags & OMPI_OSC_PT2PT_PEER_FLAG_EAGER)) {
+                opal_condition_wait(&pt2pt_sync->cond, &pt2pt_sync->lock);
+            }
+            OPAL_THREAD_UNLOCK(&pt2pt_sync->lock);
+        } else {
+            ompi_osc_pt2pt_sync_wait_expected (pt2pt_sync);
+        }
     }
 
     header = (ompi_osc_pt2pt_header_acc_t*) ptr;
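
The per-peer wait added in both hunks can be illustrated outside Open MPI with plain POSIX threads. This is a minimal sketch of the pattern only, not the Open MPI implementation: `sync_t`, `peer_t`, `PEER_FLAG_EAGER`, `wait_for_peer_eager`, `mark_peer_eager`, and `demo_wait_for_peer` are hypothetical names, and it assumes that whoever sets a peer's flag does so under the same lock and then signals the condition variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for OMPI_OSC_PT2PT_PEER_FLAG_EAGER. */
#define PEER_FLAG_EAGER 0x1u
#define NUM_PEERS 4

typedef struct {
    uint32_t flags;              /* per-peer state bits */
} peer_t;

typedef struct {
    pthread_mutex_t lock;        /* protects peers[] */
    pthread_cond_t  cond;        /* signalled whenever a peer flag changes */
    peer_t          peers[NUM_PEERS];
} sync_t;

/* Block until the given target peer (not the sync object as a whole)
 * has its eager flag set. The loop re-checks the flag after every
 * wakeup, so spurious wakeups and wakeups for other peers are harmless. */
static void wait_for_peer_eager(sync_t *sync, int target)
{
    pthread_mutex_lock(&sync->lock);
    while (!(sync->peers[target].flags & PEER_FLAG_EAGER)) {
        pthread_cond_wait(&sync->cond, &sync->lock);
    }
    pthread_mutex_unlock(&sync->lock);
}

/* Assumed producer side: set the flag under the lock, then wake waiters. */
static void mark_peer_eager(sync_t *sync, int target)
{
    pthread_mutex_lock(&sync->lock);
    sync->peers[target].flags |= PEER_FLAG_EAGER;
    pthread_cond_broadcast(&sync->cond);
    pthread_mutex_unlock(&sync->lock);
}

struct ack_args {
    sync_t *sync;
    int     target;
};

/* Simulates an incoming lock ACK for one peer. */
static void *ack_thread(void *arg)
{
    struct ack_args *a = arg;
    mark_peer_eager(a->sync, a->target);
    return NULL;
}

/* Runs one wait/ACK round trip and returns the target's final flags. */
static uint32_t demo_wait_for_peer(int target)
{
    sync_t sync;
    pthread_t tid;
    struct ack_args args = { &sync, target };

    pthread_mutex_init(&sync.lock, NULL);
    pthread_cond_init(&sync.cond, NULL);
    for (int i = 0; i < NUM_PEERS; ++i) {
        sync.peers[i].flags = 0;
    }

    int rc = pthread_create(&tid, NULL, ack_thread, &args);
    assert(rc == 0);
    (void) rc;

    wait_for_peer_eager(&sync, target);   /* returns once the ACK lands */
    pthread_join(tid, NULL);

    uint32_t flags = sync.peers[target].flags;
    pthread_cond_destroy(&sync.cond);
    pthread_mutex_destroy(&sync.lock);
    return flags;
}
```

Calling `demo_wait_for_peer(2)` from a `main` returns only after the simulated ACK for peer 2 has arrived. Waiting on a per-peer predicate rather than a single window-wide flag mirrors the intent described in the commit message: the waiter's progress depends only on the state of the peer it targets.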
