OSC UCX: Enable software atomicity when lock is set to MPI_MODE_NOCHECK #10493

Merged: 1 commit into open-mpi:main on Jun 23, 2022

Conversation

@MamziB (Contributor) commented Jun 21, 2022

OSC UCX: Enable software atomicity when lock is set to MPI_MODE_NOCHECK

Signed-off-by: Mamzi Bayatpour mbayatpour@nvidia.com
Co-authored-by: Tomislav Janjusic tomislavj@nvidia.com

@devreal (Contributor) commented Jun 21, 2022

@MamziB The title (and commit message) seem confusing: they make it sound like setting MPI_MODE_NOCHECK leads to atomics being emulated in software. AFAICS, your patch adds support for MPI_MODE_NOCHECK and checks whether taking the atomics lock is actually necessary when atomic operations are done in software. Could you please clarify your commit message?

@MamziB (Contributor, Author) commented Jun 21, 2022

In osc/ucx, locking windows with MPI_MODE_NOCHECK disables software-based atomicity in atomic calls such as MPI_Accumulate/MPI_Get_accumulate/etc. In other words, these calls are no longer atomic when MPI_MODE_NOCHECK is set (if HW support is not available).
With this patch, even if the window lock is acquired with MPI_MODE_NOCHECK, we guarantee SW atomicity for these kinds of atomic calls.

FWIW, we saw this issue in a simple MPICH test which does:
MPI_Win_lock(..., MPI_MODE_NOCHECK, ...);
MPI_Accumulate(...);
MPI_Win_unlock(...);
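
A minimal self-contained sketch of that usage pattern (illustrative only, not the actual MPICH test): every rank accumulates into rank 0 under a shared lock taken with MPI_MODE_NOCHECK, so the MPI_Accumulate calls still have to be atomic with respect to one another.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs, buf = 0, one = 1;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Every rank exposes one int; rank 0 is the accumulation target. */
    MPI_Win_create(&buf, sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win);

    /* Shared lock with MPI_MODE_NOCHECK: no conflicting exclusive lock is
     * ever requested, but the concurrent MPI_Accumulate calls must still
     * be applied atomically. */
    MPI_Win_lock(MPI_LOCK_SHARED, 0, MPI_MODE_NOCHECK, win);
    MPI_Accumulate(&one, 1, MPI_INT, 0, 0, 1, MPI_INT, MPI_SUM, win);
    MPI_Win_unlock(0, win);

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) {
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
        printf("expected %d, got %d\n", nprocs, buf);
        MPI_Win_unlock(0, win);
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}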

@devreal (Contributor) commented Jun 22, 2022

OK, now I get what the intention of this patch is. The commit message makes it sound like software-based atomicity is always enabled if MPI_MODE_NOCHECK is set, even if HW support is usable.

Looking at the standard, though, I wonder whether that is necessary:

MPI_MODE_NOCHECK — no other process holds, or will attempt to acquire, a conflicting lock, while the caller holds the window lock. This is useful when mutual exclusion is achieved by other means, but the coherence operations that may be attached to the lock and unlock calls are still required.

Clearly, there should be no need for the atomic lock if MPI_MODE_NOCHECK is set, because there cannot be a conflicting lock on another peer (and thus no conflicting atomic operations). Can you point me to the test in MPICH?

@janjust (Contributor) commented Jun 22, 2022

@devreal acc_loc.c

Even though we don't lock the window, we still need to ensure atomicity of the operation.
Ensuring atomicity doesn't lock the window, though it does serialize operations.

@devreal (Contributor) commented Jun 22, 2022

Thanks @janjust! The example uses shared locks, which should be covered by the existing test in need_acc_lock. The test as it is today is:

!(NULL != lock && LOCK_EXCLUSIVE == lock->type);

So unless we have an exclusive lock we need to take the atomic lock. For shared locks it is true that we need to take the atomic lock, independent of whether MPI_MODE_NOCHECK is set.

This patch adds another check so that we still take the atomic lock if we have an exclusive lock but MPI_MODE_NOCHECK is set. The standard says that if you have an exclusive lock and use MPI_MODE_NOCHECK, then there cannot be conflicting atomic operations because no other process is allowed to take a conflicting lock, so there is no need to take the atomic lock.

Sorry for being picky here, but a) I'd like to understand what this patch really does (it took 2 round trips to get to the bottom of the problem), and b) I wonder whether the problem really is in this line or somewhere else. Looking at the code for taking the shared/exclusive lock, I think the problem really is there:

lock->is_nocheck = true;

Note that the type of the lock is not set there. If it were, we wouldn't need the patch proposed here (for the reasons outlined above). This patch would unnecessarily void the benefit of MPI_MODE_NOCHECK with exclusive locks.
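
To spell out the argument, here is the check as a standalone predicate (a sketch with simplified stand-in types, not the actual osc/ucx structures): the software accumulate lock can only be skipped when the caller holds an exclusive window lock, because no other process may then hold a conflicting lock, with or without MPI_MODE_NOCHECK.

#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the osc/ucx lock bookkeeping quoted in this
 * thread; the names are illustrative, not taken from the source tree. */
typedef enum { LOCK_SHARED, LOCK_EXCLUSIVE } lock_type_t;

typedef struct {
    lock_type_t type;        /* set from MPI_LOCK_SHARED / MPI_LOCK_EXCLUSIVE */
    bool        is_nocheck;  /* MPI_MODE_NOCHECK was passed to MPI_Win_lock  */
} win_lock_t;

/* Same shape as the need_acc_lock test quoted above: take the software
 * accumulate lock unless an exclusive window lock is held.  If the
 * MPI_MODE_NOCHECK code path also recorded the lock type, this test would
 * already do the right thing without an extra is_nocheck check. */
static bool need_acc_lock(const win_lock_t *lock)
{
    return !(NULL != lock && LOCK_EXCLUSIVE == lock->type);
}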

@janjust (Contributor) commented Jun 22, 2022

@devreal Thanks for the thorough explanation. I think I agree with you: omitting the extra check and instead setting the appropriate lock type avoids this issue in the first place.

@MamziB (Contributor, Author) commented Jun 22, 2022

@devreal My only concern was the combination of MPI_MODE_NOCHECK + exclusive lock. With our initial commit, we still guarantee the atomicity of the operations. Otherwise, we could just set the SHARED type in ompi/mca/osc/ucx/osc_ucx_passive_target.c as you proposed.

In this test (acc_loc.c, line 98), if I replace SHARED with EXCLUSIVE, we still fail with your proposed solution.
However, I did not find anything in the standard saying explicitly that using MPI_MODE_NOCHECK + an exclusive lock discards the atomicity of an operation such as MPI_Accumulate. Please let me know where to read this.

For now, I will change the commit to support atomicity only for SHARED + MPI_MODE_NOCHECK. Later we can change it if we see an issue with an application.

Thanks a lot, @devreal for all your constructive comments! I appreciate it!

Thanks,
Mamzi

@devreal (Contributor) commented Jun 22, 2022

@MamziB As I said in #10493 (comment), there is no need for atomicity for the combination exclusive lock + MPI_MODE_NOCHECK. If you replace the shared lock with an exclusive lock in that test case, the code is an incorrect MPI application.

@MamziB MamziB force-pushed the mamzi/atomic-acc-nocheck branch from e8e0c7f to d620afc Compare June 22, 2022 20:18
@devreal (Contributor) commented Jun 22, 2022

@MamziB There is an additional commit now ("osc/ucx: fix osc ucx component priority selection") that seems unrelated. Was that intentional?

@@ -124,6 +124,11 @@ int ompi_osc_ucx_lock(int lock_type, int target, int mpi_assert, struct ompi_win
}
} else {
lock->is_nocheck = true;
if (lock_type == MPI_LOCK_EXCLUSIVE) {
@devreal (Contributor) commented on the diff:
This looks good but the same problem exists in ompi_osc_ucx_lock_all

@MamziB (Contributor, Author) replied:

@devreal Good catch. Will add it there too. Thanks.

@MamziB (Contributor, Author) replied:

@devreal I took a look at the ompi_osc_ucx_lock_all implementation. It looks like we are not creating any new lock object there, so lock will be NULL inside need_acc_lock(). We should be good here: if the lock is NULL, need_acc_lock returns true. Please let me know your thoughts.

@devreal (Contributor) replied:

Ahh right, I quickly glanced over it and saw this line which looked similar:

module->lock_all_is_nocheck = true;
but that's the module, not the lock.

@MamziB (Contributor, Author) commented Jun 22, 2022

@devreal No, I am not sure how that got in. I created a new commit, then squashed it into the previous commit of the PR. I'm trying to figure out how to fix it.

EDIT: The commit is now fixed. Waiting for @devreal's approval.

Signed-off-by: Mamzi Bayatpour  <mbayatpour@nvidia.com>
Co-authored-by: Tomislav Janjusic <tomislavj@nvidia.com>
@MamziB MamziB force-pushed the mamzi/atomic-acc-nocheck branch from d620afc to f7f928d Compare June 22, 2022 20:59
@devreal (Contributor) left a review comment:

Thanks @MamziB, looks good to me now 👍


@janjust (Contributor) commented Jun 23, 2022

@devreal thanks for all your input.

@devreal devreal merged commit eba4e33 into open-mpi:main Jun 23, 2022
@janjust (Contributor) commented Jun 23, 2022

@MamziB, please open up the v5.0 PR.

@devreal (Contributor) commented Jun 23, 2022

There is #10494 but that has the original change in it. Do we need to port this back to 4.1?

@janjust (Contributor) commented Jun 23, 2022

@devreal, ah right, yes, we just need to update it. Yes, we should open up a v4.1 equivalent. Just FYI: at this stage, with all the recent PRs, v5.0.x is in much better shape than v4.1.x. We'll get to v4.1 once we're done with v5.0.x; some changes are v5.0/main specific due to the wpool design.

@MamziB (Contributor, Author) commented Dec 2, 2022

@devreal @janjust

In the latest round of MPICH testing, I found a data validation issue when our nonblocking accumulate is enabled. The failing test is lockcontention2.c, line 255.

The reason for the data validation issue is clear: the test uses an exclusive window lock, so inside the code we avoid acquiring the acc lock (see the ompi_osc_need_acc_lock function). Since we have enabled the nonblocking design of MPI_Accumulate, this leads to the data validation issue.

My question is:

  • Do you think this test's behavior is correct? It seems the test assumes that MPI_Accumulate is a blocking operation. If so, I think we might have to take the acc lock even if the window lock is exclusive.

Mamzi

@devreal (Contributor) commented Dec 2, 2022

I'm not sure why we have a problem here. The call to MPI_WIN_LOCK should include a flush (must complete outstanding operations). Is that not the case with the nonblocking accumulates?

@MamziB (Contributor, Author) commented Dec 2, 2022

@devreal Yes, we do have a flush. However, here the issue is that the target rank and the target buffer are the same in both back-to-back calls to accumulate. Therefore, validation fails because both read the same target window value instead of the updated one.
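
A minimal sketch of the problematic pattern (modelled on the description of lockcontention2.c above, not copied from it; a window win and a peer rank target are assumed to be set up already):

int one = 1;

MPI_Win_lock(MPI_LOCK_EXCLUSIVE, target, 0, win);

/* Two accumulates to the same target location in one epoch.  With
 * nonblocking accumulates the second get-modify-put may read the target
 * before the first update has landed, unless the implementation orders
 * or serializes them. */
MPI_Accumulate(&one, 1, MPI_INT, target, 0 /* disp */, 1, MPI_INT, MPI_SUM, win);
MPI_Accumulate(&one, 1, MPI_INT, target, 0 /* disp */, 1, MPI_INT, MPI_SUM, win);

MPI_Win_unlock(target, win);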

@devreal (Contributor) commented Dec 2, 2022

Oh, now I see what you mean... Yes, that code is correct. The two updates may happen in any order, but it is legal to update the same target memory location multiple times with accumulates. That is indeed a problem for the nonblocking updates. Tracking and deconflicting operations would be one way to handle it (if operations overlap, we need to wait for the previous ones to complete). Or we could simply wait for any previous operation in the same access epoch to complete (which saves the tracking). It would be nice to be able to have multiple accumulate operations in flight at the same time, though.

However, don't we have a similar problem in the blocking implementation, albeit with a smaller window of opportunity? There we wait for the GET to complete, but not the PUT. So the GET in the next accumulate call may read stale data (if the network does not order these operations).

@MamziB (Contributor, Author) commented Dec 2, 2022

@devreal

"don't we have a similar problem in the blocking implementation"
No, in one of the previous commits, I added a flush at the end of the accumulate to make sure we avoid such scenarios.

I'm thinking of adding the following code to the beginning of the nonblocking accumulate to complete the previous ones. What do you think?

if (ompi_osc_need_acc_lock(module, target) == false) {
    ret = opal_common_ucx_ctx_flush(module->ctx, OPAL_COMMON_UCX_SCOPE_EP, target);
    if (ret != OMPI_SUCCESS) {
        return ret;
    }
}

@devreal (Contributor) commented Dec 4, 2022

Yes, that's one way to solve it. I would prefer a way that preserves more of the benefits of nonblocking accumulates; your patch would diminish their usefulness in any case where there is more than one accumulate operation. Instead, tracking the ranges of operations for each target (using the mca_rcache_base_vma infrastructure) and only flushing when a conflict occurs would provide better performance in many applications (where accumulates are spread over several variables). It's a bit more effort, but I think it would be worth it.
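
As an illustration only (plain C with hypothetical names, not the mca_rcache_base_vma API), the conflict tracking could look roughly like this: remember the target ranges of pending accumulates and flush only when a new accumulate overlaps one of them.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for pending nonblocking accumulates on one
 * target; not the actual osc/ucx data structures. */
typedef struct pending_range {
    uint64_t              disp;   /* target displacement of a pending accumulate */
    size_t                len;    /* extent of that accumulate in bytes          */
    struct pending_range *next;
} pending_range_t;

/* True if [disp, disp+len) overlaps any pending range, i.e. the new
 * accumulate conflicts and we must flush before issuing it. */
static bool conflicts_with_pending(const pending_range_t *list,
                                   uint64_t disp, size_t len)
{
    for (; list != NULL; list = list->next) {
        if (disp < list->disp + list->len && list->disp < disp + len) {
            return true;
        }
    }
    return false;
}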

@devreal (Contributor) commented Dec 4, 2022

Also, I have to correct an earlier statement I made:

Yes, that code is correct. The two updates may happen in any order but it is legal to update the same target memory location multiple times with an accumulate.

That "in any order" part was not correct. By default, updates to the same variable have to happen in program order, and the "accumulate_ordering" info key can be used to lift that constraint.
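
For reference, a hedged example of relaxing that ordering at window creation with the standard "accumulate_ordering" info key (base and size stand for an existing buffer and its byte count; the default value is "rar,raw,war,waw", i.e. fully ordered per target location):

MPI_Info info;
MPI_Win  win;

MPI_Info_create(&info);
/* Tell the implementation it need not order accumulates to the same
 * target location; only do this if the application tolerates it. */
MPI_Info_set(info, "accumulate_ordering", "none");

MPI_Win_create(base, size, 1 /* disp_unit in bytes */, info, MPI_COMM_WORLD, &win);
MPI_Info_free(&info);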

@MamziB (Contributor, Author) commented Dec 7, 2022

@devreal Thanks for the comments. #11178 fixes this issue.
