
OSC/UCX: preserve the accumulate ordering for overlapping buffers during acc-lock-less epochs and setting a proper wpool context mutex type #11178


Merged: 2 commits into open-mpi:main on Dec 9, 2022

Conversation

@MamziB (Contributor) commented Dec 7, 2022

OSC/UCX: preserve the accumulate ordering for overlapping buffers during acc-lock-less epochs and setting a proper wpool context mutex type

Signed-off-by: Mamzi Bayatpour mbayatpour@nvidia.com
Co-authored-by: Tomislav Janjusic tomislavj@nvidia.com

@devreal (Contributor) left a comment


Thanks for this work @MamziB! Just a few questions/hints in the comments.

OSC/UCX: preserve the accumulate ordering for overlapping buffers during acc-lock-less epochs

Signed-off-by: Mamzi Bayatpour  <mbayatpour@nvidia.com>
Co-authored-by: Tomislav Janjusic <tomislavj@nvidia.com>
@MamziB force-pushed the mamzi/outstanding-nb-acc branch from 5c3059c to e3c3391 on December 8, 2022 at 20:07
@MamziB (Contributor, Author) commented Dec 8, 2022

@devreal Thanks for the comments. Please take a look at the updated commit.

OSC/UCX: setting a proper wpool context mutex type. Note that this object was already being constructed using
opal_recursive_mutex_t inside opal_common_ucx_wpctx_create, but inside
opal_common_ucx_ctx_t, it was missing the proper type.

Signed-off-by: Mamzi Bayatpour  <mbayatpour@nvidia.com>
Co-authored-by: Tomislav Janjusic <tomislavj@nvidia.com>
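
As an editorial illustration of what this commit describes (the struct members, helper name, and include paths below are assumptions for the sketch, not the exact Open MPI source): the wpool context's mutex member is declared with the same recursive OPAL mutex class that the constructor already uses, so declaration and construction agree.

    /* Illustrative sketch only: names and include paths are assumptions. */
    #include "opal/class/opal_object.h"
    #include "opal/mca/threads/mutex.h"

    typedef struct sketch_common_ucx_ctx {
        opal_recursive_mutex_t mutex;   /* presumably declared as opal_mutex_t before this commit */
        /* ... other wpool context members ... */
    } sketch_common_ucx_ctx_t;

    static void sketch_wpctx_create(sketch_common_ucx_ctx_t *ctx)
    {
        /* The constructor already used the recursive class; after this commit
         * the declared member type matches it, so the recursive semantics the
         * rest of the code relies on are what the type advertises. */
        OBJ_CONSTRUCT(&ctx->mutex, opal_recursive_mutex_t);
    }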
@MamziB changed the title from "OSC/UCX: preserve the accumulate ordering for overlapping buffers during acc-lock-less epochs" to "OSC/UCX: preserve the accumulate ordering for overlapping buffers during acc-lock-less epochs and setting a proper wpool context mutex type" on Dec 8, 2022
@janjust requested a review from devreal on December 8, 2022 at 21:48

    if (!op_added) {
        /* no more space so flush */
        ret = opal_common_ucx_ctx_flush(module->ctx, OPAL_COMMON_UCX_SCOPE_WORKER, 0);
@devreal (Contributor) commented on this diff:

Is it safe to hold the ctx mutex during the flush?

@MamziB (Contributor, Author) replied:

Yes, the ctx mutex is recursive, so it should be OK; it does not hang.
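
For illustration, a rough sketch of the pattern being discussed (the function is invented and the mutex field name is an assumption; only module->ctx and the flush call mirror the snippet above):

    /* Sketch: the accumulate path holds the ctx mutex across the flush.
     * Because the mutex is recursive, the flush may acquire the same lock
     * again on the same thread without deadlocking. */
    static int sketch_acc_path(ompi_osc_ucx_module_t *module)
    {
        int ret;
        opal_mutex_lock(&module->ctx->mutex);   /* taken by the accumulate path */
        /* ... try to record the outstanding op; suppose there is no more space ... */
        ret = opal_common_ucx_ctx_flush(module->ctx, OPAL_COMMON_UCX_SCOPE_WORKER, 0);
        opal_mutex_unlock(&module->ctx->mutex);
        return ret;
    }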

@devreal (Contributor) commented Dec 9, 2022

Thanks, it's good now. Two more thoughts, though:

  1. Should we store the rank along with the base and tail, to avoid false positives and to allow us to flush the endpoint instead of the worker?
  2. Instead of resetting the array after a flush, it might be more efficient to reset that particular entry in the completion callback. That probably wouldn't even need the mutex in the callback, since we're only resetting two integer values to 0 (the window of opportunity for a race is small, and even if there is one, in the worst case we just don't recognize that entry as being empty). There is also a good chance that it would reduce the need for flushes, since entries are continually reset as operations complete (a rough sketch of this idea follows below).

Those are optimizations though and we could do them as a follow-up. I'll approve what we have now 👍
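
Purely to illustrate the second suggestion (all names below are invented for this sketch, not the PR's code): each outstanding nonblocking accumulate could carry its target rank plus the base/tail of the buffer it touches, and the completion callback would clear just that slot.

    #include <stdint.h>

    /* Hypothetical record for one outstanding nonblocking accumulate. */
    typedef struct nb_acc_slot {
        int      rank;   /* target rank, to avoid false positives across targets */
        uint64_t base;   /* start of the buffer range touched by the op          */
        uint64_t tail;   /* end of the buffer range touched by the op            */
    } nb_acc_slot_t;

    /* Hypothetical completion callback: reset only this entry.  Clearing two
     * integers is benign without the mutex; a racing reader may at worst still
     * see the slot as occupied and issue an unnecessary flush. */
    static void nb_acc_completion_cb(void *user_data)
    {
        nb_acc_slot_t *slot = (nb_acc_slot_t *)user_data;
        slot->base = 0;
        slot->tail = 0;
    }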

@janjust (Contributor) commented Dec 9, 2022

@devreal I'll open an issue to track the suggestions as a follow up - thanks for the comments.

@janjust merged commit 293cf4b into open-mpi:main on Dec 9, 2022
@janjust (Contributor) commented Dec 9, 2022

@MamziB v5.0.x backport please, when you can.

@MamziB (Contributor, Author) commented Dec 9, 2022

@devreal Thanks for the suggestions. My comments are as follows:

  1. Yes, we could do that. However, the current implementation of the osc/ucx flush mandates that all nonblocking accumulates finish before the flush completes (we flush the worker for that). Therefore, I am not sure how much we would gain by changing the flush from the worker to the target endpoint, but we should certainly investigate this suggestion's performance benefits.

  2. Actually, this is similar to how I originally envisioned the design: adding the memory info to the nonblocking request and clearing the corresponding entry from the array when the request completes. However, since accumulate already has its own lock, this patch is only relevant when we do not hold the acc lock (meaning only if the application calls MPI_Win_lock/MPI_Win_lock_all with EXCLUSIVE). Therefore, for the time being, I tried to isolate the memory-range handling from all other parts of the code (see the sketch after this comment for a rough illustration of that tracking). The benefit of the current patch is that no other type of RMA synchronization is impacted by it, and it keeps the osc/ucx accumulate design simple.
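
As a rough editorial sketch of the tracking described in this thread (names and sizes below are invented, and the actual PR logic may differ): the acc-lock-less path keeps a small table of outstanding accumulate ranges and forces a worker flush when a new op overlaps an outstanding one or when the table has no free slot, which corresponds to the "no more space so flush" branch shown earlier.

    #include <stdbool.h>
    #include <stdint.h>

    #define SKETCH_TABLE_SIZE 64            /* illustrative capacity */

    typedef struct { uint64_t base, tail; } sketch_range_t;   /* tail == 0 marks a free slot */

    static bool sketch_overlaps(const sketch_range_t *r, uint64_t base, uint64_t tail)
    {
        return r->tail != 0 && base <= r->tail && r->base <= tail;
    }

    /* Returns true if the new accumulate was recorded; returns false and sets
     * *needs_flush when the caller must flush the worker and reset the table
     * (overlap with an outstanding op, or no free slot left). */
    static bool sketch_record_acc(sketch_range_t table[SKETCH_TABLE_SIZE],
                                  uint64_t base, uint64_t tail, bool *needs_flush)
    {
        int free_slot = -1;
        *needs_flush = false;
        for (int i = 0; i < SKETCH_TABLE_SIZE; i++) {
            if (sketch_overlaps(&table[i], base, tail)) {
                *needs_flush = true;        /* preserve ordering for overlapping buffers */
                return false;
            }
            if (0 == table[i].tail && free_slot < 0) {
                free_slot = i;
            }
        }
        if (free_slot < 0) {
            *needs_flush = true;            /* table full: the !op_added case above */
            return false;
        }
        table[free_slot].base = base;
        table[free_slot].tail = tail;
        return true;
    }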

#11184
