@devreal devreal commented Jul 11, 2022

Based on a reply by @yosefe in #9580 (comment), it appears that the way osc/ucx implements request-based put and get operations might not work well on older networks. This PR is an attempt at using ucp_worker_flush_nb instead, to acquire a request that completes once the put or get operations have completed, without explicitly posting an atomic operation. I don't know the implementation details of ucp_worker_flush_nb, but it seems like a cleaner solution to me.

The fallback to the old method of acquiring a request from an atomic operation is preserved in case ucp_worker_flush_nb is not available.
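To make the difference between the two paths concrete, here is a toy model in plain C. It does not call UCX and is not the code in this PR: every `toy_*` name is hypothetical, and the assumption that operations complete in posting order is a simplification made only for this sketch. It shows a request obtained from a nonblocking flush covering exactly the operations posted so far, while the fallback has to post one extra atomic operation and wait for that one as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a worker. NOT the UCX API; all toy_* names are hypothetical. */
typedef struct {
    int  posted;          /* operations posted so far (puts, gets, atomics) */
    int  completed;       /* operations completed so far (in posting order) */
    bool has_flush_nb;    /* models whether a nonblocking flush is available */
    int  atomics_posted;  /* extra atomic operations posted by the fallback */
} toy_worker_t;

/* A request is done once the worker has completed `target` operations. */
typedef struct {
    const toy_worker_t *worker;
    int                 target;
} toy_request_t;

static toy_worker_t toy_worker_init(bool has_flush_nb)
{
    toy_worker_t w = { 0, 0, has_flush_nb, 0 };
    return w;
}

/* Post a put or get; it stays outstanding until progress completes it. */
static void toy_post_rma(toy_worker_t *w)
{
    assert(w != NULL);
    w->posted++;
}

/* Acquire a request that completes once all previously posted ops are done. */
static toy_request_t toy_acquire_request(toy_worker_t *w)
{
    assert(w != NULL);
    if (!w->has_flush_nb) {
        /* Fallback: post an extra atomic and take its completion as the request. */
        w->posted++;
        w->atomics_posted++;
    }
    /* Flush path: nothing extra is posted, the request covers what is outstanding. */
    toy_request_t req = { w, w->posted };
    return req;
}

/* Complete at most one outstanding operation per call. */
static void toy_progress(toy_worker_t *w)
{
    assert(w != NULL);
    if (w->completed < w->posted) {
        w->completed++;
    }
}

/* Nonblocking completion test for a request. */
static bool toy_test(const toy_request_t *req)
{
    assert(req != NULL && req->worker != NULL);
    return req->worker->completed >= req->target;
}
```

In this model the flush-based request for a single put completes after one progress step, while the fallback needs two because of the additional atomic. Whether that translates into a measurable difference on a real network depends on the transport.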

Some minor fixes to opal_common_ucx_winfo_flush are also included in this PR.

@janjust @yosefe is this the correct use of ucp_worker_flush_nb?

@jotabf fyi this might solve the performance issue you're seeing with MPI_Rget. I haven't been able to test it on a network like yours though.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
(cherry picked from commit eee891f)

Fallback to the old method of acquiring a request from an atomic operation
is preserved. Some networks might provide better performance
if the request-based operations do not rely on atomic operations.

Some minor fixes to opal_common_ucx_winfo_flush included in this
commit.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
(cherry picked from commit eee891f)
@janjust janjust changed the title osc/ucx: implement rput and rget using ucp_worker_flush_nb [v5.0.x] osc/ucx: implement rput and rget using ucp_worker_flush_nb [v5.0.x] Jul 11, 2022
@janjust janjust merged commit 5a4bcb2 into open-mpi:v5.0.x Jul 12, 2022