@devreal commented Jun 29, 2022

Based on a reply by @yosefe in a comment on #9580, it appears that the way osc/ucx implements request-based put and get operations might not work well on older networks. This PR instead uses `ucp_worker_flush_nb` to obtain a request that completes once the put or get operations have completed, without explicitly posting an atomic operation. I don't know the implementation details of `ucp_worker_flush_nb`, but it seems like a cleaner solution to me.
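
To illustrate the completion semantics this approach relies on, here is a toy, single-threaded C model. All `toy_*` names are hypothetical and are not UCX API; in osc/ucx the corresponding roles would be played by implicit RMA calls such as `ucp_put_nbi`, plus `ucp_worker_flush_nb` and `ucp_worker_progress`. The model assumes only the documented flush guarantee: a flush covers the RMA operations issued on the worker before the flush call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of implicit RMA plus a flush request.
 * Every toy_* name is hypothetical; none of this is UCX API. */

#define TOY_MAX_OPS 16

typedef struct {
    int remaining_ticks;   /* progress calls left until this op completes */
} toy_op_t;

typedef struct {
    toy_op_t ops[TOY_MAX_OPS];
    size_t   nops;         /* number of ops posted so far */
} toy_worker_t;

typedef struct {
    toy_worker_t *worker;
    size_t        upto;    /* ops posted before the flush was issued */
} toy_flush_req_t;

/* Post an implicit (nbi-style) put: the caller gets no per-op handle.
 * Posts beyond TOY_MAX_OPS are silently dropped in this toy. */
static void toy_put_nbi(toy_worker_t *w, int latency_ticks)
{
    if (w->nops < TOY_MAX_OPS) {
        w->ops[w->nops++].remaining_ticks = latency_ticks;
    }
}

/* Advance every outstanding operation by one tick. */
static void toy_progress(toy_worker_t *w)
{
    for (size_t i = 0; i < w->nops; ++i) {
        if (w->ops[i].remaining_ticks > 0) {
            w->ops[i].remaining_ticks--;
        }
    }
}

/* Issue a flush: the returned request covers only ops posted before it. */
static toy_flush_req_t toy_flush_nb(toy_worker_t *w)
{
    toy_flush_req_t req = { w, w->nops };
    return req;
}

/* The flush request is complete once all covered ops have completed. */
static bool toy_flush_done(const toy_flush_req_t *req)
{
    for (size_t i = 0; i < req->upto; ++i) {
        if (req->worker->ops[i].remaining_ticks > 0) {
            return false;
        }
    }
    return true;
}

/* Analogous to waiting on the request returned by MPI_Rput/MPI_Rget:
 * drive progress until the flush completes, return the ticks needed. */
static int toy_wait(const toy_flush_req_t *req)
{
    int ticks = 0;
    while (!toy_flush_done(req)) {
        toy_progress(req->worker);
        ++ticks;
    }
    return ticks;
}
```

For example, posting puts with latencies 3 and 1, flushing, and then posting a third put with latency 10, `toy_wait` returns after 3 ticks and leaves the third put outstanding. One consequence to keep in mind: a flush request completes only after every RMA operation posted on the worker before it, not just the put or get it stands in for, so it may complete later than a true per-operation handle would. Also, the real `ucp_worker_flush_nb` can complete immediately and return a status instead of a request handle, a case this model ignores.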

The old method of acquiring a request from an atomic operation is preserved as a fallback in case `ucp_worker_flush_nb` is not available.

Some minor fixes to `opal_common_ucx_winfo_flush` are also included in this PR.

@janjust @yosefe, is this the correct use of `ucp_worker_flush_nb`?

@jotabf FYI, this might solve the performance issue you're seeing with `MPI_Rget`. I haven't been able to test it on a network like yours, though.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>

The fallback to the old method of acquiring a request from an atomic
operation is preserved. Some networks might provide better performance
if the request-based operations do not rely on atomic operations.

Some minor fixes to opal_common_ucx_winfo_flush are included in this
commit.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
@janjust commented Jul 1, 2022

bot:retest

@janjust commented Jul 5, 2022

@devreal, please open a v5.0 backport of this PR.
