Open
Description
Background information
- v4.0.3
- installed from source (tar)
- CUDA-aware MPI
- CUDA 10.2
- This is not a system problem, but a suspected behavior/implementation issue in CUDA-aware MPI; it will happen on all systems.
Details of the problem
Inside CUDA-aware MPI (here), asynchronous CUDA streams are used to send messages. However, the user's program runs on other streams.
Therefore, the streams in the CUDA-aware implementation should be able to wait for the completion of work on the user's streams;
otherwise, programs will be incorrect,
or users will be forced to fully synchronize their streams before calling MPI.
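To make the forced workaround concrete, here is a minimal sketch (not taken from the issue; function names and parameters other than the MPI and CUDA runtime calls are illustrative) of what users currently have to do before handing a device buffer to CUDA-aware MPI:

```c
// Sketch of the workaround the issue describes: because MPI's internal
// streams cannot wait on the user's stream, the user must fully block
// their own stream before calling MPI on a device buffer.
#include <mpi.h>
#include <cuda_runtime.h>

/* send_result is an illustrative helper, not an MPI or CUDA API. */
void send_result(float *d_buf, int count, int peer, cudaStream_t user_stream)
{
    /* A kernel producing d_buf was launched on user_stream (omitted).
       MPI's internal streams have no way to order themselves after it,
       so the only safe option is a full host-side synchronization: */
    cudaStreamSynchronize(user_stream);

    /* Only now is it safe for CUDA-aware MPI to read d_buf. */
    MPI_Send(d_buf, count, MPI_FLOAT, peer, /*tag=*/0, MPI_COMM_WORLD);
}
```

Without the `cudaStreamSynchronize` call, MPI's internal copies could race with the kernel still writing `d_buf`, which is the incorrectness described above.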
See the PyTorch discussion on the matter.
Possible solutions: expose the streams to the user, or (preferably) let the user allocate and manage them.
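For illustration, the second suggestion might look roughly like the following. Note that `MPIX_Set_cuda_stream` is an invented name used only to sketch the idea; no such call exists in Open MPI v4.0.3:

```c
// Hypothetical sketch of the "let the user manage the stream" proposal.
// MPIX_Set_cuda_stream is NOT a real API; it stands in for whatever
// mechanism would let MPI issue its internal work on a user-chosen stream.
#include <mpi.h>
#include <cuda_runtime.h>

/* send_result is an illustrative helper, not an MPI or CUDA API. */
void send_result(float *d_buf, int count, int peer, cudaStream_t user_stream)
{
    /* Hypothetical: MPI enqueues its internal copies/sends on user_stream,
       so they are ordered after the kernels that produced d_buf ... */
    MPIX_Set_cuda_stream(MPI_COMM_WORLD, user_stream);
    MPI_Send(d_buf, count, MPI_FLOAT, peer, /*tag=*/0, MPI_COMM_WORLD);
    /* ... and no host-blocking cudaStreamSynchronize() is needed. */
}
```

This would restore stream-ordered semantics without forcing a full synchronization point before every MPI call.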