Open
Description
Background information
- v4.0.3
- installed from source (tar)
- CUDA-aware MPI
- CUDA 10.2
- This is not a system problem, but a suspected behavior/implementation issue in CUDA-aware MPI; it will happen on all systems.
Details of the problem
Inside CUDA-aware MPI (here), asynchronous CUDA streams are used to send messages. However, the user's program runs on other streams.
Therefore, the streams in the CUDA-aware implementation should be able to wait for the completion of work on the user's streams;
otherwise, programs will be incorrect,
or users will be forced to fully synchronize their streams before calling MPI.
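To make the forced workaround concrete, here is a minimal sketch (not taken from the issue; function names and parameters other than the MPI and CUDA runtime calls are illustrative) of what users currently have to do before handing a device buffer to CUDA-aware MPI:

```c
// Sketch of the workaround the issue describes: because MPI's internal
// streams cannot wait on the user's stream, the user must fully block
// their own stream before calling MPI on a device buffer.
#include <mpi.h>
#include <cuda_runtime.h>

/* send_result is an illustrative helper, not an MPI or CUDA API. */
void send_result(float *d_buf, int count, int peer, cudaStream_t user_stream)
{
    /* A kernel producing d_buf was launched on user_stream (omitted).
       MPI's internal streams have no way to order themselves after it,
       so the only safe option is a full host-side synchronization: */
    cudaStreamSynchronize(user_stream);

    /* Only now is it safe for CUDA-aware MPI to read d_buf. */
    MPI_Send(d_buf, count, MPI_FLOAT, peer, /*tag=*/0, MPI_COMM_WORLD);
}
```

Without the `cudaStreamSynchronize` call, MPI's internal copies could race with the kernel still writing `d_buf`, which is the incorrectness described above.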
See the PyTorch discussion on the matter.
Possible solutions: expose the streams to the user, or (preferably) let the user allocate and manage them.
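For illustration, the second suggestion might look roughly like the following. Note that `MPIX_Set_cuda_stream` is an invented name used only to sketch the idea; no such call exists in Open MPI v4.0.3:

```c
// Hypothetical sketch of the "let the user manage the stream" proposal.
// MPIX_Set_cuda_stream is NOT a real API; it stands in for whatever
// mechanism would let MPI issue its internal work on a user-chosen stream.
#include <mpi.h>
#include <cuda_runtime.h>

/* send_result is an illustrative helper, not an MPI or CUDA API. */
void send_result(float *d_buf, int count, int peer, cudaStream_t user_stream)
{
    /* Hypothetical: MPI enqueues its internal copies/sends on user_stream,
       so they are ordered after the kernels that produced d_buf ... */
    MPIX_Set_cuda_stream(MPI_COMM_WORLD, user_stream);
    MPI_Send(d_buf, count, MPI_FLOAT, peer, /*tag=*/0, MPI_COMM_WORLD);
    /* ... and no host-blocking cudaStreamSynchronize() is needed. */
}
```

This would restore stream-ordered semantics without forcing a full synchronization point before every MPI call.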