-
Notifications
You must be signed in to change notification settings - Fork 12
fix(mpi): ensure wait_any
completes at least one request
#175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
src/KokkosComm/mpi/req.hpp
Outdated
while (completed != 0) { | ||
for (Req<Mpi> &req : reqs) { | ||
int flag; | ||
MPI_Test(&(req.mpi_request()), &flag, MPI_STATUS_IGNORE); | ||
if (flag) { | ||
completed++; | ||
wait(req); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is semantically correct now but I am not really fond of the active wait. Can't we build a MPI_REQUEST array and use MPI_Waitany
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I think we should do the same for wait_all
in that case 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we go about calling the registered callbacks on the requests that have completed, though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MPI_Waitany
provides the index of the thing that completed, so I think you'd just have to go back and do the callbacks of our associated request.
Closes kokkos#173: change the implementation of `Req<Mpi>::wait_any` to actually complete at least one request (and calls any registered callbacks on that request) before returning.
546297b
to
88011aa
Compare
I changed the implementation to match the I think we can merge this as it is, and I'll do a follow-up PR that updates our |
for (Req<Mpi> &req : reqs) { | ||
int flag; | ||
MPI_Test(&(req.mpi_request()), &flag, MPI_STATUS_IGNORE); | ||
if (flag) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work for MPI_REQUEST_NULL
?
If not, we can end-up with an infinite loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will take a look at what the standard says and fix this accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A call to MPI_TEST returns flag = true if the operation identified by request is complete. In such a case, the status object is set to contain information on the completed operation; if the communication object was created by a nonblocking send or receive, then it is deallocated and the request handle is set to MPI_REQUEST_NULL. The call returns flag = false, otherwise. In this case, the value of the status object is undefined. MPI_TEST is a local operation.
I am not an MPI expert but my understanding is that flag = false
if the input is MPI_REQUEST_NULL
as it cannot be completed.
Co-authored-by: Cédric Chevalier <cedric.chevalier019@proton.me>
This PR closes #173 by changing the implementation of
Req<Mpi>::wait_any
to actually complete at least one request (and calls any registered callbacks on that request) before returning.Also includes:
wait
to takeKokkosComm::Req<Mpi>
by reference to avoid a potentially costly deep copy of thepostWaits_
callback vectorwait
taking an rvalue-reference so that writing the following remains valid:KokkosComm::wait(KokkosComm::send(handle, view, peer));
wait_all
andwait_any
interfaces to usestd::span
instead ofstd::vector&
references