fix(mpi): ensure `wait_any` completes at least one request #175

dssgabriel · 2025-10-02T22:06:44Z

This PR closes #173 by changing the implementation of `Req<Mpi>::wait_any` to actually complete at least one request (and call any registered callbacks on that request) before returning.

Also includes:

- Changed `wait` to take `KokkosComm::Req<Mpi>` by reference to avoid a potentially costly deep copy of the `postWaits_` callback vector
- Added an overload of `wait` taking an rvalue reference so that writing the following remains valid:
  ```
  KokkosComm::wait(KokkosComm::send(handle, view, peer));
  ```
- Modernized the `wait_all` and `wait_any` interfaces to use `std::span` instead of `std::vector&` references
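
For illustration, here is a minimal sketch of how an lvalue-reference overload and an rvalue-reference overload can coexist. The `Req` struct and `make_req` factory are hypothetical stand-ins for `KokkosComm::Req<Mpi>` and `KokkosComm::send`, not the actual KokkosComm code:

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for KokkosComm::Req<Mpi>: it only carries the
// callbacks that must run once the request has completed.
struct Req {
  std::vector<std::function<void()>> post_waits;
};

// Lvalue overload: borrows the request, so the callback vector is not copied.
inline void wait(Req &req) {
  // A real implementation would block on the underlying MPI request here.
  for (auto &callback : req.post_waits) callback();
  req.post_waits.clear();
}

// Rvalue overload: lets callers pass a temporary, e.g. wait(make_req(...)).
// Inside the function `req` is a named object and therefore an lvalue, so
// this call resolves to the overload above.
inline void wait(Req &&req) { wait(req); }

// Hypothetical factory standing in for KokkosComm::send.
inline Req make_req(int &counter) {
  Req req;
  req.post_waits.push_back([&counter] { ++counter; });
  return req;
}
```

With both overloads present, `wait(r)` on a named request and `wait(make_req(counter))` on a temporary both compile, and neither copies the callback vector.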

cedricchevalier19 · 2025-10-03T09:30:08Z

src/KokkosComm/mpi/req.hpp

+  while (completed != 0) {
+    for (Req<Mpi> &req : reqs) {
+      int flag;
+      MPI_Test(&(req.mpi_request()), &flag, MPI_STATUS_IGNORE);
+      if (flag) {
+        completed++;
+        wait(req);
+      }
    }


It is semantically correct now, but I am not really fond of the active wait. Can't we build an array of MPI_Request handles and use MPI_Waitany?

Sure! I think we should do the same for wait_all in that case 👍

How do we go about calling the registered callbacks on the requests that have completed, though?

MPI_Waitany provides the index of the thing that completed, so I think you'd just have to go back and do the callbacks of our associated request.
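
A rough sketch of that idea, under stated assumptions: `RawHandle`, `Req`, and `fake_waitany` are hypothetical stand-ins (for `MPI_Request`, the KokkosComm request wrapper, and `MPI_Waitany` respectively) so that the example runs without an MPI installation. Unlike the real `MPI_Waitany`, the stand-in does not block.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins; none of these names come from KokkosComm or MPI.
using RawHandle = int;  // plays the role of MPI_Request

struct Req {
  RawHandle handle;
  std::vector<std::function<void()>> post_waits;  // callbacks to run on completion
};

// Stand-in for MPI_Waitany: reports the index of the first handle whose value
// is non-negative ("completed"), or -1 if there is none. It never blocks.
inline int fake_waitany(const std::vector<RawHandle> &handles) {
  for (std::size_t i = 0; i < handles.size(); ++i) {
    if (handles[i] >= 0) return static_cast<int>(i);
  }
  return -1;
}

// Gather the raw handles into a contiguous array, wait on it, then use the
// returned index to find the wrapper whose callbacks must run.
inline int wait_any(std::vector<Req> &reqs) {
  std::vector<RawHandle> handles;
  handles.reserve(reqs.size());
  for (const Req &req : reqs) handles.push_back(req.handle);

  const int index = fake_waitany(handles);
  if (index < 0) return index;  // nothing completed

  Req &done = reqs[static_cast<std::size_t>(index)];
  for (auto &callback : done.post_waits) callback();
  done.post_waits.clear();
  return index;
}
```

Because the handle array is built in the same order as `reqs`, the index reported by the wait call maps directly back to the wrapper. With the real `MPI_Waitany`, the completed entry of the handle array is set to `MPI_REQUEST_NULL`, so the wrapper's stored handle would presumably need the same update.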

dssgabriel · 2025-10-03T21:07:00Z

I changed the implementation to match the MPI_Waitany semantics, i.e., complete one request and exit, as opposed to what I had before, which was more similar to MPI_Waitsome (which may complete multiple requests before returning).

I think we can merge this as it is, and I'll do a follow-up PR that updates our wait_all and wait_any implementations to extract arrays of MPI_Request handles and call the MPI functions directly.

cedricchevalier19 · 2025-10-07T10:09:30Z

src/KokkosComm/mpi/req.hpp

+    for (Req<Mpi> &req : reqs) {
+      int flag;
+      MPI_Test(&(req.mpi_request()), &flag, MPI_STATUS_IGNORE);
+      if (flag) {


Does this work for MPI_REQUEST_NULL?
If not, we can end up with an infinite loop.

I will take a look at what the standard says and fix this accordingly.

> A call to MPI_TEST returns flag = true if the operation identified by request is complete. In such a case, the status object is set to contain information on the completed operation; if the communication object was created by a nonblocking send or receive, then it is deallocated and the request handle is set to MPI_REQUEST_NULL. The call returns flag = false, otherwise. In this case, the value of the status object is undefined. MPI_TEST is a local operation.

I am not an MPI expert but my understanding is that flag = false if the input is MPI_REQUEST_NULL as it cannot be completed.
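
For reference, the same section of the MPI standard also permits calling MPI_TEST with a null or inactive request, in which case it returns flag = true with an empty status, and MPI_WAITANY returns index = MPI_UNDEFINED when the list contains no active handles. Independently of that behavior, a polling loop can be made safe by skipping null handles explicitly and bailing out when none remain active. The sketch below illustrates this with hypothetical stand-ins (`RawHandle`, `kNullHandle`, `kNoActiveRequest`, `fake_test`) in place of `MPI_Request`, `MPI_REQUEST_NULL`, `MPI_UNDEFINED`, and `MPI_Test`; it is not the code from this PR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins so the example runs without MPI.
using RawHandle = int;
constexpr RawHandle kNullHandle = -1;  // plays the role of MPI_REQUEST_NULL
constexpr int kNoActiveRequest = -1;   // plays the role of MPI_UNDEFINED

// Stand-in test: an active handle is a countdown that "completes" when it
// reaches zero. The loop below never calls this on kNullHandle.
inline bool fake_test(RawHandle &handle) {
  if (handle > 0) --handle;
  if (handle == 0) {
    handle = kNullHandle;  // completed handles become null
    return true;
  }
  return false;
}

// Poll until one active request completes. Null handles are skipped, and the
// loop gives up when no active handle is left instead of spinning forever.
inline int poll_any(std::vector<RawHandle> &handles) {
  while (true) {
    bool any_active = false;
    for (std::size_t i = 0; i < handles.size(); ++i) {
      if (handles[i] == kNullHandle) continue;
      any_active = true;
      if (fake_test(handles[i])) return static_cast<int>(i);
    }
    if (!any_active) return kNoActiveRequest;
  }
}
```

The `any_active` flag is what rules out the infinite loop: if every handle is null, the function returns the sentinel after a single pass.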

dssgabriel requested review from cwpearson and cedricchevalier19 October 2, 2025 22:06

dssgabriel added the C-enhancement (Category: an enhancement or bug fix) and A-mpi (Area: KokkosComm MPI backend implementation) labels Oct 2, 2025

cedricchevalier19 reviewed Oct 3, 2025

dssgabriel added 3 commits October 3, 2025 23:00

fix(mpi): ensure wait_any completes at least one request

833c7c0

Closes kokkos#173: change the implementation of `Req<Mpi>::wait_any` to actually complete at least one request (and call any registered callbacks on that request) before returning.

refactor(mpi): modernize using std::span and pass Req as ref

97fb6d2

refactor(mpi): add wait_any with rvalue-reference param

88011aa

dssgabriel force-pushed the fix/mpi-wait-any branch from 546297b to 88011aa Compare October 3, 2025 21:02

dssgabriel requested a review from cedricchevalier19 October 6, 2025 16:19

cedricchevalier19 reviewed Oct 7, 2025

fix(mpi): add comment over active-wait loop

ed06197

Co-authored-by: Cédric Chevalier <cedric.chevalier019@proton.me>

cwpearson added the SNL-CI-APPROVAL (Required to run SNL CI on non-SNL contributions) label Oct 7, 2025
