nonblocking reductions in Fortran with non-contiguous buffers of different layouts #663

Open
@jeffhammond

Problem

This is almost impossible to implement:

  use mpi_f08
  type(MPI_Request) :: R
  integer, dimension(300) :: A
  integer, dimension(200) :: B
  ! Both sections select 100 elements, but with stride 3 (A) and stride 2 (B).
  call MPI_Iallreduce(A(1:300:3), B(1:200:2), 100, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, R)
  call MPI_Wait(R, MPI_STATUS_IGNORE)

In MPICH and VAPAA, non-contiguous Fortran subarrays are supported by creating a datatype corresponding to the CFI_cdesc_t coming from Fortran (e.g. MPICH implementation).

In most MPI functions, there is one datatype for every buffer. However, for reductions, there is only one datatype, so there is no way to capture the layout information of both the input and output buffers, if they are different.

Furthermore, if we are creating a custom datatype, we have to use a custom reduction operator / function. MPI_User_function has only one datatype argument, so again, it is impossible to carry along the required layout information.
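To make the single-datatype limitation concrete, here is a minimal sketch of a user-defined reduction in the `mpi_f08` bindings. The module name `user_op_sketch` and the function `int_sum` are hypothetical; the argument list follows the `MPI_User_function` abstract interface. Note that `invec` and `inoutvec` are both described by the one `datatype` argument, so the callback has nowhere to learn that the two vectors might be laid out differently.

```fortran
module user_op_sketch
  use, intrinsic :: iso_c_binding, only : c_ptr, c_f_pointer
  use mpi_f08
  implicit none
contains
  ! Hypothetical user reduction: element-wise integer sum.
  ! Both vectors are interpreted through the same (len, datatype) pair.
  subroutine int_sum(invec, inoutvec, len, datatype)
    type(c_ptr), value :: invec, inoutvec
    integer            :: len
    type(MPI_Datatype) :: datatype
    integer, pointer   :: src(:), acc(:)
    ! This sketch assumes datatype is MPI_INTEGER and both vectors are contiguous.
    call c_f_pointer(invec,    src, [len])
    call c_f_pointer(inoutvec, acc, [len])
    acc = acc + src
  end subroutine int_sum
end module user_op_sketch

program user_op_demo
  use mpi_f08
  use user_op_sketch
  implicit none
  type(MPI_Op) :: op
  integer      :: x(4), y(4)

  call MPI_Init()
  call MPI_Op_create(int_sum, .true., op)   ! .true. = commutative
  x = 1
  call MPI_Allreduce(x, y, 4, MPI_INTEGER, op, MPI_COMM_WORLD)
  call MPI_Op_free(op)
  call MPI_Finalize()
end program user_op_demo
```

If `datatype` were instead a derived type built from one buffer's descriptor, the same callback would have to apply that layout to both vectors, which is wrong whenever the send and receive sections differ.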

Obviously, in blocking functions, we can allocate temporary buffers and make contiguous copies where necessary, but in the non-blocking case, we can't free the buffer since we don't have completion callbacks.
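The lifetime problem can be seen by doing the pack and unpack by hand in user code. This is only a sketch of the idea, not what any implementation does internally; the temporaries `sendtmp` and `recvtmp` are hypothetical names, and the `asynchronous` attribute is added on the assumption that the buffers are touched by MPI between the two calls.

```fortran
program pack_unpack_sketch
  use mpi_f08
  implicit none
  integer, dimension(300) :: A
  integer, dimension(200) :: B
  ! Contiguous temporaries with a single, shared layout.
  integer, dimension(100), asynchronous :: sendtmp, recvtmp
  type(MPI_Request) :: R

  call MPI_Init()
  A = 1
  B = 0

  ! Pack: copy the stride-3 section of A into a contiguous buffer.
  sendtmp = A(1:300:3)

  call MPI_Iallreduce(sendtmp, recvtmp, 100, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, R)

  ! The temporaries must stay alive until the request completes.
  call MPI_Wait(R, MPI_STATUS_IGNORE)

  ! Unpack: scatter the result into the stride-2 section of B.
  ! This step (and any deallocation) can only run after completion.
  B(1:200:2) = recvtmp

  call MPI_Finalize()
end program pack_unpack_sketch
```

In user code the unpack sits naturally after `MPI_Wait`. A wrapper layer that creates the temporaries inside `MPI_Iallreduce` has no equivalent place to run the unpack and cleanup unless it gets a completion hook or tracks the request internally.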

Proposal

I prefer Option 3...

Option 1 - completion callbacks (add stuff to the standard)

I can solve the nonblocking problem with completion callbacks that allow me to cleanup temporaries. This is a very general solution that has lots of use cases, but the Forum seems to be opposed to it.

In the blocking case, we don't have to do anything.

Option 2 - implementations are very complicated (no changes to the standard)

Implementations that do something far more complicated than what VAPAA and MPICH do right now can solve this, but it is not pretty. They have to pass the CFI information down into the implementation of reductions and handle different layouts, or they have to allocate temporaries and clean them up using an internal mechanism. I suspect implementations have the capability to do the latter already and would go that route, if only because most MPI implementations do not want to deal with CFI_cdesc_t any more than absolutely necessary.

Option 3 - prohibit this usage (backwards-incompatible changes to the standard)

The easy solution is for us to add a backwards-incompatible restriction that reductions require Fortran buffers to have equivalent layouts. This is only technically backwards-incompatible, because nobody supports this today (at least in the nonblocking case - the blocking case might work due to implicit contiguous copy-in and copy-out, which Fortran compilers do when they see the CONTIGUOUS attribute).

I will argue that we implicitly require this anyways by virtue of having only one datatype argument, which means that users cannot pass buffers with different layouts from C. It is only because of the invisible layout differences associated with Fortran 2018 that users can do this.

Changes to the Text

Option 3 would add text to state that users are required to pass Fortran buffers of equivalent shape.

We need to be careful about how we say "equivalent shape" because one can have identical memory layouts corresponding to different Fortran shapes, and we only need to constrain the former.
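A small sketch of why Fortran shape is the wrong thing to constrain: shape equality is neither necessary nor sufficient for layout equality. The arrays `X` and `Y` and the helper `same_shape` are hypothetical; `A` and `B` mirror the declarations in the Problem section. No MPI is needed for this illustration.

```fortran
program shape_vs_layout
  implicit none
  integer :: X(10,10), Y(100)
  integer :: A(300), B(200)

  X = 0; Y = 0; A = 0; B = 0

  ! Case 1: different shapes ([10,10] vs [100]), identical memory layout
  ! (both are 100 contiguous integers). This should remain legal.
  print *, 'X,Y same shape:      ', same_shape(shape(X), shape(Y))
  print *, 'X,Y both contiguous: ', is_contiguous(X) .and. is_contiguous(Y)

  ! Case 2: identical shapes ([100] vs [100]), different memory layouts
  ! (element stride 3 vs element stride 2). This is the problem case.
  print *, 'sections same shape: ', same_shape(shape(A(1:300:3)), shape(B(1:200:2)))
  print *, 'A section contiguous:', is_contiguous(A(1:300:3))
  print *, 'B section contiguous:', is_contiguous(B(1:200:2))

contains

  ! Hypothetical helper: true when two shape vectors have equal rank and extents.
  logical function same_shape(s1, s2)
    integer, intent(in) :: s1(:), s2(:)
    same_shape = .false.
    if (size(s1) == size(s2)) same_shape = all(s1 == s2)
  end function same_shape

end program shape_vs_layout
```

Any wording for Option 3 would need to allow the first case and prohibit the second, so it has to be phrased in terms of strides and extents in memory rather than the result of `shape`.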

Impact on Implementations

Option 3 requires no implementation changes.

Impact on Users

Users are no longer allowed to do crazy things that are at best unreliable today.

References and Pull Requests

Labels: mpi-6 (For inclusion in the MPI 5.1 or 6.0 standard), wg-fortran (Fortran Working Group)