
Use a supplied communicator group and not MPI_COMM_WORLD #237

Open
@ss421

While testing malak with XIOS in server mode, I encountered an error towards the end of the application that appears to be related to the VarObs and Cx writers. The error message is not very helpful, so I tested without the writer filters, and the application ran. I suspect the cause is the communicator group: when running in server mode, the JEDI application does not use the MPI_COMM_WORLD communicator; instead it uses a split communicator, because some PEs are used by the server.

I did a search for MPI_COMM_WORLD and found a few references, the following in particular:

/// \note This filter must only be used with ObsSpaces using the \c MPI_COMM_WORLD communicator,
/// otherwise a deadlock will occur while writing the VarObs file. This is due to a limitation of
/// the \c Ops_WriteVarobs function, which could be removed by replacing \c mpl_comm_world in the
/// call to \c Ops_Mpl_Gatherv by \c mpi_group (for consistency with all other MPI calls in \c
/// Ops_WriteVarobs).

see: https://github.com/MetOffice/opsinputs/blob/2720b0b5d3ec2a129e27475d0fc6911547b4de17/src/opsinputs/VarObsWriter.h#L46C1-L50C22
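
The deadlock described in that note can be illustrated with a toy model that uses no MPI at all. This is only a sketch under stated assumptions: the rank counts, the rule that the last ranks act as servers, and the helper names `split_ranks` and `collective_completes` are hypothetical and are not taken from opsinputs, OPS, or XIOS. It encodes a single rule, that a collective call returns only once every member of its communicator has entered it.

```python
from typing import Dict, List, Set


def split_ranks(world_size: int, n_server_ranks: int) -> Dict[str, List[int]]:
    """Partition world ranks into a client group and a server group.

    Assumption: the last n_server_ranks ranks act as servers. The real
    assignment depends on how the application is launched.
    """
    if not 0 <= n_server_ranks < world_size:
        raise ValueError("need at least one client rank")
    first_server = world_size - n_server_ranks
    return {
        "client": list(range(first_server)),
        "server": list(range(first_server, world_size)),
    }


def collective_completes(comm_members: List[int], callers: Set[int]) -> bool:
    """A collective returns only if every member of its communicator calls it."""
    return set(comm_members) <= callers


world_size = 6  # hypothetical job size
groups = split_ranks(world_size, n_server_ranks=2)
world = list(range(world_size))

# Only the client ranks run the writer filter, so only they reach the gather.
callers = set(groups["client"])

print("gather on world communicator completes:",
      collective_completes(world, callers))
print("gather on client group completes:",
      collective_completes(groups["client"], callers))
```

In this model the gather over the world communicator never completes, because the server ranks never call it, while the same gather over the client group does. That matches the fix proposed in the quoted note: pass the group communicator to the gather instead of the world communicator.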

This suggests that some additional work is required if we want to use the XIOS server together with the VarObs writer. I don't imagine that it is too much work. Adding a few people who may know:

@wsmigaj @DavidSimonin @ctgh @mikecooke77 @adammaycock @DJDavies2

If others have any ideas, please comment below.

Thanks, Steve.

Labels: enhancement (New feature or request)