Neighborhood Collectives #2324

Closed
@dalcinl

Description

@jsquyres I've found serious issues in the implementation of neighborhood collectives. They showed up after a memory allocation optimization I implemented in mpi4py (commit c2eaa292), which failed badly with Open MPI 2.0.1 in this Bitbucket Pipelines build.

After checking the source code in ompi/v2.x, I found a major issue for the case of intracommunicators with a topology. The code simply does not check the topology to determine the number of sources and destinations, and uses ompi_comm_size() instead. This leads to wrong checks and to reads past the end of arrays.

In mpi4py, I'm using the following code to determine the number of sources and destinations. This piece of code only handles intracommunicators, but could be extended to intercommunicators.
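For readers without access to the linked code, here is a minimal sketch of the kind of topology dispatch described above. It is not the mpi4py source: `Topo`, `CommInfo`, and `neighbor_counts` are hypothetical names, and the communicator is modeled as a plain record so the logic runs without an MPI installation. The comments name the standard MPI calls that would supply each value. The fallback to the communicator size when there is no topology, and the `strict` switch that rejects that case, are assumptions mirroring the two options discussed in this issue.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Topo(Enum):
    """Hypothetical stand-in for the result of MPI_Topo_test."""
    NONE = "undefined"        # MPI_UNDEFINED
    CART = "cart"             # MPI_CART
    GRAPH = "graph"           # MPI_GRAPH
    DIST_GRAPH = "dist_graph" # MPI_DIST_GRAPH


@dataclass
class CommInfo:
    """Hypothetical record of what would be queried from a communicator."""
    topo: Topo
    size: int = 0        # MPI_Comm_size
    ndims: int = 0       # MPI_Cartdim_get
    nneighbors: int = 0  # MPI_Graph_neighbors_count for the calling rank
    indegree: int = 0    # MPI_Dist_graph_neighbors_count
    outdegree: int = 0   # MPI_Dist_graph_neighbors_count


def neighbor_counts(info: CommInfo, strict: bool = False) -> Tuple[int, int]:
    """Return (number of sources, number of destinations)."""
    if info.topo is Topo.CART:
        # Two neighbors per dimension, one in each direction.
        n = 2 * info.ndims
        return n, n
    if info.topo is Topo.GRAPH:
        # Same neighbor list is used for receiving and sending.
        return info.nneighbors, info.nneighbors
    if info.topo is Topo.DIST_GRAPH:
        # In- and out-degree may differ.
        return info.indegree, info.outdegree
    if strict:
        # Assumption: treat a missing topology as an error.
        raise ValueError("communicator has no topology")
    # Assumption: without a topology, behave like a regular
    # allgather/alltoall over every rank in the communicator.
    return info.size, info.size
```

With this model, `neighbor_counts(CommInfo(Topo.CART, ndims=3))` gives six sources and six destinations regardless of the communicator size, which is the quantity that should size the buffers and bound the argument checks.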

BTW, the wording of the MPI standard seems to require a topology for neighborhood collectives, although supporting intercomms and intracomms without a topology seems possible, and the result should be equivalent to regular allgather and alltoall. Or maybe the right thing to do is to just raise an error for intercomms and for intracomms with no topology.

Am I missing something?

PS: MPICH seems to implement neighbor collectives only for Cartesian/graph topologies. The helper routine MPIR_Topo_canon_nhb_count is quite similar to what I'm using in mpi4py (though the MPICH one fails when there is no topology). @roblatham00 Any comments for your competitors :-) about this?

$[dalcinl@localhost mpich.git]$ git grep MPIR_Topo_canon_nhb_count
src/include/mpir_topo.h:int MPIR_Topo_canon_nhb_count(MPIR_Comm *comm_ptr, int *indegree, int *outdegree, int *weighted);
src/mpi/topo/inhb_allgather.c:    mpi_errno = MPIR_Topo_canon_nhb_count(comm_ptr, &indegree, &outdegree, &weighted);
src/mpi/topo/inhb_allgatherv.c:    mpi_errno = MPIR_Topo_canon_nhb_count(comm_ptr, &indegree, &outdegree, &weighted);
src/mpi/topo/inhb_alltoall.c:    mpi_errno = MPIR_Topo_canon_nhb_count(comm_ptr, &indegree, &outdegree, &weighted);
src/mpi/topo/inhb_alltoallv.c:    mpi_errno = MPIR_Topo_canon_nhb_count(comm_ptr, &indegree, &outdegree, &weighted);
src/mpi/topo/inhb_alltoallw.c:    mpi_errno = MPIR_Topo_canon_nhb_count(comm_ptr, &indegree, &outdegree, &weighted);
src/mpi/topo/topoutil.c:#define FUNCNAME MPIR_Topo_canon_nhb_count
src/mpi/topo/topoutil.c:int MPIR_Topo_canon_nhb_count(MPIR_Comm *comm_ptr, int *indegree, int *outdegree, int *weighted)
