MPI_Comm_spawn and MPI_Intercomm_merge cause UCX errors #8426

@tomhaber

Background information

mpirun --version
# mpirun (Open MPI) 4.0.3

ucx_info -v
# UCT version=1.8.0 revision 0000000
# configured with: --prefix=/shared/common/software/UCX/1.8.0-GCCcore-9.3.0 --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --enable-optimizations --enable-cma --enable-mt --with-verbs --without-java --disable-doxygen-doc

Details of the problem

Test code:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;

    /* MPI_Comm_spawn writes one error code per spawned process (2 here). */
    int error_codes[2];

    MPI_Init(&argc, &argv);

    MPI_Comm parentcomm;
    MPI_Comm intercomm;
    MPI_Comm comm;
    MPI_Comm_get_parent(&parentcomm);

    if (parentcomm == MPI_COMM_NULL) {
        /* Parent: spawn two copies of this binary and merge with them. */
        printf("Current Command %s\n", argv[0]);

        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &intercomm, error_codes);
        MPI_Intercomm_merge(intercomm, 0, &comm);
    } else {
        /* Child: merge with the parent; high=1 orders children after it. */
        MPI_Intercomm_merge(parentcomm, 1, &comm);
    }

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    printf("Rank %d of %d\n", rank, size);

    MPI_Barrier(comm);
    MPI_Finalize();
    return 0;
}

The code runs correctly, but UCX emits a series of errors and warnings:

[wt-1-13:846422:0:846422]      ucp_ep.c:725  Bug: pending request 0x1e29228 on ep 0x7fed42169090 should have been flushed
[wt-1-13:846423:0:846423]      ucp_ep.c:725  Bug: pending request 0x830a28 on ep 0x7f95c3730090 should have been flushed
[1611843290.366895] [wt-1-13:846414:0]          mpool.c:42   UCX  WARN  object 0x6dd740 was not returned to mpool ucp_requests
[1611843290.366915] [wt-1-13:846414:0]          mpool.c:42   UCX  WARN  object 0x6dd900 was not returned to mpool ucp_requests
[1611843290.366919] [wt-1-13:846414:0]          mpool.c:42   UCX  WARN  object 0x6ddac0 was not returned to mpool ucp_requests
[1611843290.366923] [wt-1-13:846414:0]          mpool.c:42   UCX  WARN  object 0x6ddc80 was not returned to mpool ucp_requests
[1611843290.366926] [wt-1-13:846414:0]          mpool.c:42   UCX  WARN  object 0x6dde40 was not returned to mpool ucp_requests
[1611843290.366929] [wt-1-13:846414:0]          mpool.c:42   UCX  WARN  object 0x6de000 was not returned to mpool ucp_requests
[1611843290.366933] [wt-1-13:846414:0]          mpool.c:42   UCX  WARN  object 0x6de1c0 was not returned to mpool ucp_requests
[1611843290.366936] [wt-1-13:846414:0]          mpool.c:42   UCX  WARN  object 0x6de380 was not returned to mpool ucp_requests
[1611843290.366940] [wt-1-13:846414:0]          mpool.c:42   UCX  WARN  object 0x6de540 was not returned to mpool ucp_requests
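
For triage, the output above falls into two groups: 2 lines ending in "should have been flushed" and 9 "was not returned to mpool" warnings. A minimal sketch for tallying such lines in a captured log follows. The helper name `count_lines_with` is hypothetical and is not part of UCX or Open MPI; it only does plain substring matching on the message text shown above and assumes nothing else about the UCX log format.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not a UCX or Open MPI API): returns the number of
 * lines in `log` that contain `needle`. Each line is counted at most once.
 * An empty needle matches nothing. */
int count_lines_with(const char *log, const char *needle)
{
    assert(log != NULL && needle != NULL);

    size_t nlen = strlen(needle);
    int count = 0;

    if (nlen == 0)
        return 0;

    const char *line = log;
    while (*line != '\0') {
        /* Find the end of the current line (newline or end of string). */
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Search for the needle within this line only. */
        for (size_t i = 0; i + nlen <= len; i++) {
            if (memcmp(line + i, needle, nlen) == 0) {
                count++;
                break;
            }
        }

        /* Advance past the line and its newline, if any. */
        line += len;
        if (*line == '\n')
            line++;
    }
    return count;
}
```

Calling `count_lines_with(log, "was not returned to mpool")` on the captured stderr text gives the number of leaked-request warnings, which makes it easier to compare runs with different process counts or UCX versions.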
