
MPI Derived Datatypes, Scatter with ob1 and uct produces deadlocks #12698

Closed
@AnnaLena77

Description

Background information

For certain research reasons, I need to use Open MPI with the pml ob1 in conjunction with InfiniBand (ucx/uct as btl) on our cluster. This works largely without problems.
In my particular program I am trying to split a matrix into sub-matrices (using derived datatypes) and distribute them to all processes with MPI_Scatter.

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

Version 5.0.3

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

git clone

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

+6f81bfd163f3275d2b0630974968c82759dd4439 3rd-party/openpmix (v1.1.3-3983-g6f81bfd1)
+4f27008906d96845e22df6502d6a9a29d98dec83 3rd-party/prrte (psrvr-v2.0.0rc1-4746-g4f27008906)
dfff67569fb72dbf8d73a1dcf74d091dad93f71b config/oac (heads/main)

Please describe the system on which you are running

  • Operating system/version: NixOS
  • Computer hardware: Cluster with AMD Opteron 6272 processors
  • Network type: InfiniBand (mlx4_0) / Ethernet

Details of the problem

The scatter always ends in a deadlock when the matrix is sufficiently large (900x900 in the example below; 90x90 still works) and ob1 is used in conjunction with uct.
(I am using Slurm scripts to distribute all jobs.)

mpirun -n 81 --mca pml ucx ./test                  --> works
mpirun -n 81 --mca pml ob1 ./test                  --> deadlock
mpirun -n 81 --mca pml ob1 --mca btl ^uct ./test   --> works
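
For reference, the reproducer is compiled along these lines (assuming the source file is named test.c; the -lm flag is needed because of sqrt):

mpicc test.c -o test -lm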

If I use a plain MPI datatype (e.g. MPI_DOUBLE) instead of the derived datatype, everything also works with uct, so the problem is clearly tied to the derived datatype (a sketch of this working variant follows the full program below).

MPI Program:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int N = 900;                           // Matrix dimension (deadlocks; N = 90 works)
    int block_size = N / (int)sqrt(size);  // assumes size is a perfect square (e.g. -n 81)

    int *matrix = NULL;

    if (rank == 0) {
        matrix = (int *)malloc(N * N * sizeof(int));
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                matrix[i * N + j] = i * N + j;
            }
        }
    }
    /* Derived datatype: one block_size x block_size sub-block of the N x N matrix,
       resized so that consecutive blocks start block_size ints apart. */
    MPI_Datatype blocktype, blocktype_resized;
    int array_of_sizes[2] = {N, N};
    int array_of_subsizes[2] = {block_size, block_size};
    int array_of_starts[2] = {0, 0};
    MPI_Type_create_subarray(2, array_of_sizes, array_of_subsizes, array_of_starts, MPI_ORDER_C, MPI_INT, &blocktype);
    MPI_Type_create_resized(blocktype, 0, block_size * sizeof(int), &blocktype_resized);
    MPI_Type_commit(&blocktype_resized);
    MPI_Type_free(&blocktype);  // the intermediate subarray type is no longer needed

    int *recvbuf = (int *)malloc(block_size * block_size * sizeof(int));

    /* Deadlocks here with pml ob1 + btl uct when N is large (e.g. 900) */
    MPI_Scatter(matrix, 1, blocktype_resized, recvbuf, block_size * block_size, MPI_INT, 0, MPI_COMM_WORLD);

    free(recvbuf);
    if (rank == 0) {
        free(matrix);
    }

    MPI_Type_free(&blocktype_resized);
    MPI_Finalize();
    return 0; 
}
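
For completeness, here is a minimal sketch of the working variant mentioned above: the same scatter volume, but with a plain MPI_INT send type instead of the derived datatype. The data layout on the root differs (contiguous chunks rather than 2D sub-blocks); it is only meant to show that removing the derived datatype avoids the hang.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int N = 900;
    int block_size = N / (int)sqrt(size);
    int chunk = block_size * block_size;   /* elements per process */

    int *matrix = NULL;
    if (rank == 0) {
        matrix = (int *)malloc((size_t)N * N * sizeof(int));
        for (int i = 0; i < N * N; i++) {
            matrix[i] = i;
        }
    }

    int *recvbuf = (int *)malloc(chunk * sizeof(int));

    /* Plain MPI_INT on both sides: no deadlock observed with ob1 + uct */
    MPI_Scatter(matrix, chunk, MPI_INT, recvbuf, chunk, MPI_INT, 0, MPI_COMM_WORLD);

    free(recvbuf);
    if (rank == 0) {
        free(matrix);
    }
    MPI_Finalize();
    return 0;
}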
