
Conversation

@mkurnosov
Contributor

Calling MPI_Allgather with the sendbuf and sendtype parameters set to MPI_IN_PLACE and NULL, respectively, produces a segmentation fault.

The problem is that sendtype is used even when sendbuf is MPI_IN_PLACE, although the standard requires the sendtype and sendcount parameters to be ignored in that case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>

@mkurnosov
Contributor Author

Reproducer

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, comm_size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int count = 5;

    int *rbuf = malloc(count * comm_size * sizeof(*rbuf));
    if (!rbuf) {
        fprintf(stderr, "Error: malloc failed\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    for (int i = 0; i < count; i++)
        rbuf[rank * count + i] = rank + i + 1;

    MPI_Allgather(MPI_IN_PLACE, 0, NULL, rbuf, count, MPI_INT, MPI_COMM_WORLD);

    for (int i = 0; i < comm_size; i++) {
        for (int j = 0; j < count; j++) {
            if (rbuf[i * count + j] != i + j + 1) {
            fprintf(stderr, "*** Validation error: [%d]=%d, need %d\n",
                    i * count + j, rbuf[i * count + j], i + j + 1);
            }
        }
    }
    free(rbuf);
    MPI_Finalize();
    return 0;
}

@jsquyres
Member

bot:mellanox:retest

@jladd-mlnx
Member

bot:mellanox:retest

@hppritcha
Member

hppritcha commented Jul 23, 2018

@mkurnosov please open a PR against 4.0.x with a cherry-pick of 540c2d1 once this PR is merged into master.

@mkurnosov
Contributor Author

@hppritcha done, I reopened PR #5474

@bosilca bosilca merged commit 3f598e9 into open-mpi:master Sep 21, 2018
@aravindksg
Contributor

In conjunction with the fix in this PR, would the tuned allgather algorithms also need to be fixed?

I ran into a segfault with the above reproducer code on master; the fault was in ompi_coll_tuned_allgather_intra_dec_fixed(), where we have a call to ompi_datatype_type_size(sdtype, &dsize). With sdtype being NULL, that call segfaults. It seems we need to check for MPI_IN_PLACE here too before calling ompi_datatype_type_size()?

Also, an MPI_IN_PLACE check would be needed in ompi_coll_tuned_allgatherv_intra_dec_fixed(), as the spec states that sendbuf and sendtype are to be ignored for Allgatherv.

@aravindksg
Contributor

@hppritcha, @bosilca: Any thoughts on the above comment about the segfault in the tuned algorithms? If this is a valid issue, I can create a PR for review.

@bosilca
Member

bosilca commented Oct 22, 2018

A fix for tuned would be more than welcome.

aravindksg added a commit to aravindksg/ompi that referenced this pull request Oct 24, 2018
PR open-mpi#5450 addresses MPI_IN_PLACE processing for basic collective algorithms.
But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths
as well before calling ompi_datatype_type_size() as otherwise we segfault.

MPI spec also stipulates to ignore sendcount and sendtype for Alltoall and
Allgatherv operations. So, extending the check to these algorithms as well.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
aravindksg added a commit to aravindksg/ompi that referenced this pull request Oct 24, 2018
aravindksg added a commit to aravindksg/ompi that referenced this pull request Oct 31, 2018
(cherry picked from commit 88d7810)
bosilca pushed a commit to bosilca/ompi that referenced this pull request Dec 3, 2018
7 participants