coll-base-allgather: fix MPI_IN_PLACE processing #5450
Calling MPI_Allgather with sendbuf set to MPI_IN_PLACE and sendtype set to NULL causes a segmentation fault. The problem is that sendtype is used even when sendbuf is MPI_IN_PLACE, although according to the standard the sendtype and sendcount parameters should be ignored in that case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>

Reproducer

bot:mellanox:retest

@mkurnosov please open a PR against 4.0.x with a cherry-pick of 540c2d1 once this PR is merged into master.

@hppritcha done, I reopened the PR #5474

In conjunction with the fix in this PR, would the tuned allgather algorithms also need to be fixed? I ran into a segfault with the above reproducer code on master and the fault was in Also,

@hppritcha , @bosilca : Any thoughts on the above comment related to the segfault in the tuned algorithm as well? If this is a valid issue, I can create a PR for review.

A fix for tuned would be more than welcome. |
PR open-mpi#5450 addresses MPI_IN_PLACE processing for basic collective algorithms. But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths as well before calling ompi_datatype_type_size() as otherwise we segfault. MPI spec also stipulates to ignore sendcount and sendtype for Alltoall and Allgatherv operations. So, extending the check to these algorithms as well. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
PR open-mpi#5450 addresses MPI_IN_PLACE processing for basic collective algorithms. But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths as well before calling ompi_datatype_type_size() as otherwise we segfault. MPI spec also stipulates to ignore sendcount and sendtype for Alltoall and Allgatherv operations. So, extending the check to these algorithms as well. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com> (cherry picked from commit 88d7810)
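
The guard described in the commit messages above can be sketched in isolation. This is a minimal illustration, not the actual patch: `datatype_t`, `IN_PLACE`, `datatype_size()` and `block_bytes()` are hypothetical stand-ins for `MPI_Datatype`, `MPI_IN_PLACE`, `ompi_datatype_type_size()` and the per-algorithm size computation, and falling back to the receive-side arguments is an assumption about one reasonable way to honor the rule that sendcount and sendtype are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a datatype descriptor; NOT the real Open MPI structure. */
typedef struct { size_t size; } datatype_t;

/* Hypothetical sentinel playing the role of MPI_IN_PLACE. */
static char in_place_marker;
#define IN_PLACE ((const void *)&in_place_marker)

/* Mimics a size query that dereferences the datatype pointer.
 * Passing NULL here is what would crash in the unguarded code path. */
static size_t datatype_size(const datatype_t *t)
{
    assert(t != NULL);
    return t->size;
}

/* Bytes contributed by one rank. When the send buffer is the in-place
 * sentinel, the send-side count and type are ignored and the receive-side
 * values are used instead, so a NULL send type is never dereferenced. */
static size_t block_bytes(const void *sbuf, int scount, const datatype_t *sdtype,
                          int rcount, const datatype_t *rdtype)
{
    (void)rcount;
    if (IN_PLACE == sbuf) {
        scount = rcount;   /* assumption: fall back to receive-side args */
        sdtype = rdtype;
    }
    return (size_t)scount * datatype_size(sdtype);
}
```

With a 4-byte element type `t`, `block_bytes(IN_PLACE, 0, NULL, 3, &t)` evaluates to 12 without touching the NULL send type, while a regular call such as `block_bytes(buf, 2, &t, 2, &t)` evaluates to 8.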