Make sure a PSCW epoch is active on MPI_Win_complete calls #12793


Merged
merged 1 commit into from
Sep 23, 2024

Conversation

codymann-cornelisnetworks (Contributor)

The win_sync_complete test case in the MPICH test suite v4.2.1 fails because MPI_Win_complete returns successfully when it should fail with MPI_ERR_RMA_SYNC. This PR includes a simple fix that checks for an active PSCW (post-start-complete-wait) access epoch before any further action is taken.

config.log

Script to generate the failure:

#! /usr/bin/env bash
export MPI_PATH="/home/cmann/builds/ompi.debug"
export MPI_LIB_PATH="$MPI_PATH/lib"
export MPI_BIN_PATH="$MPI_PATH/bin"

export LIBFAB_PATH="/home/cmann/builds/libfabric.debug"
export LIBFAB_LIB_PATH="$LIBFAB_PATH/lib"
export LIBFAB_BIN_PATH="$LIBFAB_PATH/bin"

./configure --enable-strictmpi --with-mpi=$MPI_PATH --disable-dtpools

if [[ $? -ne 0 ]]
then
    echo "Configure failure"
    exit 1
fi


HOSTS="phwtstl005,phwtstl006"
export RUNTESTS_SHOWPROGRESS=1
# export MPITEST_PPNARG="-N %d"
export VERBOSE=1
export MPITEST_PPNMAX=36
export MPITEST_TIMEOUT=300
export MPITEST_PROGRAM_WRAPPER="--host $HOSTS --mca btl_ofi_disable_sep 1 --mca btl_ofi_disable_hmem 1  --mca osc ^ucx --mca pml ^ucx --mca btl ofi --mca mtl_ofi_enable_sep 0 --mca mtl ofi --map-by :OVERSUBSCRIBE -x FI_OPX_UUID=$RANDOM -x FI_PROVIDER=opx -x LD_PRELOAD=/home/cmann/code/libfabric-devel/debug/handlers/segfault_abort_handler.so -x LD_LIBRARY_PATH=$LIBFAB_LIB_PATH:$MPI_LIB_PATH:$LD_LIBRARY_PATH -x PATH=$MPI_BIN_PATH:$PATH"
make testing

Test output:

/home/cmann/builds/ompi.debug/bin/mpiexec -n 2   --host phwtstl005,phwtstl006 --mca btl_ofi_disable_sep 1 --mca btl_ofi_disable_hmem 1  --mca osc ^ucx --mca pml ^ucx --mca btl ofi --mca mtl_ofi_enable_sep 0 --mca mtl ofi --map-by :OVERSUBSCRIBE -x FI_OPX_UUID=7553 -x FI_PROVIDER=opx -x LD_PRELOAD=/home/cmann/code/libfabric-devel/debug/handlers/segfault_abort_handler.so -x LD_LIBRARY_PATH=/home/cmann/builds/libfabric.debug/lib:/home/cmann/builds/ompi.debug/lib: -x PATH=/home/cmann/builds/ompi.debug/bin:/home/cmann/.vscode-server/cli/servers/Stable-fee1edb8d6d72a0ddff41e5f71a671c23ed924b9/server/bin/remote-cli:/home/cmann/.local/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin ./win_sync_complete 
0: Operation succeeded, when it should have failed
1: Operation succeeded, when it should have failed
 Found 2 errors
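The failing pattern boils down to calling MPI_Win_complete with no preceding MPI_Win_start, which the MPI standard requires to fail with MPI_ERR_RMA_SYNC. A minimal standalone reproducer along the lines of the MPICH win_sync_complete test might look like this (a sketch, not the actual test source; requires an MPI installation to build with mpicc and run with mpiexec):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Win win;
    MPI_Init(&argc, &argv);
    MPI_Win_create(MPI_BOTTOM, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Route errors back to the caller instead of aborting. */
    MPI_Win_set_errhandler(win, MPI_ERRORS_RETURN);

    /* No MPI_Win_start was called, so no PSCW access epoch is active;
     * this call must fail with MPI_ERR_RMA_SYNC. */
    int rc = MPI_Win_complete(win);
    if (MPI_SUCCESS == rc) {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("%d: Operation succeeded, when it should have failed\n", rank);
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Before this fix, Open MPI's osc/rdma component returned MPI_SUCCESS from the call above, producing the two errors shown in the test output.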

@devreal (Contributor) left a comment:
LGTM, thanks for the patch!

devreal (Contributor) commented Sep 15, 2024

@janjust Seems the Nvidia CI ran out of space.

janjust (Contributor) commented Sep 16, 2024

> @janjust Seems the Nvidia CI ran out of space.

Thanks - I notified our admin; he'll clean it up.

… rdma framework for osc.

Signed-off-by: Cody Mann <cody.mann@cornelisnetworks.com>
@codymann-cornelisnetworks (Contributor, Author)

Just refreshed to the latest main. I'm new to contributing to Open MPI, so please feel free to let me know if I'm missing any steps in the contribution process.

@hjelmn hjelmn merged commit 86961a2 into open-mpi:main Sep 23, 2024
13 of 14 checks passed