UCX and MPI-Sessions
When I try to use OpenMPI with UCX on our small university cluster, I get an error message
saying that MPI Sessions features are not supported by UCX (the cluster uses an InfiniBand interconnect).
However, when I install the same setup on my local machine (Arch Linux),
everything seems to work fine. So I'm wondering whether MPI Sessions are supported by UCX or not.
Source Code (main.c):
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void function_my_session_errhandler(MPI_Session *foo, int *bar, ...) {
    fprintf(stderr, "my error handler called here with error %d\n", *bar);
}

void function_check_print_error(char *format, int rc) {
    if (MPI_SUCCESS != rc) {
        fprintf(stderr, format, rc);
        abort();
    }
}

int main(int argc, char *argv[]) {
    MPI_Session session;
    MPI_Errhandler errhandler;
    MPI_Group group;
    MPI_Comm comm_world, comm_self;
    MPI_Info info;
    int rc, npsets, one = 1, sum;

    rc = MPI_Session_create_errhandler(function_my_session_errhandler, &errhandler);
    function_check_print_error("Error handler creation failed with rc = %d\n", rc);

    rc = MPI_Info_create(&info);
    function_check_print_error("Info creation failed with rc = %d\n", rc);

    rc = MPI_Info_set(info, "thread_level", "MPI_THREAD_MULTIPLE");
    function_check_print_error("Info key/val set failed with rc = %d\n", rc);

    rc = MPI_Session_init(info, errhandler, &session);
    function_check_print_error("Session initialization failed with rc = %d\n", rc);

    rc = MPI_Session_get_num_psets(session, MPI_INFO_NULL, &npsets);
    function_check_print_error("Get number of psets failed with rc = %d\n", rc);

    for (int i = 0; i < npsets; i++) {
        int psetlen = 0;
        char pset_name[256];
        /* First call queries the name length, second call retrieves the name. */
        MPI_Session_get_nth_pset(session, MPI_INFO_NULL, i, &psetlen, NULL);
        MPI_Session_get_nth_pset(session, MPI_INFO_NULL, i, &psetlen, pset_name);
        fprintf(stderr, " PSET %d: %s (len: %d)\n", i, pset_name, psetlen);
    }

    rc = MPI_Group_from_session_pset(session, "mpi://WORLD", &group);
    function_check_print_error("Could not get a group for mpi://WORLD. rc = %d\n", rc);

    rc = MPI_Comm_create_from_group(group, "my_world", MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm_world);
    function_check_print_error("Could not create Communicator my_world. rc = %d\n", rc);
    MPI_Group_free(&group);

    MPI_Allreduce(&one, &sum, 1, MPI_INT, MPI_SUM, comm_world);
    fprintf(stderr, "World Comm Sum (1): %d\n", sum);

    rc = MPI_Group_from_session_pset(session, "mpi://SELF", &group);
    function_check_print_error("Could not get a group for mpi://SELF. rc = %d\n", rc);

    MPI_Comm_create_from_group(group, "myself", MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm_self);
    MPI_Group_free(&group);

    MPI_Allreduce(&one, &sum, 1, MPI_INT, MPI_SUM, comm_self);
    fprintf(stderr, "Self Comm Sum (1): %d\n", sum);

    MPI_Errhandler_free(&errhandler);
    MPI_Info_free(&info);
    MPI_Comm_free(&comm_world);
    MPI_Comm_free(&comm_self);
    MPI_Session_finalize(&session);
    return 0;
}
Commands used to compile and run:
mpicc -o main main.c
mpirun -np 1 -mca osc ucx out/main
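Since the error below points at the PML, it can also help to check which PML Open MPI actually selects at runtime. A quick way to do that (assuming a standard Open MPI 5.x MCA setup; the binary name is illustrative) is:
mpirun -np 1 --mca pml_base_verbose 100 ./main 2>&1 | grep -i pml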
Console Output Uni-Cluster:
$ mpirun -np 1 -mca pml ucx main
PSET 0: mpi://WORLD (len: 12)
PSET 1: mpi://SELF (len: 11)
PSET 2: mpix://SHARED (len: 14)
Could not create Communicator my_world. rc = 52
[nv46:97180] *** Process received signal ***
[nv46:97180] Signal: Aborted (6)
[nv46:97180] Signal code: (-6)
--------------------------------------------------------------------------
Your application has invoked an MPI function that is not supported in
this environment.
MPI function: MPI_Comm_from_group/MPI_Intercomm_from_groups
Reason: The PML being used - ucx - does not support MPI sessions related features
--------------------------------------------------------------------------
[nv46:97180] [ 0] /usr/lib/libc.so.6(+0x3c770)[0x72422de41770]
[nv46:97180] [ 1] /usr/lib/libc.so.6(+0x8d32c)[0x72422de9232c]
[nv46:97180] [ 2] /usr/lib/libc.so.6(gsignal+0x18)[0x72422de416c8]
[nv46:97180] [ 3] /usr/lib/libc.so.6(abort+0xd7)[0x72422de294b8]
[nv46:97180] [ 4] main(+0x12f4)[0x6239e33802f4]
[nv46:97180] [ 5] main(+0x1585)[0x6239e3380585]
[nv46:97180] [ 6] /usr/lib/libc.so.6(+0x25cd0)[0x72422de2acd0]
[nv46:97180] [ 7] /usr/lib/libc.so.6(__libc_start_main+0x8a)[0x72422de2ad8a]
[nv46:97180] [ 8] main(+0x1165)[0x6239e3380165]
[nv46:97180] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 97180 on node nv46 exited on
signal 6 (Aborted).
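The failure only appears when the ucx PML is selected. As a sanity check (assuming the ob1 PML and its TCP/shared-memory BTLs are still built on the cluster, which the configure output below suggests), the same binary can be forced onto a different PML:
mpirun -np 1 --mca pml ob1 ./main
If that runs, the problem is specific to pml/ucx rather than to the sessions code itself.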
Console Output Local:
$ mpirun -np 1 -mca osc ucx main
PSET 0: mpi://WORLD (len: 12)
PSET 1: mpi://SELF (len: 11)
PSET 2: mpix://SHARED (len: 14)
World Comm Sum (1): 1
Self Comm Sum (1): 1
Installation
Small Uni-Cluster
UCX Output
Output of configure-release:
configure: ASAN check: no
configure: Multi-thread: disabled
configure: MPI tests: disabled
configure: VFS support: yes
configure: Devel headers: no
configure: io_demo CUDA support: no
configure: Bindings: < >
configure: UCS modules: < fuse >
configure: UCT modules: < ib rdmacm cma >
configure: CUDA modules: < >
configure: ROCM modules: < >
configure: IB modules: < >
configure: UCM modules: < >
configure: Perf modules: < >
Output after make install:
$UCXFOLDER/myinstall/bin/ucx_info -v
# Library version: 1.17.0
# Library path: ${HOME}/itoyori/ucx/myinstall/lib/libucs.so.0
# API headers version: 1.17.0
# Git branch 'master', revision a48ad8f
# Configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check --prefix=${HOME}/itoyori/ucx/myinstall --without-go
OpenMPI
Output of configure:
Open MPI configuration:
-----------------------
Version: 5.0.3
MPI Standard Version: 3.1
Build MPI C bindings: yes
Build MPI Fortran bindings: mpif.h, use mpi, use mpi_f08
Build MPI Java bindings (experimental): no
Build Open SHMEM support: yes
Debug build: no
Platform file: (none)
Miscellaneous
-----------------------
Atomics: GCC built-in style atomics
Fault Tolerance support: mpi
HTML docs and man pages: installing packaged docs
hwloc: external
libevent: external
Open UCC: no
pmix: external
PRRTE: external
Threading Package: pthreads
Transports
-----------------------
Cisco usNIC: no
Cray uGNI (Gemini/Aries): no
Intel Omnipath (PSM2): no (not found)
Open UCX: yes
OpenFabrics OFI Libfabric: yes (pkg-config: default search paths)
Portals4: no (not found)
Shared memory/copy in+copy out: yes
Shared memory/Linux CMA: yes
Shared memory/Linux KNEM: no
Shared memory/XPMEM: no
TCP: yes
Accelerators
-----------------------
CUDA support: no
ROCm support: no
OMPIO File Systems
-----------------------
DDN Infinite Memory Engine: no
Generic Unix FS: yes
IBM Spectrum Scale/GPFS: no (not found)
Lustre: no (not found)
PVFS2/OrangeFS: no
Local
UCX Output
Output of configure-release:
configure: =========================================================
configure: UCX build configuration:
configure: Build prefix: ${HOME}/ucx/myinstall
configure: Configuration dir: ${prefix}/etc/ucx
configure: Preprocessor flags: -DCPU_FLAGS="" -I${abs_top_srcdir}/src -I${abs_top_builddir} -I${abs_top_builddir}/src
configure: C compiler: gcc -O3 -g -Wall -Werror -funwind-tables -Wno-missing-field-initializers -Wno-unused-parameter -Wno-unused-label -Wno-long-long -Wno-endif-labels -Wno-sign-compare -Wno-multichar -Wno-deprecated-declarations -Winvalid-pch -Wno-pointer-sign -Werror-implicit-function-declaration -Wno-format-zero-length -Wnested-externs -Wshadow -Werror=declaration-after-statement
configure: C++ compiler: g++ -O3 -g -Wall -Werror -funwind-tables -Wno-missing-field-initializers -Wno-unused-parameter -Wno-unused-label -Wno-long-long -Wno-endif-labels -Wno-sign-compare -Wno-multichar -Wno-deprecated-declarations -Winvalid-pch
configure: Multi-thread: disabled
configure: MPI tests: disabled
configure: VFS support: yes
configure: Devel headers: no
configure: io_demo CUDA support: no
configure: Bindings: < >
configure: UCS modules: < fuse >
configure: UCT modules: < cma >
configure: CUDA modules: < >
configure: ROCM modules: < >
configure: IB modules: < >
configure: UCM modules: < >
configure: Perf modules: < >
configure: =========================================================
Output after make install:
$UCXFOLDER/myinstall/bin/ucx_info -v
# Library version: 1.16.0
# Library path: ${HOME}/ucx/myinstall/lib/libucs.so.0
# API headers version: 1.16.0
# Git branch '', revision e4bb802
# Configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check --prefix=${HOME}/ucx/myinstall --without-go
OpenMPI Output
Output of configure:
Open MPI configuration:
-----------------------
Version: 5.0.3
MPI Standard Version: 3.1
Build MPI C bindings: yes
Build MPI Fortran bindings: no
Build MPI Java bindings (experimental): no
Build Open SHMEM support: yes
Debug build: no
Platform file: (none)
Miscellaneous
-----------------------
Atomics: GCC built-in style atomics
Fault Tolerance support: mpi
HTML docs and man pages: installing packaged docs
hwloc: internal
libevent: external
Open UCC: no
pmix: internal
PRRTE: internal
Threading Package: pthreads
Transports
-----------------------
Cisco usNIC: no
Cray uGNI (Gemini/Aries): no
Intel Omnipath (PSM2): no (not found)
Open UCX: yes
OpenFabrics OFI Libfabric: no (not found)
Portals4: no (not found)
Shared memory/copy in+copy out: yes
Shared memory/Linux CMA: yes
Shared memory/Linux KNEM: no
Shared memory/XPMEM: no
TCP: yes
Accelerators
-----------------------
CUDA support: no
ROCm support: no
OMPIO File Systems
-----------------------
DDN Infinite Memory Engine: no
Generic Unix FS: yes
IBM Spectrum Scale/GPFS: no (not found)
Lustre: no (not found)
PVFS2/OrangeFS: no
MPI and UCX Installation
Folder structure:
${HOME}/ucx
${HOME}/openmpi-5.0.3
Install OpenUCX
cd ${HOME}
git clone https://github.com/openucx/ucx.git
cd ucx
git checkout v1.16.0
export UCXFOLDER=${HOME}/ucx
./autogen.sh
./contrib/configure-release --prefix=$UCXFOLDER/myinstall --without-go
Install:
make -j32
make install
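After installation, the transports UCX actually detects on a node can be inspected (assuming the install prefix above; ucx_info -d lists the available devices and transports):
$UCXFOLDER/myinstall/bin/ucx_info -d | grep -i transport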
OpenMPI
cd ${HOME}
wget https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.3.tar.gz
tar xfvz openmpi-5.0.3.tar.gz
export MPIFOLDER=${HOME}/openmpi-5.0.3
cd $MPIFOLDER
./configure --disable-io-romio --with-io-romio-flags=--without-ze --disable-sphinx --prefix="$MPIFOLDER/myinstall" --with-ucx="$UCXFOLDER/myinstall" 2>&1 | tee config.out
Install:
make -j32 all 2>&1 | tee make.out
make install 2>&1 | tee install.out
export OMPI="${MPIFOLDER}/myinstall"
export PATH=$OMPI/bin:$PATH
export LD_LIBRARY_PATH=$OMPI/lib:$LD_LIBRARY_PATH
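To verify that this OpenMPI build actually picked up UCX (paths and variable names as above), ompi_info can be queried for the UCX components:
ompi_info | grep -i ucx
This should list at least the pml ucx and osc ucx components if the build linked against the UCX install.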