Description
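The C++ APIs work with MPI and offload, and the Python APIs work with offload but without MPI. The combination of all three (Python APIs, MPI, and offload) does not work: the run aborts in PMPI_Comm_split_type after zeMemGetAllocProperties fails, as the log below shows. This is likely a bug in the software stack; last tested with IMPI 2021.12 and oneAPI 2024.1.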
% make clean; make -j -C src/kernel/ YK_CXXOPT=-O1 offload=1 mpi=1 ranks=2 py-yk-api-test
[0] MPI startup(): Number of NICs: 1
[0] MPI startup(): ===== NIC pinning on sdp7814 =====
[0] MPI startup(): Rank Pin nic
[0] MPI startup(): 0 enp1s0
Error: failure in zeMemGetAllocProperties 78000001
[0#908140:908140@sdp7814] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.12
[0#908140:908140@sdp7814] MPI startup(): ONEAPI_ROOT=/opt/intel/oneapi
[0#908140:908140@sdp7814] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=ssh
[0#908140:908140@sdp7814] MPI startup(): I_MPI_OFFLOAD=2
[0#908140:908140@sdp7814] MPI startup(): I_MPI_DEBUG=+5
[0#908140:908140@sdp7814] MPI startup(): I_MPI_PRINT_VERSION=1
Error: failure in zeMemGetAllocProperties 78000001
Abort(881416975) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Comm_split_type: Unknown error class, error stack:
PMPI_Comm_split_type(468)..................: MPI_Comm_split(MPI_COMM_WORLD, color=1, key=0, new_comm=0x5563a6824b5c) failed
PMPI_Comm_split_type(448)..................:
MPIR_Comm_split_type_impl(90)..............:
MPIDI_Comm_split_type(114).................:
MPIR_Comm_split_type_node_topo(262)........:
compare_info_hint(329).....................:
MPIDI_Allreduce_intra_composition_beta(788):
MPIDI_NM_mpi_allreduce(147)................:
MPIR_Allreduce_intra_auto(60)..............:
MPIR_Allreduce_intra_recursive_doubling(56):
MPIR_Localcopy(56).........................:
MPIDI_GPU_Localcopy(1135)..................:
MPIDI_GPU_ILocalcopy(1040).................: Error returned from GPU API