Python APIs not working with both MPI and offload #295

@chuckyount

The C++ APIs work with MPI and offload, and the Python APIs work with offload without MPI, but the combination of all three (Python APIs, MPI, and offload) fails. This is likely a bug in the software stack; last tested with IMPI 2021.12 and oneAPI 2024.1.

%make clean; make -j -C src/kernel/ YK_CXXOPT=-O1 offload=1 mpi=1 ranks=2 py-yk-api-test
[0] MPI startup(): Number of NICs: 1
[0] MPI startup(): ===== NIC pinning on sdp7814 =====
[0] MPI startup(): Rank Pin nic
[0] MPI startup(): 0 enp1s0
Error: failure in zeMemGetAllocProperties 78000001
[0#908140:908140@sdp7814] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.12
[0#908140:908140@sdp7814] MPI startup(): ONEAPI_ROOT=/opt/intel/oneapi
[0#908140:908140@sdp7814] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=ssh
[0#908140:908140@sdp7814] MPI startup(): I_MPI_OFFLOAD=2
[0#908140:908140@sdp7814] MPI startup(): I_MPI_DEBUG=+5
[0#908140:908140@sdp7814] MPI startup(): I_MPI_PRINT_VERSION=1
Error: failure in zeMemGetAllocProperties 78000001
Abort(881416975) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Comm_split_type: Unknown error class, error stack:
PMPI_Comm_split_type(468)..................: MPI_Comm_split(MPI_COMM_WORLD, color=1, key=0, new_comm=0x5563a6824b5c) failed
PMPI_Comm_split_type(448)..................:
MPIR_Comm_split_type_impl(90)..............:
MPIDI_Comm_split_type(114).................:
MPIR_Comm_split_type_node_topo(262)........:
compare_info_hint(329).....................:
MPIDI_Allreduce_intra_composition_beta(788):
MPIDI_NM_mpi_allreduce(147)................:
MPIR_Allreduce_intra_auto(60)..............:
MPIR_Allreduce_intra_recursive_doubling(56):
MPIR_Localcopy(56).........................:
MPIDI_GPU_Localcopy(1135)..................:
MPIDI_GPU_ILocalcopy(1040).................: Error returned from GPU API
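
For triage, the error stack above reads outermost call first, so the last frame is where the failure was reported. The following is a minimal sketch of pulling that frame out of a saved log. The helper names (`parse_error_stack`, `innermost_frame`) and the regular expression are illustrative assumptions based only on the format seen in this log; they are not part of YASK or Intel MPI.

```python
import re

# Excerpt of the error stack quoted in this issue (copied verbatim).
LOG = """\
PMPI_Comm_split_type(468)..................: MPI_Comm_split(MPI_COMM_WORLD, color=1, key=0, new_comm=0x5563a6824b5c) failed
MPIR_Allreduce_intra_recursive_doubling(56):
MPIR_Localcopy(56).........................:
MPIDI_GPU_Localcopy(1135)..................:
MPIDI_GPU_ILocalcopy(1040).................: Error returned from GPU API
"""

# Assumed frame layout, inferred from the log above:
# function name, a number in parentheses, optional dot padding,
# a colon, then an optional message.
FRAME_RE = re.compile(r"^(?P<func>\w+)\((?P<num>\d+)\)\.*:\s*(?P<msg>.*)$")


def parse_error_stack(text):
    """Return a list of (function, number, message) tuples, outermost first."""
    frames = []
    for raw in text.splitlines():
        match = FRAME_RE.match(raw.strip())
        if match:
            frames.append(
                (match.group("func"), int(match.group("num")), match.group("msg"))
            )
    return frames


def innermost_frame(text):
    """Return the last (deepest) frame in the stack, or None if none parsed."""
    frames = parse_error_stack(text)
    return frames[-1] if frames else None


if __name__ == "__main__":
    print(innermost_frame(LOG))
```

Applied to the log in this issue, the deepest frame is `MPIDI_GPU_ILocalcopy`, which is consistent with the failure occurring in the GPU-aware copy path of the allreduce that `MPI_Comm_split_type` performs internally, rather than in the Python bindings themselves.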
