Thank you for taking the time to submit an issue!
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
5.0.7
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Building from source tarball (https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.7.tar.bz2)
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
n/a
Please describe the system on which you are running
- Operating system/version: RHEL 8.10
- Computer hardware: Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake)
- Network type: Mellanox Infiniband (ConnectX-6)
Details of the problem
- GCC: 14.2.0
- hwloc: 2.11.2
- libevent: 2.1.12
- libfabric: 2.0.0
- PMIx: 5.0.6
- UCX: 1.18.0
- UCC: 1.3.0
- PRRTE: 3.0.8
./configure --prefix=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/OpenMPI/5.0.7-GCC-14.2.0 \
  --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \
  --with-cuda=/dev/shm/branfosj/build-up-EL8/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7//opal/mca/cuda \
  --with-show-load-errors=no --enable-mpirun-prefix-by-default --enable-shared \
  --with-hwloc=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/hwloc/2.11.2-GCCcore-14.2.0 \
  --with-libevent=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/libevent/2.1.12-GCCcore-14.2.0 \
  --with-ofi=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/libfabric/2.0.0-GCCcore-14.2.0 \
  --with-pmix=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/PMIx/5.0.6-GCCcore-14.2.0 \
  --with-ucx=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/UCX/1.18.0-GCCcore-14.2.0 \
  --with-ucc=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/UCC/1.3.0-GCCcore-14.2.0 \
  --with-prrte=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/PRRTE/3.0.8-GCCcore-14.2.0
Then the build (make -j 8) fails with:
Making all in mca/sshmem
make[2]: Entering directory '/dev/shm/branfosj/build-up-EL8/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7/oshmem/mca/sshmem'
CC base/sshmem_base_close.lo
CC base/sshmem_base_select.lo
CC base/sshmem_base_open.lo
CC base/sshmem_base_wrappers.lo
base/sshmem_base_open.c:34:39: error: initialization of ‘void *’ from ‘long unsigned int’ makes pointer from integer without a cast [-Wint-conversion]
34 | void *mca_sshmem_base_start_address = UINTPTR_MAX;
| ^~~~~~~~~~~
make[2]: *** [Makefile:1513: base/sshmem_base_open.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/dev/shm/branfosj/build-up-EL8/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7/oshmem/mca/sshmem'
make[1]: *** [Makefile:1924: all-recursive] Error 1
make[1]: Leaving directory '/dev/shm/branfosj/build-up-EL8/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7/oshmem'
make: *** [Makefile:1539: all-recursive] Error 1
If I revert the change made to oshmem/mca/sshmem/base/sshmem_base_open.c in #12889, then I do not see the failure. That is, with a patch of:
--- openmpi-5.0.7/oshmem/mca/sshmem/base/sshmem_base_open.c 2025-02-14 16:51:30.988684227 +0000
+++ openmpi-5.0.6/oshmem/mca/sshmem/base/sshmem_base_open.c 2024-11-15 14:18:09.472756350 +0000
@@ -31,7 +31,17 @@
* globals
*/
-void *mca_sshmem_base_start_address = UINTPTR_MAX;
+/**
+ * if 32 bit we set sshmem_base_start_address to 0
+ * to let OS allocate segment automatically
+ */
+#if UINTPTR_MAX == 0xFFFFFFFF
+void *mca_sshmem_base_start_address = (void*)0;
+#elif defined(__aarch64__)
+void* mca_sshmem_base_start_address = (void*)0xAB0000000000;
+#else
+void* mca_sshmem_base_start_address = (void*)0xFF000000;
+#endif
char * mca_sshmem_base_backing_file_dir = NULL;
Should this be the following, or something else?
void *mca_sshmem_base_start_address = (void*)UINTPTR_MAX;
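For reference, here is a minimal standalone sketch of the pattern in question, outside of the Open MPI tree. The names example_start_address and example_is_sentinel are hypothetical and used only for illustration. GCC 14 treats -Wint-conversion as an error by default, so the uncast form is rejected, while the explicit cast below should compile. I am not claiming this is the right value for the start address, only that the cast avoids the diagnostic.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for mca_sshmem_base_start_address.
 * Without the (void *) cast, GCC 14 rejects this initializer with
 * "makes pointer from integer without a cast [-Wint-conversion]".
 */
void *example_start_address = (void *)UINTPTR_MAX;

/*
 * Hypothetical helper: returns 1 if p still holds the UINTPTR_MAX
 * sentinel, 0 otherwise. The pointer is converted back to uintptr_t
 * explicitly so the comparison is done on integers.
 */
int example_is_sentinel(const void *p)
{
    return (uintptr_t)p == UINTPTR_MAX;
}
```

Compiling this file on its own with gcc -c is enough to check whether the initializer is accepted.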