[Open MPI main branch] mpirun/mpicc error while loading shared libraries #9907

Open
@shijin-aws

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

main branch, commit

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

git clone https://github.com/open-mpi/ompi.git
cd ompi
git submodule update --recursive --init
./autogen.pl
./configure --prefix=/home/ec2-user/ompi/install --disable-man-pages
make -j install

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

 680331773926b62c245626dbc9cf78aed2d641d3 3rd-party/openpmix (v1.1.3-3327-g68033177)
 78825642e8594ebffda0942fa04e375077819732 3rd-party/prrte (psrvr-v2.0.0rc1-4147-g78825642e8)

Please describe the system on which you are running

  • Operating system/version: amazon linux 2
  • Computer hardware: aws ec2 instance c5n.18xlarge
  • Network type:

Details of the problem

We find that building the Open MPI main branch on a machine that has the CUDA toolkit installed in /usr/local/cuda causes mpirun and mpicc to fail with an "error while loading shared libraries" message:

[ec2-user@ip-172-31-49-61 ompi]$ /home/ec2-user/ompi/install/bin/mpirun --version
/home/ec2-user/ompi/install/bin/mpirun: error while loading shared libraries: libOpenCL.so.1: cannot open shared object file: No such file or directory

ldd shows that mpirun is linked against CUDA libraries such as libOpenCL.so.1, which are not found in the default /lib64/ path:

[ec2-user@ip-172-31-49-61 ~]$ ldd /home/ec2-user/ompi/install/bin/mpirun
	linux-vdso.so.1 (0x00007ffe0e1bf000)
	libopen-pal.so.0 => /home/ec2-user/ompi/install/lib/libopen-pal.so.0 (0x00007f380d0cf000)
	libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00007f380ceaf000)
	libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00007f380cc43000)
	libpmix.so.0 => /home/ec2-user/ompi/install/lib/libpmix.so.0 (0x00007f380c80a000)
	libevent_core-2.1.so.7 => /home/ec2-user/ompi/install/lib/libevent_core-2.1.so.7 (0x00007f380c5d6000)
	libevent_pthreads-2.1.so.7 => /home/ec2-user/ompi/install/lib/libevent_pthreads-2.1.so.7 (0x00007f380c3d3000)
	libhwloc.so.15 => /home/ec2-user/ompi/install/lib/libhwloc.so.15 (0x00007f380c17c000)
	libudev.so.1 => /lib64/libudev.so.1 (0x00007f380bf68000)
	libOpenCL.so.1 => not found
	libcudart.so.11.0 => not found
	libnvidia-ml.so.1 => not found
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f380bd64000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f380bb46000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f380b93e000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f380b5fe000)
	libutil.so.1 => /lib64/libutil.so.1 (0x00007f380b3fb000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f380b050000)
	libOpenCL.so.1 => not found
	libcudart.so.11.0 => not found
	libnvidia-ml.so.1 => not found
	libOpenCL.so.1 => not found
	libcudart.so.11.0 => not found
	libnvidia-ml.so.1 => not found
	libOpenCL.so.1 => not found
	libcudart.so.11.0 => not found
	libnvidia-ml.so.1 => not found
	libcap.so.2 => /lib64/libcap.so.2 (0x00007f380ae4b000)
	libdw.so.1 => /lib64/libdw.so.1 (0x00007f380abfa000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f380a9e4000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f380d3b2000)
	libattr.so.1 => /lib64/libattr.so.1 (0x00007f380a7df000)
	libelf.so.1 => /lib64/libelf.so.1 (0x00007f380a5c7000)
	libz.so.1 => /lib64/libz.so.1 (0x00007f380a3b2000)
	liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f380a18c000)
	libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f3809f7c000)
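Because the same unresolved entries repeat several times in the ldd output, it can help to reduce it to a unique list. The following is a minimal sketch; `list_missing_libs` is a hypothetical helper name, not something shipped with Open MPI, and it only assumes the usual `name => not found` format that ldd prints above.

```shell
# list_missing_libs: read ldd-style output on stdin and print each
# unresolved library name once (hypothetical helper, for illustration).
list_missing_libs() {
    # Keep only "=> not found" lines, print the library name (first field),
    # then de-duplicate. LC_ALL=C makes the sort order deterministic.
    awk '/=> not found/ { print $1 }' | LC_ALL=C sort -u
}

# Usage with the binary from this report:
#   ldd /home/ec2-user/ompi/install/bin/mpirun | list_missing_libs
```

On the output shown above this should reduce to the three libraries libOpenCL.so.1, libcudart.so.11.0, and libnvidia-ml.so.1.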

These cuda libraries are actually installed in /usr/local/cuda/lib64:

[ec2-user@ip-172-31-49-61 ~]$ ls /usr/local/cuda/lib64
libOpenCL.so                  libcusolver.so              libnppim.so.11
libOpenCL.so.1                libcusolver.so.11           libnppim.so.11.1.1.269
libOpenCL.so.1.0              libcusolver.so.11.0.0.74    libnppim_static.a
libOpenCL.so.1.0.0            libcusolverMg.so            libnppist.so
....
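As a temporary workaround only (not a fix for the unintended dependency), the dynamic loader can be pointed at that directory through LD_LIBRARY_PATH. This is a sketch under assumptions: `prepend_ld_path` is a hypothetical helper, and since the listing above is truncated it is not confirmed that all three missing libraries (libnvidia-ml.so.1 in particular) live in /usr/local/cuda/lib64.

```shell
# prepend_ld_path DIR: put DIR at the front of LD_LIBRARY_PATH unless it is
# already listed (hypothetical helper, for illustration).
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already present, leave the variable unchanged
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Directory taken from the report; adjust for other CUDA install locations.
prepend_ld_path /usr/local/cuda/lib64

# Afterwards, re-run the failing command in the same shell:
#   /home/ec2-user/ompi/install/bin/mpirun --version
```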

We traced the issue to commit 60e82dd, which bumps hwloc to v2.7.

Before this bump, no CUDA dependency was introduced into the Open MPI executables unless Open MPI was built with --with-cuda.
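ldd reports transitive dependencies, so it does not say which library pulls in CUDA directly. One way to check is to look at the direct NEEDED entries of an individual library with `readelf -d`. The sketch below assumes the dependency comes in through the bundled libhwloc.so.15 seen in the ldd output, which this report suggests but does not confirm; `needed_libs` is a hypothetical helper name.

```shell
# needed_libs: read `readelf -d` output on stdin and print the names of the
# directly required shared libraries (hypothetical helper, for illustration).
needed_libs() {
    # NEEDED lines look like:
    #   0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]
    # Print only the text between the square brackets.
    sed -n 's/.*(NEEDED).*\[\([^]]*\)\].*/\1/p'
}

# Usage with the library path from the ldd output above:
#   readelf -d /home/ec2-user/ompi/install/lib/libhwloc.so.15 | needed_libs
```

If libOpenCL.so.1, libcudart.so.11.0, or libnvidia-ml.so.1 show up in that list, the dependency is coming from that library rather than from mpirun itself.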

After an offline discussion with @bwbarrett, I understand this is not intended behavior, so I am reporting the issue here.
