@matteosecli
Contributor

Fixes errors when dlopen-ing MPI-dependent libraries, e.g.:

ERROR: InitError: could not load library "libnvpl_blacs_ilp64_openmpi3.so"
libnvpl_blacs_ilp64_openmpi3.so: undefined symbol: ompi_mpi_comm_world

A user-facing package (so far, there are none) should load the correct MPI library for the desired implementation (MPICH or OpenMPI{3,4,5}) before dlopen-ing NVPL's BLACS or SCALAPACK.

@matteosecli matteosecli marked this pull request as ready for review November 7, 2025 09:06
@ViralBShah
Member

cc @eschnett

@eschnett
Contributor

I do not understand the setup, or the problem that this would solve.

MPI libraries, like any libraries, are declared to be dependencies of a package. This will then guarantee that they are loaded before the package.

It seems that this package (NVPL) is special, in the sense that it doesn't build any libraries. Instead, it downloads and re-packages pre-compiled binaries.

To solve this problem properly I would make NVPL depend on MPI. I can give this a try.

If so, all NVPL libraries will need MPI to be used. I assume that this would be an inconvenience for the libraries that do not depend on MPI? If so, I suggest splitting this package into two: The current NVPL which does not provide any MPI-related libraries (no blacs), and a second library (called e.g. NVPL_blacs) which does depend on MPI. That second library would come in two flavours, MPICH and OpenMPI.

As a side note, we do not support OpenMPI 3. OpenMPI 4 and OpenMPI 5 are compatible so we can choose either.

@matteosecli
Contributor Author

Hi @eschnett, thanks for your inputs.

NVPL is a collection of libraries for NVIDIA's HPC platform, much like MKL for Intel. It comprises BLAS/LAPACK, BLACS/SCALAPACK, FFTW, RAND, SPARSE, and TENSOR.

You're right that adding MPI as a dependency just for BLACS/SCALAPACK would be an inconvenience for all the other libraries, which do not have such a dependency.

For example, I've put together a small wrapper that loads NVPL BLAS/LAPACK on top of OpenBLAS, and it has nothing to do with MPI: https://github.com/matteosecli/NVPL.jl

Currently, however, using NVPL_jll instead of a local installation causes errors, because all the libraries are opened at precompile/load time even if they are never used; the ones that depend on MPI then fail.

Adding dont_dlopen=true prevents the libraries from being opened at precompile/load time, while still leaving the possibility of explicitly opening them at a later stage once the correct MPI implementation is selected. There are some JLLs here that don't open a single library, e.g. OpenFOAM, even though they handle MPI.
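To illustrate the deferred-loading idea outside of Julia, here is a minimal Python sketch. It is not how JLL packages implement dont_dlopen; it only shows the pattern of recording a library at load time and opening it later, after its MPI dependency. The `LazyLibrary` class, the `record` helper, and the MPI library name `libmpi.so` are hypothetical; the dry run uses a recording opener so it runs without NVPL or MPI installed.

```python
import ctypes


class LazyLibrary:
    """Remember a library name, but only dlopen it when load() is called."""

    def __init__(self, name, deps=(), opener=None):
        self.name = name
        self.deps = tuple(deps)  # LazyLibrary objects that must be loaded first
        # Default opener: a real dlopen with RTLD_GLOBAL, so that symbols such as
        # ompi_mpi_comm_world are visible to libraries opened afterwards.
        self._opener = opener or (lambda n: ctypes.CDLL(n, mode=ctypes.RTLD_GLOBAL))
        self._handle = None
        self._loaded = False

    def load(self):
        if not self._loaded:
            for dep in self.deps:
                dep.load()  # dependencies are opened before the library itself
            self._handle = self._opener(self.name)
            self._loaded = True
        return self._handle


# Dry run with a recording opener instead of a real dlopen.
opened = []


def record(name):
    opened.append(name)
    return name


mpi = LazyLibrary("libmpi.so", opener=record)  # hypothetical MPI library name
blacs = LazyLibrary("libnvpl_blacs_ilp64_openmpi3.so", deps=[mpi], opener=record)

assert opened == []  # nothing is opened when the libraries are merely declared
blacs.load()         # explicit request: MPI is opened first, then BLACS
print(opened)
```

With the default opener, the same call order would correspond to opening the MPI library globally before the BLACS library that references its symbols.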

As for splitting the package, I'd prefer to avoid it. We had a discussion about this over at #11233, and concluded it would be a maintenance nightmare as these libraries are all intertwined.

At the same time, I'm also bothered by the fact that MPI is needed for some of the libraries, but one would have to know in advance in order to properly load it.

Would it be reasonable to explicitly handle MPI as you suggest, while still not dlopen-ing anything MPI-related unless explicitly requested by the user?

Please let me know if I can help in any way with this, even just for testing (I've got access to an NVHPC platform).

@eschnett
Contributor

Making it easy for the package maintainers is one thing. We also need to ensure that these packages are usable, and they should be usable in the "standard" way. All MPI-using packages can just be loaded (with using), and they automatically load all their dependencies, including MPI.

What you suggest is different: none of the dependencies would be loaded, which leads to a segfault at run time when the package is called. It's even worse when the wrong version of a dependency is accidentally loaded. This is brittle, and can cause headaches for many people, including people who just want to try these packages without much knowledge of Unix dynamic library management.

The cost for the maintainers is not very large. It would be maintaining two packages instead of one. And when doing so, everything works "as expected" for everyone.

As a first step I would remove the MPI-using libraries from this package. This would solve the immediate problem.

@imciner2
Member

I think we can revisit the decision in #11233 (PS I just noticed how nice of a PR number this is 😁) about splitting the package. Originally, the main driver for splitting was just library size, and so we decided that didn't outweigh the complexity.

Now, though, we know there are issues with certain libraries needing additional (possibly heavy/complicated) dependencies and loading for them. IMO, this is a case where it makes sense to separate out the parts needing the additional dependencies into a separate package so that users don't need to worry about them unless they actually want them, and the wrapper packages can then properly initialize the libraries.

So, my feeling is we should split this into 2 packages:

  1. NVPL_jll - The main package with everything in it other than the MPI-based libraries
  2. NVPL_MPI_jll (or a different name - the one proposed here is similar to other packages where we add the additional dependency at the end of the name, like CUDA, MKL, GPU, etc.) - This would be the BLACS and SCALAPACK libraries, and have a dependency on the main NVPL_jll package to ensure everything is correct.
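
As a rough illustration of what the split means for load order, here is a small Python sketch (not Julia, and not the package manager's actual resolver). It treats declared dependencies as a graph and derives a dependencies-first order. The `load_order` helper is hypothetical, and `OpenMPI_jll` is only assumed here as the MPI provider for one flavour.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each package maps to the set of packages it declares as dependencies.
deps = {
    "NVPL_jll": set(),
    "OpenMPI_jll": set(),  # assumed MPI provider for one flavour
    "NVPL_MPI_jll": {"NVPL_jll", "OpenMPI_jll"},
}


def load_order(graph, target):
    """Return target and its transitive dependencies, dependencies first."""
    needed, stack = set(), [target]
    while stack:
        pkg = stack.pop()
        if pkg not in needed:
            needed.add(pkg)
            stack.extend(graph[pkg])
    # Restrict the graph to what the target needs, then sort it topologically.
    sub = {pkg: graph[pkg] for pkg in needed}
    return list(TopologicalSorter(sub).static_order())


print(load_order(deps, "NVPL_jll"))      # MPI is never pulled in
print(load_order(deps, "NVPL_MPI_jll"))  # NVPL_MPI_jll comes last
```

Under these assumptions, users of the main package never touch MPI, while users of the MPI package get both dependencies loaded before BLACS and SCALAPACK.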
