Description
Periodically, we have users who accidentally have stale Open MPI components in their installation tree (e.g., they install a new version of Open MPI over an old version, and some components from the old version aren't overwritten -- perhaps because they don't exist in the new version).
#8102 is one example where this may be happening (at the time of this writing, #8102 (comment) is the last comment, so we don't yet know for sure whether this is what is happening).
It may be useful to put some kind of signature in component DSOs that would allow Open MPI to detect and safely ignore stale Open MPI components.
We already check MCA base major and minor version when loading DSO components, but as @bosilca points out, we don't change those versions very often. I.e., the same major/minor MCA base version numbers span multiple Open MPI release series.
We could embed some kind of "signature" in the DSOs that could be validated by the MCA base -- not just the MCA base major/minor, but also something that changes with every version of Open MPI, and potentially even changes with different ./configure options (although that may be a slippery slope...?).
Crudely, you could imagine embedding a SHA256 hash of some combination of:
- the Open MPI version
- the ./configure command line (see below)
- ...?
That would give a repeatable hash value (e.g., if I configure/build the same Open MPI tarball with the same options, I get the same SHA256 hash).
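As a rough illustration of the repeatability claim, here is a minimal Python sketch. It is not how Open MPI does anything today: the function name `component_signature`, the NUL field separator, and the example version strings are all assumptions made for illustration, and a real implementation would presumably live in the build system and the C code of the MCA base.

```python
import hashlib


def component_signature(ompi_version, configure_args):
    """Return a SHA256 hex digest over the version and configure arguments.

    Hypothetical helper: the inputs and the NUL separator are assumptions
    for illustration, not part of Open MPI.
    """
    h = hashlib.sha256()
    # Terminate each field with NUL so that ("a", "bc") and ("ab", "c")
    # do not collapse into the same byte stream.
    for field in [ompi_version, *configure_args]:
        h.update(field.encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


# Example values only; the version strings are placeholders.
sig_a = component_signature("4.1.0", ["--prefix=/blah", "--disable-dlopen"])
sig_b = component_signature("4.1.0", ["--prefix=/blah", "--disable-dlopen"])
sig_c = component_signature("4.0.5", ["--prefix=/blah", "--disable-dlopen"])
```

Here `sig_a` and `sig_b` are identical because the inputs are identical, while `sig_c` differs because the version string changed.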
NOTE: hashing the ./configure command line is not necessarily as straightforward as it sounds, and may not be worth it. For example, ./configure --prefix=/blah --disable-dlopen would hash differently than ./configure --disable-dlopen --prefix=/blah. But even if we sort the ./configure command line options for a deterministic ordering, some ./configure options really have nothing to do with whether a DSO component is stale or incompatible or not. It may be simplest -- perhaps to start? -- to just also compare the Open MPI major (and minor?) versions. A signature may be a bit more than is really needed -- I recorded the idea here on the PR just for completeness.
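To make the two alternatives concrete, here is a small Python sketch of both. Everything named here is hypothetical: `IGNORED_PREFIXES` is an assumed, illustrative list (which options are actually irrelevant to component compatibility is exactly the open question), and `normalized_signature` and `same_release_series` are made-up helper names rather than anything in Open MPI.

```python
import hashlib

# Assumption for illustration only: options starting with these prefixes
# are treated as irrelevant to whether a component is compatible.
IGNORED_PREFIXES = ("--prefix=",)


def normalized_signature(ompi_version, configure_args):
    """Hash the version plus the sorted, filtered configure options."""
    relevant = sorted(
        arg for arg in configure_args if not arg.startswith(IGNORED_PREFIXES)
    )
    payload = "\0".join([ompi_version, *relevant]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def same_release_series(component_version, library_version):
    """Simpler alternative: compare only the (major, minor) pair."""
    return tuple(component_version[:2]) == tuple(library_version[:2])


# Option order and the ignored --prefix value no longer affect the hash.
sig_1 = normalized_signature("4.1.0", ["--prefix=/blah", "--disable-dlopen"])
sig_2 = normalized_signature("4.1.0", ["--disable-dlopen", "--prefix=/other"])

# Version tuples are example values only.
ok = same_release_series((4, 1, 0), (4, 1, 2))
stale = same_release_series((4, 0, 5), (4, 1, 0))
```

The hash variant needs someone to maintain the ignore list, which is the slippery slope mentioned above; the version comparison needs no such list but cannot distinguish two differently configured builds of the same release series.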