Description
I need to be able to dlopen, use (hsa_init, ...., hsa_shut_down), dlclose, and then dlopen again the ROCR runtime or a library that statically links in the ROCR runtime. This worked previously but regressed in 9b13bcd: only one initialization of ROCR is now allowed per process launch.
The changes introduced in Runtime::Acquire check whether HSA has been cleaned up via a dlclose (global destructors run) and fail with OUT_OF_RESOURCES if it ever has been. This prevents the runtime from being dlopen'ed (directly, or indirectly when embedded) more than once in a process, because it relies on a global initializer to set a non-zero static value and that initializer may never be run again. Since HSA has hsa_init and hsa_shut_down for explicit management, the additional single-load logic in RuntimeCleanup makes hsa_init/hsa_shut_down not behave as documented unless HSA is statically linked into a top-level executable or only ever dynamically loaded once. This is unfortunate: HSA behavior now differs depending on whether it is statically or dynamically linked, and such reinitialization-hostile behavior prevents any user of ROCR from ever being reinitialized. Note that this extends beyond using ROCR as a dynamic library: any other dynamic library that links ROCR in statically now also cannot be reinitialized. Python modules and other systems that use plugins require reinitialization.
The cause is that upon first dlopen (or process init) the global loaded_ flag is set to true as part of a global initializer (static bool loaded_ = true;).
If a process then requests an unload of the ROCR library (or any library with ROCR statically linked into it) with dlclose, the global deinitializer is called and the RuntimeCleanup cleanup_at_load_ destructor runs and resets the loaded_ flag to false.
Unfortunately the next dlopen (of either ROCR or any library containing it) is not guaranteed to run the initializers again, and in most modern cases it does not. The loaded_ flag remains false since the static bool loaded_ = true; initializer is not re-run, and thus the next call to hsa_init fails with OUT_OF_RESOURCES because the flag is false.
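The sequence above can be modeled in a few lines. This is only a sketch of the failure mode, not the ROCR source: LibraryImage, Init, ShutDown, OnUnload and OnReloadWithoutInitializers are hypothetical stand-ins for the library's static storage, hsa_init, hsa_shut_down, the global destructor run at dlclose, and a dlopen that does not re-run initializers.

```cpp
#include <cassert>

namespace flag_demo {

// Hypothetical status codes standing in for the HSA ones.
enum Status { kSuccess = 0, kOutOfResources = 1, kNotInitialized = 2 };

// Models the library's static storage. The "= true" is only applied when
// the static initializers actually run.
struct LibraryImage {
  bool loaded = true;  // mirrors `static bool loaded_ = true;`
  int ref_count = 0;
};

// Models the check described for Runtime::Acquire: refuse once the flag
// has been cleared by an earlier unload.
Status Init(LibraryImage& lib) {
  if (!lib.loaded) return kOutOfResources;
  ++lib.ref_count;
  return kSuccess;
}

Status ShutDown(LibraryImage& lib) {
  if (lib.ref_count == 0) return kNotInitialized;
  --lib.ref_count;
  return kSuccess;
}

// Models the global destructor that runs on dlclose.
void OnUnload(LibraryImage& lib) { lib.loaded = false; }

// Models a dlopen that does not re-run initializers: state is untouched.
void OnReloadWithoutInitializers(LibraryImage&) {}

// Walks the dlopen, init, shut down, dlclose, dlopen, init sequence and
// returns the status of the second Init call.
Status SecondInitAfterReload() {
  LibraryImage lib;  // first load: the initializer runs, loaded == true
  Status first = Init(lib);
  assert(first == kSuccess);
  (void)first;
  ShutDown(lib);
  OnUnload(lib);
  OnReloadWithoutInitializers(lib);
  return Init(lib);  // the flag is still false at this point
}

}  // namespace flag_demo
```

In this model SecondInitAfterReload() returns kOutOfResources even though the ref count is back at 0, which is the behavior reported here.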
hsa_init and hsa_shut_down are already ref counted - as are libraries managed with dlopen and dlclose - and the current implementation is causing those to conflict. The canonical approach to solving this is to rely only on zero initialization: initializers should always be zero, initialization routines should set their managed values to non-zero values upon explicit use (hsa_init from ref count 0), and they should set their values back to zero upon explicit deinitialization (hsa_shut_down to ref count 0). This allows things like the global ref count to start at 0, transition 0->1 on the first call to hsa_init, transition 1->0 on the last call to hsa_shut_down, and then regardless of whether dlclose->dlopen runs the initializers again still see a ref count of 0.
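A minimal sketch of the zero-initialization approach, under stated assumptions: RuntimeState, g_state, Init, ShutDown and CanReinitialize are hypothetical names for illustration, not ROCR code, and the locking a real runtime would need is omitted. All global state starts at zero and returns to zero on the last shutdown, so it does not matter whether a later dlopen re-runs initializers.

```cpp
#include <cassert>

namespace zero_init_demo {

// Hypothetical status codes standing in for the HSA ones.
enum Status { kSuccess = 0, kNotInitialized = 1 };

// All state is plain data with no non-zero initializer, so it lives in
// zero-initialized static storage and needs no dynamic initialization.
struct RuntimeState {
  int ref_count;
  bool resources_ready;
};
RuntimeState g_state;

// 0->1 transition performs the real setup; later calls only bump the count.
// Not thread-safe: a real implementation would guard this with a lock.
Status Init() {
  if (g_state.ref_count++ == 0) g_state.resources_ready = true;
  return kSuccess;
}

// 1->0 transition tears everything down and returns the state to all zeros.
Status ShutDown() {
  if (g_state.ref_count == 0) return kNotInitialized;
  if (--g_state.ref_count == 0) g_state.resources_ready = false;
  return kSuccess;
}

// Runs several init/shutdown cycles and checks that the state is back to
// zero after each one, which is what makes reinitialization safe.
bool CanReinitialize(int cycles) {
  for (int i = 0; i < cycles; ++i) {
    if (Init() != kSuccess || !g_state.resources_ready) return false;
    if (ShutDown() != kSuccess) return false;
    if (g_state.ref_count != 0 || g_state.resources_ready) return false;
  }
  return true;
}

}  // namespace zero_init_demo
```

Because the state after the last ShutDown is identical to the freshly loaded state, a dlclose followed by dlopen sees a ref count of 0 whether or not the loader re-runs initializers.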
A dlopen, hsa_init, hsa_shut_down, and dlclose should leave HSA in a state where it can be reinitialized. Code that cannot be reinitialized infects any software that loads HSA, since that software then cannot be reinitialized either, and there's unfortunately no way for any software above HSA to fix this issue.
I suspect the RuntimeCleanup was added for a reason, but instead of relying on a non-zero globally initialized value it'd be good to fix any of the globals that were previously assuming non-zero initialization. The library should not be relying on one-shot global initializers and instead handle initialization and cleanup explicitly with the existing hsa_init and hsa_shut_down ref counting behavior. Prior to this change things seemed to work just fine and as expected (ROCR or a library linking it could be reinitialized) - I don't doubt there may have been bugs (C++ static initialization is... not fun :) but those bugs should be fixed instead of preventing reinitialization.
/cc @cfreeamd