Description
We're working with the CUDA accelerator component, and I tried to rebase my somewhat outdated branch onto current main. I believe I found an issue with the way the CUDA component is initialized. Since ae98e04 we call `cuInit` in `accelerator_cuda_init` but do not set a context. As a result, every subsequent call to `opal_accelerator_cuda_delayed_init` (until the application makes its first CUDA call) receives a NULL context from `cuCtxGetCurrent` and returns an error (https://github.com/open-mpi/ompi/blob/main/opal/mca/accelerator/cuda/accelerator_cuda_component.c#L146). This prevents all other accelerator-related state in OMPI from being initialized properly. On this particular system, at least smcuda (`mca_btl_smcuda_accelerator_init`) and ob1 (`mca_pml_ob1_accelerator_init`) do not enable accelerator support because they cannot create a stream. The exception is when the application calls into CUDA before calling `MPI_Init`, because a CUDA context then exists. Is this what we want?
Interestingly, before ae98e04 we would not return an error from `opal_accelerator_cuda_delayed_init` (because `cuCtxGetCurrent` returned an error code), so accelerator support worked properly.
I believe the same behavior exists in the 5.x release branch.