This repository was archived by the owner on Nov 17, 2023. It is now read-only.
MXNet 2.0 cu112 docker undefined symbol issue #20145
Closed
Description
The nightly Docker image public.ecr.aws/w6z5f7h2/mxnet/python:nightly_gpu_cu112_py3 fails on import with an undefined symbol error:
>>> import mxnet
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/dist-packages/mxnet/__init__.py", line 23, in <module>
from .context import Context, current_context, cpu, gpu, cpu_pinned
File "/usr/local/lib/python3.7/dist-packages/mxnet/context.py", line 20, in <module>
from .base import _LIB
File "/usr/local/lib/python3.7/dist-packages/mxnet/base.py", line 293, in <module>
_LIB = _load_lib()
File "/usr/local/lib/python3.7/dist-packages/mxnet/base.py", line 284, in _load_lib
lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
File "/usr/lib/python3.7/ctypes/__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /usr/local/lib/python3.7/dist-packages/mxnet/libmxnet.so: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2
This is most likely because the newer NVML shipped with CUDA 11.2 (cu112) introduces a v2 variant of this API.
Checking nvml.h in nvidia/cuda:11.2.0-cudnn8-devel-centos7 confirms this:
/**
 * NVML API versioning support
 */
#define NVML_API_VERSION 11
#define NVML_API_VERSION_STR "11"
/**
* Defining NVML_NO_UNVERSIONED_FUNC_DEFS will disable "auto upgrading" of APIs.
* e.g. the user will have to call nvmlInit_v2 instead of nvmlInit. Enable this
* guard if you need to support older versions of the API
*/
#ifndef NVML_NO_UNVERSIONED_FUNC_DEFS
#define nvmlInit nvmlInit_v2
#define nvmlDeviceGetPciInfo nvmlDeviceGetPciInfo_v3
#define nvmlDeviceGetCount nvmlDeviceGetCount_v2
#define nvmlDeviceGetHandleByIndex nvmlDeviceGetHandleByIndex_v2
#define nvmlDeviceGetHandleByPciBusId nvmlDeviceGetHandleByPciBusId_v2
#define nvmlDeviceGetNvLinkRemotePciInfo nvmlDeviceGetNvLinkRemotePciInfo_v2
#define nvmlDeviceRemoveGpu nvmlDeviceRemoveGpu_v2
#define nvmlDeviceGetGridLicensableFeatures nvmlDeviceGetGridLicensableFeatures_v3
#define nvmlEventSetWait nvmlEventSetWait_v2
#define nvmlDeviceGetAttributes nvmlDeviceGetAttributes_v2
#define nvmlComputeInstanceGetInfo nvmlComputeInstanceGetInfo_v2
#define nvmlDeviceGetComputeRunningProcesses nvmlDeviceGetComputeRunningProcesses_v2
#define nvmlDeviceGetGraphicsRunningProcesses nvmlDeviceGetGraphicsRunningProcesses_v2
#endif // #ifndef NVML_NO_UNVERSIONED_FUNC_DEFS
..........
..........
nvmlReturn_t DECLDIR nvmlDeviceGetComputeRunningProcesses_v2(nvmlDevice_t device, unsigned int *infoCount, nvmlProcessInfo_t *infos);
We can probably work around this issue by defining NVML_NO_UNVERSIONED_FUNC_DEFS before including nvml.h, so that the unversioned names are not redirected to their _v2 variants.
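To see which variant a given environment actually provides, one option is to probe the NVML shared library with ctypes. The following is only a sketch under some assumptions: the helper names `exports_symbol` and `check_nvml` are made up for illustration, and it assumes the NVML library visible at runtime is reachable under its usual soname `libnvidia-ml.so.1`. It does not change how MXNet is built; it only reports whether the versioned and unversioned symbols are exported.

```python
import ctypes


def exports_symbol(lib, name):
    """Return True if the already-loaded library `lib` exposes `name`.

    ctypes resolves a symbol lazily on attribute access and raises
    AttributeError when the dynamic loader cannot find it.
    """
    try:
        getattr(lib, name)
        return True
    except AttributeError:
        return False


def check_nvml(path="libnvidia-ml.so.1"):
    """Report which NVML process-listing symbols `path` exports.

    Returns None if the library cannot be loaded at all, for example
    when no NVIDIA driver is installed. The default path is an
    assumption about where NVML lives in the container.
    """
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    names = (
        "nvmlDeviceGetComputeRunningProcesses",
        "nvmlDeviceGetComputeRunningProcesses_v2",
    )
    return {name: exports_symbol(lib, name) for name in names}


if __name__ == "__main__":
    # Prints None when NVML is absent, otherwise a dict of name -> bool.
    print(check_nvml())
```

If the `_v2` entry comes back `False` while the unversioned one is `True`, that would be consistent with libmxnet.so having been compiled against the cu112 header while the NVML library available at runtime is older.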