Unable to find EGL device for CUDA device #1671

Closed
HanqingWangAI opened this issue Feb 19, 2022 · 5 comments


@HanqingWangAI

HanqingWangAI commented Feb 19, 2022

Habitat-Sim version

v0.2.1

❓ Questions and Help

When I tried to initialize a simulator on a headless server with 8 NVIDIA RTX A6000 GPUs (NVIDIA-SMI 510.47.03, driver version 510.47.03, CUDA version 11.6), I got the following error:

2022-02-19 16:43:38,521 initializing sim Sim-v0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0219 16:43:38.524030 10427 ManagedFileBasedContainer.h:210] <Dataset>::convertFilenameToPassedExt : Filename : default changed to proposed scene_dataset_config.json filename : default.scene_dataset_config.json
I0219 16:43:38.524072 10427 AttributesManagerBase.h:365] <Dataset>::createFromJsonOrDefaultInternal : Proposing JSON name : default.scene_dataset_config.json from original name : default | This file  does not exist.
I0219 16:43:38.524181 10427 AssetAttributesManager.cpp:120] Asset attributes (capsule3DSolid : capsule3DSolid_hemiRings_4_cylRings_1_segments_12_halfLen_0.75_useTexCoords_false_useTangents_false) created and registered.
I0219 16:43:38.524227 10427 AssetAttributesManager.cpp:120] Asset attributes (capsule3DWireframe : capsule3DWireframe_hemiRings_8_cylRings_1_segments_16_halfLen_1) created and registered.
I0219 16:43:38.524271 10427 AssetAttributesManager.cpp:120] Asset attributes (coneSolid : coneSolid_segments_12_halfLen_1.25_rings_1_useTexCoords_false_useTangents_false_capEnd_true) created and registered.
I0219 16:43:38.524298 10427 AssetAttributesManager.cpp:120] Asset attributes (coneWireframe : coneWireframe_segments_32_halfLen_1.25) created and registered.
I0219 16:43:38.524333 10427 AssetAttributesManager.cpp:120] Asset attributes (cubeSolid : cubeSolid) created and registered.
I0219 16:43:38.524348 10427 AssetAttributesManager.cpp:120] Asset attributes (cubeWireframe : cubeWireframe) created and registered.
I0219 16:43:38.524387 10427 AssetAttributesManager.cpp:120] Asset attributes (cylinderSolid : cylinderSolid_rings_1_segments_12_halfLen_1_useTexCoords_false_useTangents_false_capEnds_true) created and registered.
I0219 16:43:38.524420 10427 AssetAttributesManager.cpp:120] Asset attributes (cylinderWireframe : cylinderWireframe_rings_1_segments_32_halfLen_1) created and registered.
I0219 16:43:38.524439 10427 AssetAttributesManager.cpp:120] Asset attributes (icosphereSolid : icosphereSolid_subdivs_1) created and registered.
I0219 16:43:38.524456 10427 AssetAttributesManager.cpp:120] Asset attributes (icosphereWireframe : icosphereWireframe_subdivs_1) created and registered.
I0219 16:43:38.524483 10427 AssetAttributesManager.cpp:120] Asset attributes (uvSphereSolid : uvSphereSolid_rings_8_segments_16_useTexCoords_false_useTangents_false) created and registered.
I0219 16:43:38.524507 10427 AssetAttributesManager.cpp:120] Asset attributes (uvSphereWireframe : uvSphereWireframe_rings_16_segments_32) created and registered.
I0219 16:43:38.524518 10427 AssetAttributesManager.cpp:108] ::constructor : Built default primitive asset templates : 12
I0219 16:43:38.525240 10427 SceneDatasetAttributesManager.cpp:36] File (default) not found, so new default dataset attributes created and registered.
I0219 16:43:38.525246 10427 MetadataMediator.cpp:127] ::createSceneDataset : Dataset default successfully created.
I0219 16:43:38.525261 10427 AttributesManagerBase.h:365] <Physics Manager>::createFromJsonOrDefaultInternal : Proposing JSON name : ./data/default.physics_config.json from original name : ./data/default.physics_config.json | This file  does not exist.
I0219 16:43:38.525285 10427 PhysicsAttributesManager.cpp:26] File (./data/default.physics_config.json) not found, so new default physics manager attributes created and registered.
I0219 16:43:38.525291 10427 MetadataMediator.cpp:212] ::setActiveSceneDatasetName : Previous active dataset  changed to default successfully.
I0219 16:43:38.525296 10427 MetadataMediator.cpp:183] ::setCurrPhysicsAttributesHandle : Old physics manager attributes  changed to ./data/default.physics_config.json successfully.
I0219 16:43:38.525303 10427 MetadataMediator.cpp:68] ::setSimulatorConfiguration : Set new simulator config for scene/stage : data/scene_datasets/mp3d/V2XKFyX4ASd/V2XKFyX4ASd.glb and dataset : default which is currently active dataset.
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=0): EGL_EXT_device_drm
eglQueryDeviceAttribEXT(): eglQueryDeviceStringEXT
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=1): EGL_EXT_device_drm
eglQueryDeviceAttribEXT(): eglQueryDeviceStringEXT
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=2): EGL_EXT_device_drm
eglQueryDeviceAttribEXT(): eglQueryDeviceStringEXT
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=3): EGL_EXT_device_drm
eglQueryDeviceAttribEXT(): eglQueryDeviceStringEXT
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=4): EGL_EXT_device_drm
eglQueryDeviceAttribEXT(): eglQueryDeviceStringEXT
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=5): EGL_EXT_device_drm
eglQueryDeviceAttribEXT(): eglQueryDeviceStringEXT
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=6): EGL_EXT_device_drm
eglQueryDeviceAttribEXT(): eglQueryDeviceStringEXT
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=7): EGL_EXT_device_drm
eglQueryDeviceAttribEXT(): eglQueryDeviceStringEXT
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=8): EGL_MESA_device_software
eglQueryDeviceAttribEXT(): eglQueryDeviceStringEXT
Platform::WindowlessEglApplication::tryCreateContext(): unable to find EGL device for CUDA device 0
WindowlessContext: Unable to create windowless context

I also checked libEGL with ldconfig -N -v | grep libEGL, following #288, and got the following output:

/sbin/ldconfig.real: Can't stat /usr/local/cuda-11/targets/x86_64-linux/lib: No such file or directory
/sbin/ldconfig.real: Path `/usr/local/cuda-11.1/targets/x86_64-linux/lib' given more than once
/sbin/ldconfig.real: Can't stat /usr/local/cuda-11.6/targets/x86_64-linux/lib: No such file or directory
/sbin/ldconfig.real: Can't stat /usr/local/lib/i386-linux-gnu: No such file or directory
/sbin/ldconfig.real: Can't stat /usr/local/lib/i686-linux-gnu: No such file or directory
/sbin/ldconfig.real: Can't stat /lib/i686-linux-gnu: No such file or directory
/sbin/ldconfig.real: Can't stat /usr/lib/i686-linux-gnu: No such file or directory
/sbin/ldconfig.real: Can't stat /usr/local/lib/x86_64-linux-gnu: No such file or directory
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: /lib/i386-linux-gnu/ld-2.27.so is the dynamic linker, ignoring

/sbin/ldconfig.real: Cannot stat /usr/lib/i386-linux-gnu/libnvidia-nvvm.so: No such file or directory
        libEGL_mesa.so.0 -> libEGL_mesa.so.0.0.0
        libEGL_nvidia.so.0 -> libEGL_nvidia.so.510.47.03
        libEGL.so.1 -> libEGL.so.1.0.0
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.27.so is the dynamic linker, ignoring

        libEGL_mesa.so.0 -> libEGL_mesa.so.0.0.0
        libEGL_nvidia.so.0 -> libEGL_nvidia.so.510.47.03
        libEGL.so.1 -> libEGL.so.1.0.0

Running locate libEGL_nvidia.so gives:

/usr/lib/i386-linux-gnu/libEGL_nvidia.so.0
/usr/lib/i386-linux-gnu/libEGL_nvidia.so.510.47.03
/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0
/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.510.47.03

Since I don't have root permission, I tried creating the symlinks in my home directory:

ln -s /usr/lib/x86_64-linux-gnu/libEGL.so.1 ~/lib/libEGL.so
ln -s /usr/lib/x86_64-linux-gnu/libEGL_mesa.so.0 ~/lib/libEGL_mesa.so
ln -s /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 ~/lib/libEGL_nvidia.so

Then I added ~/lib, /usr/lib/i386-linux-gnu, and /usr/lib/x86_64-linux-gnu to LD_LIBRARY_PATH and ran the code again, but the error persists.
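Adding directories to LD_LIBRARY_PATH only changes where the dynamic linker searches for shared libraries. One quick way to confirm whether the NVIDIA EGL vendor library can be resolved from the current environment is a ctypes probe. This is a minimal sketch; `can_load` is a hypothetical helper, not part of Habitat-Sim or libglvnd.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic linker can dlopen `soname` in this process.

    Hypothetical diagnostic helper: it only checks library resolution
    (LD_LIBRARY_PATH as it was when Python started, the ldconfig cache and
    the default paths), not whether libglvnd will pick the library as an
    EGL vendor.
    """
    try:
        ctypes.CDLL(soname)  # raises OSError if the library cannot be loaded
    except OSError:
        return False
    return True


if __name__ == "__main__":
    for name in ("libEGL.so.1", "libEGL_nvidia.so.0", "libEGL_mesa.so.0"):
        print(name, "ok" if can_load(name) else "NOT loadable")
```

Run it from the same shell, with the same LD_LIBRARY_PATH, that launches Habitat-Sim. Even if libEGL_nvidia.so.0 loads, libglvnd only uses it when a vendor JSON file points at it, so a successful probe does not rule out a missing vendor configuration.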

Is there any solution to this problem?

Thanks!

@erikwijmans
Contributor

What does ldd $(python -c "import habitat_sim; print(habitat_sim._ext.habitat_sim_bindings.__file__)") print? Its output can sometimes reveal a linking error.
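When reading ldd output, the lines that indicate a linking error are the ones where ldd reports a dependency as `not found`. Here is a minimal Python sketch that runs ldd and extracts those entries. It assumes the bindings path is obtained the same way as in the command above; `missing_libs` and `run_ldd` are hypothetical helper names.

```python
import subprocess
from typing import List


def missing_libs(ldd_output: str) -> List[str]:
    """Return the sonames that ldd reported as unresolved."""
    missing = []
    for line in ldd_output.splitlines():
        line = line.strip()
        # ldd prints unresolved dependencies as "libfoo.so.1 => not found"
        if line.endswith("not found"):
            missing.append(line.split("=>")[0].strip())
    return missing


def run_ldd(path: str) -> str:
    """Run ldd on a shared object and return its standard output."""
    result = subprocess.run(
        ["ldd", path],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode; also works on Python 3.6
        check=True,
    )
    return result.stdout
```

Inside the Habitat environment this would be called as `missing_libs(run_ldd(habitat_sim._ext.habitat_sim_bindings.__file__))`. An empty list means every link-time dependency resolved. EGL vendor libraries such as libEGL_nvidia.so.0 are loaded at runtime by libglvnd's libEGL.so.1, so they never appear in ldd output.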

There are also magic files in /usr/share/glvnd/egl_vendor.d (the directory may differ slightly depending on system configuration, but that's the usual one) that need to exist. There should be:

> cat /usr/share/glvnd/egl_vendor.d/10_nvidia.json
{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : "libEGL_nvidia.so.0"
    }
}
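libglvnd finds EGL vendor libraries through these JSON files, so a quick check is to list every file in the directory along with the library its ICD.library_path names. Below is a minimal standard-library sketch. The default directory is the one given above, and `list_egl_vendors` and `has_nvidia_vendor` are hypothetical helpers, not glvnd or Habitat-Sim APIs.

```python
import json
from pathlib import Path
from typing import Dict

DEFAULT_DIR = "/usr/share/glvnd/egl_vendor.d"


def list_egl_vendors(vendor_dir: str = DEFAULT_DIR) -> Dict[str, str]:
    """Map each vendor JSON filename to its ICD.library_path entry."""
    vendors = {}
    directory = Path(vendor_dir)
    if not directory.is_dir():
        return vendors
    for json_file in sorted(directory.glob("*.json")):
        try:
            data = json.loads(json_file.read_text())
            vendors[json_file.name] = data["ICD"]["library_path"]
        except (OSError, ValueError, KeyError, TypeError):
            # Unreadable or malformed file: record it rather than fail.
            vendors[json_file.name] = "<unreadable>"
    return vendors


def has_nvidia_vendor(vendors: Dict[str, str]) -> bool:
    """True if any vendor file points at the NVIDIA EGL library."""
    return any("libEGL_nvidia" in path for path in vendors.values())
```

If the directory holds only a Mesa file, `has_nvidia_vendor(list_egl_vendors())` returns False. In that case only the software (Mesa) EGL implementation is registered with libglvnd.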

@HanqingWangAI
Author

Hi @erikwijmans, here is the output:

linux-vdso.so.1 (0x00007fffa8b9d000)
        libz.so.1 => /home/cheng443/miniconda3/envs/habitat/lib/python3.6/site-packages/habitat_sim/_ext/../../../../libz.so.1 (0x00007fa0bff0e000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa0be9fe000)
        libgomp.so.1 => /home/cheng443/miniconda3/envs/habitat/lib/python3.6/site-packages/habitat_sim/_ext/../../../../libgomp.so.1 (0x00007fa0bfec0000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa0be7df000)
        libCorradeTestSuite.so.2 => /home/cheng443/miniconda3/envs/habitat/lib/python3.6/site-packages/habitat_sim/_ext/libCorradeTestSuite.so.2 (0x00007fa0be5c2000)
        libEGL.so.1 => /usr/lib/x86_64-linux-gnu/libEGL.so.1 (0x00007fa0be3ae000)
        libOpenGL.so.0 => /usr/lib/x86_64-linux-gnu/libOpenGL.so.0 (0x00007fa0be180000)
        libCorradePluginManager.so.2 => /home/cheng443/miniconda3/envs/habitat/lib/python3.6/site-packages/habitat_sim/_ext/libCorradePluginManager.so.2 (0x00007fa0bdf6c000)
        libCorradeUtility.so.2 => /home/cheng443/miniconda3/envs/habitat/lib/python3.6/site-packages/habitat_sim/_ext/libCorradeUtility.so.2 (0x00007fa0bdcfa000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa0bdaf6000)
        libstdc++.so.6 => /home/cheng443/miniconda3/envs/habitat/lib/python3.6/site-packages/habitat_sim/_ext/../../../../libstdc++.so.6 (0x00007fa0bd94b000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa0bd5ad000)
        libgcc_s.so.1 => /home/cheng443/miniconda3/envs/habitat/lib/python3.6/site-packages/habitat_sim/_ext/../../../../libgcc_s.so.1 (0x00007fa0bfea7000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa0bd1bc000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fa0bfd01000)
        libGLdispatch.so.0 => /usr/lib/x86_64-linux-gnu/libGLdispatch.so.0 (0x00007fa0bcf06000)

I only found 50_mesa.json under /usr/share/glvnd/egl_vendor.d; its contents are:

{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : "libEGL_mesa.so.0"
    }
}

@erikwijmans
Contributor

You'll need that /usr/share/glvnd/egl_vendor.d/10_nvidia.json file for things to work (and that is unfortunately the only location that file can be in). Please contact your system administrator, as it should have been added as part of a standard NVIDIA driver install.

@HanqingWangAI
Author

Thank you! I will send a request to the admin.

@francescotaioli

For anyone having this issue: I solved it by installing the correct NVIDIA driver package.

Previously I was using the headless version, e.g. nvidia-headless-525, and I didn't have this file: /usr/share/glvnd/egl_vendor.d/10_nvidia.json

After purging it and installing the full driver package, e.g. nvidia-driver-525, the issue was fixed.
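On Debian/Ubuntu systems, one way to tell which of these two package flavours is installed is to list installed packages with dpkg-query. This is a minimal sketch that assumes an apt/dpkg-based distribution and the nvidia-headless-<version> / nvidia-driver-<version> naming quoted above. Actual package names can vary by distribution and driver branch, and `classify_nvidia_packages` is a hypothetical helper.

```python
import re
import subprocess
from typing import Dict, List

# Assumed naming scheme, taken from the package names quoted above.
_PATTERN = re.compile(r"^nvidia-(headless|driver)-(\d+)")


def classify_nvidia_packages(lines: List[str]) -> Dict[str, List[str]]:
    """Split installed NVIDIA packages into 'headless' and 'driver' groups.

    Each line is expected to look like dpkg-query's
    '${db:Status-Abbrev} ${Package}' output, e.g. 'ii  nvidia-driver-525'.
    """
    groups = {"headless": [], "driver": []}  # type: Dict[str, List[str]]
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not fields[0].startswith("ii"):
            continue  # skip blank lines and packages not fully installed
        match = _PATTERN.match(fields[1])
        if match:
            groups[match.group(1)].append(fields[1])
    return groups


def installed_package_lines() -> List[str]:
    """Ask dpkg for every known package together with its status."""
    out = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${db:Status-Abbrev} ${Package}\n"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    return out.splitlines()
```

`classify_nvidia_packages(installed_package_lines())` returning a non-empty "headless" list and an empty "driver" list would match the situation described above, where the headless package was installed without the glvnd vendor file.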
