Cannot access secondary GPU - error: Could not load GPU driver #604

Open
so-rose opened this issue Aug 17, 2014 · 10 comments

Comments

@so-rose

so-rose commented Aug 17, 2014

The error above happens whenever I run anything through optirun, glxgears or otherwise. I work with Blender 3D, which makes not being able to access my GPU extremely frustrating.

When I run optirun -vvv glxgears, I get this (just the important bit):

...
[  860.372651] [DEBUG] Primus LD Path: /usr/lib/x86_64-linux-gnu/primus:/usr/lib/i386-linux-gnu/primus:/usr/lib/primus:/usr/lib32/primus
[  860.372718] [DEBUG]Using auto-detected bridge primus
[  860.375964] [INFO]Response: No - error: Could not load GPU driver

[  860.376023] [ERROR]Cannot access secondary GPU - error: Could not load GPU driver

[  860.376056] [DEBUG]Socket closed.
...

My bumblebee.conf has Driver=nvidia and KernelDriver=nvidia-current (which I determined by running find /lib/modules/$(uname -r) -name 'nvidia*.ko*' as per the troubleshooting instructions in the wiki).
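
The lookup described above can be sketched as a small shell helper. It assumes the modules live under /lib/modules/$(uname -r), as in the find command from the wiki; the function name nvidia_module_names is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn module file paths (one per line on stdin)
# into bare module names, i.e. candidates for KernelDriver= in bumblebee.conf.
nvidia_module_names() {
    # Drop the directory part, then the .ko suffix and any compression
    # extension such as .ko.xz or .ko.gz.
    sed -e 's|.*/||' -e 's|\.ko.*$||'
}

# Typical use (needs an installed driver, so it is left commented out):
# find "/lib/modules/$(uname -r)" -name 'nvidia*.ko*' | nvidia_module_names
```

On the system in this report, that pipeline would be expected to list nvidia-current and nvidia-uvm.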

Here are my system specs:

uname -a output: Linux iceraven 3.14-2-amd64 #1 SMP Debian 3.14.15-2 (2014-08-09) x86_64 GNU/Linux
GPU: Nvidia 780M
Driver: Components of the nvidia-driver metapackage (recently updated to nvidia driver version 340.something) in the jessie repo.

Thank you in advance!

@so-rose changed the title from "Cannot access Secondary GPU; Could not load GPU driver" to "Cannot access secondary GPU - error: Could not load GPU driver" on Aug 17, 2014
@ryanvade

What are LibraryPath and XorgModulePath set to in bumblebee.conf? Can you manually load nvidia-current with modprobe?

@so-rose
Author

so-rose commented Aug 17, 2014

Thanks for the quick reply!

Before I go further, I'll note that the nvidia driver and bumblebee are installed via dkms. Here are the relevant settings from my bumblebee.conf:

LibraryPath=/usr/lib/x86_64-linux-gnu/nvidia:/usr/lib/i386-linux-gnu/nvidia:/usr/lib/nvidia
XorgModulePath=/usr/lib/nvidia,/usr/lib/xorg/modules

I was able to load nvidia-current using modprobe nvidia-current nvidia-uvm, so that the output of lsmod | grep nvidia included nvidia and nvidia-uvm (more on this one later). At that point, optirun glxgears worked.

That, however, is when I remembered an issue I've had for many months, where Blender would be unable to use CUDA, stating that it was unable to load nvidia-uvm, the very same module that was next to nvidia-current in the output of find /lib/modules/$(uname -r) -name 'nvidia*.ko*'. The full path to the referenced directory, on my machine, is /lib/modules/3.14-2-amd64/updates/dkms.

When I finally found a way to brute force nvidia-uvm to load (modprobe was not working), I could not turn off the GPU any more with echo OFF > /proc/acpi/bbswitch. Sure enough, that function was blocked as soon as I executed modprobe nvidia-current nvidia-uvm; only after I ran modprobe in the opposite order and with -r was I able to turn off the GPU with echo OFF > /proc/acpi/bbswitch.
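
The ordering described above can be sketched as follows. This is only an illustration under assumptions: the module names are the ones from this report, the function names and the DRY_RUN switch are invented here, and by default the script only prints the commands. Note that modprobe treats extra words after the first module name as module parameters unless -a is given, so the sketch loads each module with its own call.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints each command instead of running it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Load the driver first, then the UVM module that depends on it.
gpu_modules_load() {
    run modprobe nvidia-current &&
    run modprobe nvidia-uvm
}

# Unload in the opposite order, then ask bbswitch to power the card down.
gpu_modules_unload() {
    run modprobe -r nvidia-uvm &&
    run modprobe -r nvidia-current &&
    run sh -c 'echo OFF > /proc/acpi/bbswitch'
}
```

Running the functions as root with DRY_RUN=0 would execute the commands instead of printing them.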

Bumblebee should have been able to find the modules in /lib/modules/3.14-2-amd64/updates/dkms, so this smells like a bug. Failing that, is there a setting I can alter to make bumblebee look for the modules there?

@Lekensteyn
Member

Do modinfo nvidia-current and modinfo nvidia-uvm list the modules? If not, try running sudo depmod -a (which should already have been done by DKMS).

For issues around unloading nvidia-uvm, see 1ada79f

@so-rose
Author

so-rose commented Aug 21, 2014

Sorry for the late reply.
I tried the commands; they do seem to list the modules correctly:

modinfo nvidia-current

filename:       /lib/modules/3.14-2-amd64/updates/dkms/nvidia-current.ko
alias:          char-major-195-*
version:        340.24
supported:      external
license:        NVIDIA
alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias:          pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        i2c-core
vermagic:       3.14-2-amd64 SMP mod_unload modversions 
parm:           NVreg_Mobile:int
parm:           NVreg_ResmanDebugLevel:int
parm:           NVreg_RmLogonRC:int
parm:           NVreg_ModifyDeviceFiles:int
parm:           NVreg_DeviceFileUID:int
parm:           NVreg_DeviceFileGID:int
parm:           NVreg_DeviceFileMode:int
parm:           NVreg_RemapLimit:int
parm:           NVreg_UpdateMemoryTypes:int
parm:           NVreg_InitializeSystemMemoryAllocations:int
parm:           NVreg_RMEdgeIntrCheck:int
parm:           NVreg_UsePageAttributeTable:int
parm:           NVreg_MapRegistersEarly:int
parm:           NVreg_RegisterForACPIEvents:int
parm:           NVreg_CheckPCIConfigSpace:int
parm:           NVreg_EnablePCIeGen3:int
parm:           NVreg_EnableMSI:int
parm:           NVreg_MemoryPoolSize:int
parm:           NVreg_RegistryDwords:charp
parm:           NVreg_RmMsg:charp
parm:           NVreg_AssignGpus:charp

modinfo nvidia-uvm

filename:       /lib/modules/3.14-2-amd64/updates/dkms/nvidia-uvm.ko
supported:      external
license:        MIT
depends:        nvidia
vermagic:       3.14-2-amd64 SMP mod_unload modversions

Just to be certain, I ran sudo depmod -a anyway. To be clear, I have had no trouble loading or unloading nvidia_uvm or nvidia-current - bumblebee simply does not seem to do it automatically when starting a program under optirun or quitting a program running under optirun.
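
The depends: nvidia line in the nvidia-uvm output is the detail that matters for load order. Below is a small sketch for pulling one field out of modinfo output. The helper name modinfo_field is hypothetical; on a live system, modinfo -F depends nvidia-uvm prints the same field directly.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field from `modinfo` text
# read on stdin. Usage: modinfo nvidia-uvm | modinfo_field depends
modinfo_field() {
    # Match lines whose first word is "<field>:", blank that word out,
    # trim the leading whitespace left behind, and print the rest.
    awk -v key="$1:" '$1 == key { $1 = ""; sub(/^[ \t]+/, ""); print }'
}
```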

@ArchangeGabriel
Member

As per #623, the problem might be that nvidia needs nvidia-uvm but is unable to load it. I need to look into this.

@aurelienpierre

I absolutely needed OpenCL to work with Darktable, so I created a quick fix:

  1. Create a script, called for example opencl-program.sh, like this:
#!/bin/sh
# Load the UVM module first. I run nvidia-352; replace with your own driver version and module name.
gksudo modprobe nvidia_352_uvm &&
darktable

exit
  2. Then create a menu launcher where you call the script with the command:
optirun sh opencl-program.sh

It's not optimal, but it works. Note that you have to load nvidia_***_uvm through the optirun command, in the same console as your program call, to make it work. Make sure the script ends properly and reaches the "exit" command, so that the Bumblebee process is closed after your program; if something went wrong in your script, kill it the hard way. Otherwise, if you load another program with optirun, your system will crash.

As soon as the script is finished or the console is closed, the uvm module is unloaded, so you will have to redo the modprobe command if you wish to load another program with the GPU. You can check the loaded modules with:

optirun lsmod

Note that the latest Nvidia drivers seem to name the uvm module with an underscore (_) instead of a dash (-). You can check your actual module names with the command:

ls /lib/modules/$(uname -r)/updates/dkms/
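
When comparing names from that directory with lsmod output, keep in mind that the kernel lists loaded modules with underscores, and modprobe accepts either spelling. Here is a sketch of a check that normalises the name first. The function name and the optional second argument (a file in /proc/modules format, handy for testing) are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named module is listed as loaded.
# $1 = module name, written with dashes or underscores
# $2 = module list file (defaults to /proc/modules)
module_is_loaded() {
    name=$(printf '%s' "$1" | sed 's/-/_/g')
    list=${2:-/proc/modules}
    # The first column of /proc/modules is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$list"
}
```

For example, module_is_loaded nvidia-uvm && echo loaded would report on the UVM module whichever way its name is spelled.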

@ArchangeGabriel self-assigned this on Dec 27, 2015
@ArchangeGabriel
Member

Well, this means we will also need to take care of loading nvidia_uvm (and possibly nvidia_modeset) if they don’t load on their own.

@so-rose
Author

so-rose commented Dec 28, 2015

I'm just going to post my recent troubles, in case anyone might benefit:

It seems that the nvidia_uvm kernel module has had its name changed to nvidia_current_uvm, at least with regards to the Debian packages nvidia-cuda-toolkit (6.5) and nvidia-driver (>342.something).

The error I was getting from dmesg was something along the lines of Mismatched Drivers; some driver modules (CUDA-based) were at version 353, while others (normal) were at version 343. Seeing as CUDA worked fine without this upgrade, it's plausible that bumblebee started to get mad at me for this mismatch.

My solution was to upgrade my nvidia-driver package to the experimental release using apt pinning, making everything 353. In return, CUDA once again worked flawlessly.

In terms of OpenCL, the solution might, just maybe, be similar on Debian-based distributions (from my limited understanding, OpenCL seems to load the same way as CUDA) - though this is a completely untested claim 😃

Cheers!

Specs for Reference:

Debian 8 jessie
Linux 4.3
Nvidia 780M

@zlondrej

zlondrej commented May 6, 2016

Here are my proposed solutions as promised in #762:
I can think of at least two solutions, neither of which is perfect:

  • Use softdep in modprobe.d:
    • softdep nvidia post: nvidia-uvm nvidia-modeset (and maybe nvidia-drm)
    • Note that the nvidia module name has to be the name of the module file. This is problematic on some distros.
    • So, for example, in my case it would be softdep nvidia_361 post: nvidia-uvm nvidia-modeset
    • The following aliases are set by the driver package:
      • alias nvidia nvidia_361
      • alias nvidia-uvm nvidia_361-uvm
      • alias nvidia-modeset nvidia_361-modeset
  • Use udev rules to load the modules.
Partial content of /lib/udev/rules.d/71-nvidia.rules:
# Load and unload nvidia-modeset module
ACTION=="add" DEVPATH=="/module/nvidia" SUBSYSTEM=="module" RUN+="/sbin/modprobe nvidia-modeset"
ACTION=="remove" DEVPATH=="/module/nvidia" SUBSYSTEM=="module" RUN+="/sbin/modprobe -r nvidia-modeset"

# Load and unload nvidia-uvm module
ACTION=="add" DEVPATH=="/module/nvidia" SUBSYSTEM=="module" RUN+="/sbin/modprobe nvidia-uvm"
ACTION=="remove" DEVPATH=="/module/nvidia" SUBSYSTEM=="module" RUN+="/sbin/modprobe -r nvidia-uvm"

In both cases, there is a dependency on the driver's version (different module names). For this reason it would be better if this were handled by the driver package and not by bumblebee.
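
To make the version dependence of the softdep approach concrete, here is a sketch that builds the softdep line from whatever module file name the distro uses. The function name, the default list of dependent modules and the output file name mentioned below are illustrative assumptions, not something shipped by bumblebee or the driver packages.

```shell
#!/bin/sh
# Hypothetical helper: print a modprobe.d softdep line.
# $1 = real module file name (e.g. nvidia_361)
# remaining args = modules to load afterwards (defaults to the alias
# names used in the list above)
nvidia_softdep_line() {
    base=$1
    shift
    [ "$#" -gt 0 ] || set -- nvidia-uvm nvidia-modeset
    echo "softdep $base post: $*"
}

# Example: nvidia_softdep_line nvidia_361
```

The output could be written, as root, to a file such as /etc/modprobe.d/nvidia-softdep.conf (a name chosen for this example). kmod's modprobe has a --resolve-alias option that may help find the real module name behind the nvidia alias on a given system.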

@ArchangeGabriel
Member

OK, so that’s a packaging issue. I suppose everyone facing this should try to get it accepted into their distro's nvidia package. Changing to Documentation.

6 participants