Using discrete and intergated GPUs in a laptop at the same time

17 Jun 2018

It’s been a while since my last post (a lot of stuff going on at work), but as I have just upgraded to Ubuntu 18.04 and had to reinstall some stuff - I decided to share a trick I find useful.

I have a laptop with 2 GPUs - discrete (GeForce GTX 950M) and integrated (Intel HD Graphics 520). Even though discrete one is not the most powerful, it’s still sometimes reasonable to train small neural networks using it. But limited memory (2 GB) quickly becomes a bottleneck, especially given that without any training 25-50% of it is already taken by gnome/xorg/etc. There is another GPU which is typically unused at all, perhaps we can use both of them at the same time? This way smaller one will be responsible for rendering UI, and the other will be completely dedicated to compute.

Disclaimer: I take no responsibility for the consequences of performing steps I describe below. Your computer might turn into pumpkin, your data might get lost, and you might even have to reinstall NVIDIA drivers afterwards. Try it out only if you’re feeling adventurous.

Turns out it’s not a very common setup - most search results for “ubuntu dual gpu” query are about setting up 2 discrete GPUs, which is not our intention. Perhaps this use case is too narrow, but anyway I was curious enough to find a working solution, so I want to duplicate it here to make it easier to find, and to prevent losing it if the original page will be removed. To make this post less redundant, I will focus on my experience with it, including quirks and workarounds. This is especially important on Ubuntu 18.04, because solution described above seems to no longer work.

Differences start from the location of libraries. Before, they used to be placed in /usr/local/cuda/lib64 and /usr/lib/nvidia-xxx (where xxx stands for driver number) for CUDA and driver libraries, respectively. But now there doesn’t seem to be nvidia-xxx directory created under /usr/lib for each version of the driver. Let’s try to find out where they are now (I omitted most of the output for brevity):

alexey@laptop:~$ ldconfig -p | grep nvidia
	...
	libnvidia-opencl.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
	libnvidia-opencl.so.1 (libc6) => /usr/lib/i386-linux-gnu/libnvidia-opencl.so.1
	libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
	libnvidia-ml.so.1 (libc6) => /usr/lib/i386-linux-gnu/libnvidia-ml.so.1
	libnvidia-ml.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so
	libnvidia-ml.so (libc6) => /usr/lib/i386-linux-gnu/libnvidia-ml.so
	...

Interesting, it seems that most of the stuff goes to /usr/lib/x86_64-linux-gnu/ and /usr/lib/i386-linux-gnu/ now. What about CUDA libs?

alexey@laptop:~$ ldconfig -p | grep cuda
	...
	libcuda.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so
	libcuda.so (libc6) => /usr/lib/i386-linux-gnu/libcuda.so

Same here. It doesn’t seem like we need to do anything special to make libraries on those directories discoverable, but lets try to follow the steps above anyway:

switch PRIME profile to “Power saving mode” in nvidia-settings
and add those paths to LD_LIBRARY_PATH (via .bashrc for instance)
reboot
try to call nvidia-smi.

alexey@laptop:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

That sucks. That used to work before, what’s going on?

It took me a while to find out, but it seems that now PRIME profile switching (or at least some part of it) is done via blacklisting the NVIDIA driver. That’s how /etc/modprobe.d/blacklist-nvidia.conf looks on my machine:

# Do not modify
# This file was generated by nvidia-prime
blacklist nvidia
blacklist nvidia-drm
blacklist nvidia-modeset
alias nvidia off
alias nvidia-drm off
alias nvidia-modeset off

After checking out “Chapter 5. Listing of Installed Components” of NVIDIA 396.24 driver documentation, it seems that nvidia-modeset is responsible for for programming the display engine of the GPU and nvidia-drm handles DRM in some way. This means we don’t really need to turn them on, as opposed to the nvidia, which sound pretty critical to us :)

Let’s try commenting out lines that blacklist nvidia and rebooting.

alexey@laptop:~$ nvidia-smi
Sun Jun 17 17:02:44 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.24.02              Driver Version: 396.24.02                 |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 950M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   57C    P0    N/A /  N/A |      0MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Wow, now we’re talking! nvidia-smi is functioning properly and no memory is spent on rendering - that’s exactly what we need. Most importantly, training networks also works (at least with PyTorch). We’ll also achieve same results if we don’t add paths with NVIDIA and CUDA libs to LD_LIBRARY_PATH.

Following the original guide, I tried to run glmark2, and it seems to work properly without any additional steps.

A small downside is:

alexey@laptop:~$ nvidia-settings

ERROR: Unable to load info from any available system

Unfortunately I haven’t found a workaround for this yet, but on the other hand I don’t need to switch PRIME profiles often, and if I need - I can always unblacklist remaining components in /etc/modprobe.d/blacklist-nvidia.conf, reboot and have nvidia-settings working (if you plan to switch PRIME profile this will need 2 reboots vs one though).

Another nice thing is that now after you wake laptop after suspending, you’ll no longer have a black screen - I’m quite surprised that this bug is still not fixed. But there’s no free lunch - nvidia-smi still won’t work after suspend/wake cycle, so you’ll have to reboot anyway if you plan to use GPU.

That’s it I guess. Hope it helped anyone, and here’s a TL;DR section (bottom of a post is a perfect place for it) to wrap it up:

Select NVIDIA driver in “Software & Updates”/”Additional Drivers”
Make sure that nvidia-smi works and produces meaningful output. You can also check that correct GPU is detected by OS
Run “NVIDIA X Server Settings”, in “PRIME Profiles” section select “Intel (Power Saving Mode)”, reboot
Verify that now the system detects integrated GPU. nvidia-smi shouldn’t work at this point
Make sure that under /etc/modprobe.d/ you have blacklist-nvidia.conf or something similar
Comment out (place a # in the beggining of the line) blacklist nvidia and alias nvidia off in this file and save it
Reboot, verify that nvidia-smi is working and produces meaningful output (Correct GPU and driver version displayed, No running processes found shown under Processes)

I’ll be using this setup from now on, and in case I’ll discover something new I’ll update this post.

Twitter Facebook Google+