Nvidia Optimus, Clevo, and power management

2019-08-21 13:11

I've had a PC Specialist Optimus IX (Clevo N850EP6) laptop for a while, dual booting Debian testing (with kernel 4.19) and Windows 10. This laptop has a hybrid graphics system with an integrated Intel UHD 630 and a dedicated NVIDIA GTX 1060 and, while it works pretty well for the most part, there have been several issues regarding power management, freezes (looking at you, lspci!), and other malfunctions.

"Luckily" I am not the only one who has had these issues, so I finally could find a good solution for my case.

Enter Optimus and Bumblebee

Optimus is a technology developed by NVIDIA which allows switching between the integrated and the dedicated GPUs as needed. In practice, you run the desktop and most common applications, which do not usually require many resources, using the integrated GPU, while other applications such as games would use the dedicated GPU for best performance.

This achieves a good balance between battery life and performance, but while this switch is automatic in Windows systems, GNU/Linux systems require some additional work.

This is where Bumblebee comes in. The Bumblebee daemon makes this switch possible by rendering graphics on the NVIDIA GPU and displaying them through the integrated GPU. In order to do so, applications need to be executed with either primusrun or optirun, as follows:

primusrun APPLICATION_PATH
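optirun works the same way; for instance, to run the classic glxgears demo on the discrete GPU:

optirun glxgears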

This approach had worked really well for me in the past, so I expected it to work just as well on this new laptop, and I followed the Debian documentation for Bumblebee and installed everything as I did last time. This was not the case.

How about some power management with that?

Bumblebee includes a power management module called bbswitch, which is in charge of disabling the discrete GPU when it is not in use, greatly improving battery life. It is installed as a kernel module and, last time I used it, it worked out of the box.
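For reference, bbswitch exposes the state of the card through /proc/acpi/bbswitch, which also allows toggling it manually:

cat /proc/acpi/bbswitch                  # e.g. "0000:01:00.0 ON"
echo OFF | sudo tee /proc/acpi/bbswitch  # power the card off
echo ON | sudo tee /proc/acpi/bbswitch   # power it back on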

However, after a while I noticed that the fans were usually at top speed and that the laptop would completely freeze when performing certain actions such as suspending (which was a gamble, as it would work less than 50% of the time), shutting down, and even running lspci.

After some searching, it turned out that many people have had this problem. The Arch Linux wiki suggested adding acpi_osi="!Windows 2015" or acpi_osi=! acpi_osi="Windows 2009" to the kernel boot parameters, but neither worked out for me.
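In case you want to try them on your machine, kernel boot parameters are set on Debian by editing GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (note the escaped quotes, since the file is parsed as shell) and then regenerating the GRUB configuration:

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet acpi_osi=! acpi_osi=\"Windows 2009\""

sudo update-grub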

I also checked the workarounds in the Bumblebee issue referenced in the Arch wiki, but these didn't work either. Another user suggested disabling the GPU on boot and then manually enabling it when needed, to no avail.

At this point I was starting to think that I wouldn't be able to have a completely working Debian installation in my machine, but among all the comments in the GitHub issue I found one that looked really promising.

Solution

This message by user phkb01 was hidden by GitHub (along with more than 300 others), so I couldn't find it the first time around:

It seems that using the linux power management instead of bbswitch works.

I used the patch from PR 983, configured bumblebee with PMMethod=none and AlwaysUnloadKernelDriver=true. I enabled power management for the Nvidia card with powertop.

Now, the Device stats tab of powertop reports 0% usage for the Nvidia card when not using it, and 100% when optirun is running, so I guess it works.

"Might as well try it out" - I thought.

First, I made a backup of the original Bumblebee configuration in /etc/bumblebee/bumblebee.conf:

sudo cp /etc/bumblebee/bumblebee.conf /etc/bumblebee/bumblebee.conf.bak

And then I modified the [driver-nvidia] section (I use the NVIDIA driver with Bumblebee) as follows:

[driver-nvidia]
#....
PMMethod=none # original value was bbswitch
#....
AlwaysUnloadKernelDriver=true # original value was false

This means that Bumblebee will not try to manage the power of the discrete GPU through bbswitch, so the card should stay on until it is told otherwise through some other mechanism.
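In theory, a full reboot should not be needed to apply this change, as restarting the Bumblebee daemon ought to be enough:

sudo systemctl restart bumblebeed.service

I rebooted anyway, just in case.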

After rebooting, nothing really seemed to have changed: the fans were still running and I could still run primusrun glxgears, so the GPU was working. I checked the output of powertop and saw an idle power consumption of 25 W. So I tried the second part of phkb01's message and enabled power management for the NVIDIA GPU in powertop. And then, silence. The fans turned off and power consumption went down to around 10 W.

Color me impressed.

Turns out that bbswitch was the issue all along and the kernel is quite capable of managing the power of the GPU by itself. Just to be sure, I started executing lspci multiple times and suspended the laptop. Everything worked as expected.

When toggling the power management setting in powertop, I noticed that it executed the following command:

echo auto > /sys/bus/pci/devices/0000:01:00.0/power/control

This change is not permanent and will be reset on the next boot. Therefore, I wrote a simple systemd service to enable power management on boot:

# /etc/systemd/system/nvidia-pm.service
[Unit]
Description=Nvidia GPU power management enabler

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/sh -c "echo auto > /sys/bus/pci/devices/0000:01:00.0/power/control"
ExecStop=/bin/sh -c "echo on > /sys/bus/pci/devices/0000:01:00.0/power/control"

[Install]
WantedBy=multi-user.target

Note that 0000:01:00.0 is the PCI address of my NVIDIA GPU, so it may change in other systems (see below for a quick way to find it). Also, you can check the status of the GPU by running:

cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status

This will return active when the GPU is being used and suspended otherwise.
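To find the address of the card on your system, lspci comes in handy; note that its output omits the leading 0000: PCI domain, so 01:00.0 there corresponds to 0000:01:00.0:

lspci | grep -i nvidia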

Finally, enable and start the service:

sudo systemctl enable nvidia-pm.service
sudo systemctl start nvidia-pm.service
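On recent systemd versions, both steps can be combined into one:

sudo systemctl enable --now nvidia-pm.service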

Bonus: HDMI output

I don't really use a second HDMI screen, so I didn't notice this until I had to present something and Debian wouldn't recognize the external screen.

The HDMI port is wired to the discrete GPU, so just plugging the cable in will not make the screen show up in the settings manager. Apparently, Intel has a tool just for these cases called intel-virtual-output, and it is available in the Debian package xserver-xorg-video-intel.
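Installing it is a single command:

sudo apt install xserver-xorg-video-intel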

After installing it and running the program, I was greeted with the following message:

no Virtual outputs on ":0"

After some searching, I found this Arch wiki page with the solution. Applied to Debian:

First, create /usr/share/X11/xorg.conf.d/20-intel.conf with the following content:

Section "Device"
    Identifier "intelgpu0"
    Driver "intel"
EndSection

Then, edit /etc/bumblebee/xorg.conf.nvidia, again after making a backup:
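sudo cp /etc/bumblebee/xorg.conf.nvidia /etc/bumblebee/xorg.conf.nvidia.bak

The relevant sections should end up looking like this: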

Section "ServerLayout"
    Identifier  "Layout0"
    Option      "AutoAddDevices" "true"     # Bumblebee defaults to false
    Option      "AutoAddGPU" "false"
EndSection

Section "Device"
    Identifier  "DiscreteNvidia"
    Driver      "nvidia"
    VendorName  "NVIDIA Corporation"
    Option "ProbeAllGpus" "false"
    Option "NoLogo" "true"
    Option "UseEDID" "true"                 # Bumblebee defaults to false
    Option "AllowEmptyInitialConfiguration" # Added line
    #Option "UseDisplayDevice" "none"       # Commented this line
EndSection

Section "Screen"
    Identifier "Screen0"
    Device "DiscreteNVidia"
EndSection

After rebooting and running the following command (the -f flag keeps it in the foreground):

primusrun intel-virtual-output -f

My settings manager was able to detect the screen correctly.