Setup Nvidia Accelerated Graphics

Background

Connecting to the GUI images through the OpenStack Console, VNC, or XRDP uses the default virtual GPU rather than the Nvidia one, so rendering is done on the CPU.

Symptoms include accelerated applications failing to launch, responding slowly, or running at a similar speed to a CPU-only instance or a local machine.

Technical notes provide additional background and can be skipped.

Requirements

Access to flavors with a GPU (see VM Flavors). Contact Us if you need access
A VM with the Ubuntu GUI and a GPU
SSH Access ( https://stfc.atlassian.net/wiki/spaces/SC/pages/329384108 )

Driver Install

SSH into your VM
Check the latest available version of the Nvidia driver:

apt-cache search nvidia-driver

# nvidia-driver-100 - Transitional package for nvidia-driver-100
# nvidia-driver-110 - Transitional package for nvidia-driver-110

Select the latest version whose package name has no extra suffix, e.g. open, server, or headless.

sudo apt update 
# Replace XXX with the latest version
sudo apt install nvidia-cuda-toolkit nvidia-driver-XXX
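
The selection rule above can be scripted. The following is a minimal sketch, assuming the package names follow the nvidia-driver-NNN pattern shown in the sample output. latest_nvidia_driver is a hypothetical helper name, not part of any Nvidia or Ubuntu tooling.

```shell
# Hypothetical helper: reads `apt-cache search nvidia-driver` output on stdin
# and prints the highest plain nvidia-driver-NNN package name.
latest_nvidia_driver() {
  # Keep only names of the exact form nvidia-driver-<digits>; this drops
  # variants with extra words such as -open, -server, or headless packages.
  awk '{print $1}' \
    | grep -E '^nvidia-driver-[0-9]+$' \
    | sort -t- -k3,3n \
    | tail -n 1
}

# Usage (not run here):
#   apt-cache search nvidia-driver | latest_nvidia_driver
```

Check the printed name by eye before passing it to apt install, since the newest packaged driver is not guaranteed to suit every GPU.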

Required Packages

LightDM is required because GDM splits the login and the user session across two separate X sessions

Install a compatible greeter

sudo apt install xfce4 xfce4-goodies
# Select lightdm instead of gdm3 when prompted

Reboot the machine
Follow the steps here to allow the GPU to run without a physical monitor: VirtualGL HeadlessNV
Install the GPG key and repo for TurboVNC as follows:

wget -q -O- https://packagecloud.io/dcommander/turbovnc/gpgkey | \
  gpg --dearmor | sudo dd of=/etc/apt/trusted.gpg.d/TurboVNC.gpg


wget -q -O- https://raw.githubusercontent.com/TurboVNC/repo/main/TurboVNC.list | \
  sudo tee /etc/apt/sources.list.d/turboVNC.list
  
sudo apt update
sudo apt install turbovnc

Install VirtualGL:

cd /tmp
wget https://github.com/VirtualGL/virtualgl/releases/download/3.1.1/virtualgl_3.1.1_amd64.deb

sudo dpkg --install virtualgl_3.1.1_amd64.deb

Configure VirtualGL and TurboVNC

TurboVNC is used to provide a virtual display to connect to. x11vnc could be used instead; however, gnome-shell starts on display :1, so a user would need to start a second VNC connection after logging in through the greeter.

Configure VirtualGL to provide OpenGL and EGL using the Nvidia card:

cd /opt/VirtualGL/bin/
sudo ./vglserver_config -config +s +f 
mkdir -p ~/.vnc
cp /opt/TurboVNC/bin/xstartup.turbovnc ~/.vnc/xstartup.turbovnc
chmod +x ~/.vnc/xstartup.turbovnc

Reboot the machine to ensure the display stack can use VirtualGL

Connecting to the server

# SSH in as the user you want to run as
# Note: do not run this as root, otherwise the desktop session will be logged in as root

cd /opt/TurboVNC/bin/

# The first run will prompt for a VNC password
./vncserver -vgl

Your session will last as long as the SSH session is connected
- To keep it running across disconnects and reboots, see Enable on restart
Open TCP port 5901 by following Create and Delete Security Group
Download a VNC client to your local machine:
- Recommended Linux: Remmina
- Recommended Windows: TurboVNC

Windows: TurboVNC

Enter the IP and port in the following form, noting the double colon: 172.16.x.y::5901
Go to Options → Connection
Remove the Username if entered
Connect and enter your VNC password
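
The port in the address follows from the display number: a VNC server on display :N listens on TCP port 5900 + N, which is why display :1 maps to 5901. As a small sketch of that arithmetic, the hypothetical helper vnc_address below builds the double-colon form. The IP address in the usage line is a made-up example.

```shell
# Hypothetical helper: print the HOST::PORT string for a VNC display.
# $1 = server IP or hostname, $2 = display number (defaults to 1).
vnc_address() {
  host=$1
  display=${2:-1}
  # VNC display :N listens on TCP port 5900 + N.
  printf '%s::%d\n' "$host" "$((5900 + display))"
}

# Usage with an example address:
#   vnc_address 172.16.0.10 1
```

If the server is started on a display other than :1, the security group rule needs to open the matching port as well.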

Validating the config

Install mesa-utils and check the glxinfo output using your VNC session

sudo apt install mesa-utils
glxinfo | grep -i opengl

Check the OpenGL renderer string; it should show the GPU
If it says llvmpipe or vmware, you are using the emulated GPU
- Try rebooting your machine and restarting the vncserver
- Alternatively, Contact Us
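
The renderer check can also be automated. This is a rough sketch, assuming the only software renderers of interest are the two named above. check_renderer is a hypothetical helper that reads glxinfo output on stdin.

```shell
# Hypothetical helper: classify the OpenGL renderer from `glxinfo` output.
# Prints "gpu", "software", or "unknown"; the return code is 0, 1, or 2.
check_renderer() {
  # Take the first renderer line and lower-case it for matching.
  renderer=$(grep -i 'OpenGL renderer string' | head -n 1 | tr '[:upper:]' '[:lower:]')
  case "$renderer" in
    "")                  echo "unknown";  return 2 ;;
    *llvmpipe*|*vmware*) echo "software"; return 1 ;;
    *)                   echo "gpu";      return 0 ;;
  esac
}

# Usage, inside the VNC session (not run here):
#   glxinfo | check_renderer
```

A result of "gpu" only means the renderer is not one of the known emulated ones, so still read the string itself to confirm it names the Nvidia card.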

Enable on restart

By default, TurboVNC only runs when the user connects over SSH and runs the above command. We can add a service to start it whenever the machine restarts:

Open the following file with an editor:

sudo nano /etc/systemd/system/TurboVNC.service

Add the following, changing the user to your username:

[Unit]
Description=TurboVNC vncserver
After=network.target syslog.target

[Service]
User=<username>
Type=forking
ExecStartPre=/bin/sh -c '/opt/TurboVNC/bin/vncserver -kill :1 >/dev/null 2>&1 || :'
ExecStart=/opt/TurboVNC/bin/vncserver -nevershared -disconnect -vgl :1
ExecStop=/bin/sh -c '/opt/TurboVNC/bin/vncserver -kill :1 >/dev/null 2>&1 || :'
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Save the file and exit the editor
Reload and start the service as follows:

sudo systemctl daemon-reload
sudo systemctl stop TurboVNC
sudo systemctl enable TurboVNC
sudo systemctl start TurboVNC

Restart the machine to validate the session comes back
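
After the reboot, one way to confirm the session came back is to check that something is listening on the VNC port. The sketch below assumes the usual column layout of ss -ltn (state first, local address in the fourth column). vnc_port_listening is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ss -ltn` output on stdin and succeeds if a
# socket is listening on the given TCP port (defaults to 5901).
vnc_port_listening() {
  port=${1:-5901}
  # Split the local address on ":" and compare the last piece to the port,
  # which handles both 0.0.0.0:5901 and [::]:5901.
  awk -v p="$port" '
    $1 == "LISTEN" { n = split($4, a, ":"); if (a[n] == p) found = 1 }
    END { exit !found }
  '
}

# Usage (not run here):
#   systemctl is-active TurboVNC
#   ss -ltn | vnc_port_listening 5901 && echo "VNC is listening"
```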

Reset VNC password

SSH into the machine
Stop TurboVNC:

cd /opt/TurboVNC/bin/
./vncserver -kill :1

rm ~/.vnc/passwd
Re-run the vncserver step under Configure VirtualGL and TurboVNC (see Connecting to the server)
- With the password file removed, this will prompt for a new password

Debugging

Check the Xorg log:

less /var/log/Xorg.0.log

It generally shows errors when the xorg.conf file is incorrect, or the nvidia module cannot be activated.
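
Xorg marks error lines with (EE) and warning lines with (WW), so the log can be filtered rather than read in full. A minimal sketch, with xorg_problems as a hypothetical helper name:

```shell
# Hypothetical helper: reads an Xorg log on stdin and prints only the
# error (EE) and warning (WW) lines.
xorg_problems() {
  # The log header contains a legend line listing every marker; drop it so
  # only real messages remain.
  grep -E '\((EE|WW)\)' | grep -F -v '(NI) not implemented'
}

# Usage (not run here):
#   xorg_problems < /var/log/Xorg.0.log
```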

TurboVNC tells you where its log file is when you run it, for example:

/home/USER/.vnc/HOSTNAME:1.log

You can see which processes are using the graphics card with Nvidia's command:

nvidia-smi

The -a flag will give lots more information, and the -l flag will print out the information periodically so you can track it, like htop but for the graphics card.

Misc places to look if you have a problem:

xrandr -q
nvidia-smi
less /var/log/Xorg.0.log
sudo journalctl -f
less /home/$USER/.xsession-errors  # only if USER is logged in, otherwise check the greeter log
less /var/log/lightdm/x-0-greeter.log
