Set up an Ubuntu 20.04 server with an NVIDIA GPU (and spaCy)

May 15, 2020


After OS install - basic update

Upgrade packages

sudo su
apt update
apt upgrade -y

ssh hardening

ssh user@newserver.com
mkdir -p ~/.ssh
chmod 700 ~/.ssh
# Copy the contents of your local id_ecdsa.pub into ~/.ssh/authorized_keys (running ssh-copy-id from the client also works)
chmod 600 ~/.ssh/authorized_keys

In /etc/ssh/sshd_config, set the following options:

PermitRootLogin no
PubkeyAuthentication yes
PasswordAuthentication no

After configuration, restart sshd.

sudo systemctl restart sshd
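The three directives above can also be applied with a small script instead of a manual edit. This is a minimal sketch, assuming bash and GNU sed; set_sshd_option and harden_sshd are hypothetical helper names, not standard tools. An existing line for a directive (commented out or not) is replaced in place, otherwise the directive is appended, which assumes the file does not end in an active Match block.

```shell
# Hypothetical helper: set KEY to VALUE in an sshd_config-style FILE.
# Any line for KEY (commented out or not) is replaced in place;
# if none exists, the directive is appended. Assumes GNU sed (-i, -E).
set_sshd_option() {
    local file="$1" key="$2" value="$3"
    if grep -qE "^[#[:space:]]*${key}[[:space:]]" "$file"; then
        sed -i -E "s|^[#[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
    else
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# Hypothetical wrapper: apply the three hardening options from this post.
harden_sshd() {
    local file="${1:-/etc/ssh/sshd_config}"
    set_sshd_option "$file" PermitRootLogin no
    set_sshd_option "$file" PubkeyAuthentication yes
    set_sshd_option "$file" PasswordAuthentication no
}
```

Usage, as root; sshd -t checks the file for errors before the restart:

harden_sshd /etc/ssh/sshd_config && sshd -t && systemctl restart sshd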

sudo su
passwd # Change the root password

Firewall - ufw

https://ubuntu.com/engage/20.04-webinars

According to that page, 20.04 comes with constant security patching and new features such as the Ubuntu Server live installer, the migration from iptables to nftables, and a more resilient boot loader.

ufw is installed by default.

https://wiki.ubuntu.com/UncomplicatedFirewall

sudo ufw allow 22/tcp # Allow default ssh access
sudo ufw allow from 192.168.150.3 to any port 80 # Allow HTTP only from this IP; a network such as 192.168.150.0/24 can be used instead
sudo ufw enable

Network

Tips - force apt to use IPv4 (not IPv6)

https://kofler.info/ipv6-fuer-apt-deaktivieren/

sudo su
echo 'Acquire::ForceIPv4 "true";' > /etc/apt/apt.conf.d/99disable-ipv6

Tips - networkd

By default, systemd-networkd is running (check with systemctl status systemd-networkd).

netplan

netplan --help
systemctl stop systemd-networkd # could not be stopped
netplan try

# After editing the YAML
netplan generate # generate backend configuration from /etc/netplan/*.yaml
netplan apply # apply the configuration
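The commands above assume a YAML file already exists in /etc/netplan/. Below is a minimal sketch of a static-address file, written by a hypothetical helper write_netplan_static; the interface name and addresses in the usage line are placeholders, and gateway4 is the default-route key accepted by the netplan version shipped with 20.04.

```shell
# Hypothetical helper: write a static IPv4 netplan config.
# Arguments: FILE IFACE ADDRESS/PREFIX GATEWAY DNS_SERVER
write_netplan_static() {
    local file="$1" iface="$2" cidr="$3" gateway="$4" dns="$5"
    cat > "$file" <<EOF
network:
  version: 2
  renderer: networkd
  ethernets:
    ${iface}:
      dhcp4: false
      addresses: [${cidr}]
      gateway4: ${gateway}
      nameservers:
        addresses: [${dns}]
EOF
}
```

Usage, as root (an existing file that configures the same interface may need to be edited or removed first):

write_netplan_static /etc/netplan/99-static.yaml eno1 192.168.150.10/24 192.168.150.1 192.168.150.1
netplan try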

Troubleshooting commands:

ip route list # See default gateway
ip a # = ip addr
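To see only the default route, ip route show default narrows the list. For scripts, a small filter over ip route list output also works; this sketch assumes the usual iproute2 line format "default via <gateway> dev <interface> ...", and default_gateway is a hypothetical name.

```shell
# Print the default gateway from `ip route list` output read on stdin.
# Prints nothing if there is no "default via ..." line.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}
```

Usage:

ip route list | default_gateway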

Nvidia driver + CUDA

Driver

Disable nouveau

sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo update-initramfs -u
sudo reboot

lsmod | grep nou # check that nouveau is not loaded (no output expected)
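The two echo commands and the lsmod check can be folded into helpers: one heredoc writes both blacklist lines at once, and a yes/no test replaces reading grep output by eye. A minimal sketch in bash; write_nouveau_blacklist and nouveau_loaded are hypothetical names, and the default path is the file used above.

```shell
# Write the same two lines as the echo commands above.
write_nouveau_blacklist() {
    local file="${1:-/etc/modprobe.d/blacklist-nvidia-nouveau.conf}"
    cat > "$file" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}

# Exit 0 if lsmod-style text on stdin lists the nouveau module, 1 otherwise.
# Matches the module name exactly (first column), unlike `grep nou`.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

Usage, as root, and after the reboot:

write_nouveau_blacklist && update-initramfs -u
lsmod | nouveau_loaded && echo "nouveau is still loaded"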

Install driver

Check devices.

sudo lshw -C display
...
# product: xxxxxx [xxxxx xxxxx]
...

Find the appropriate driver for your device on NVIDIA's driver download page. For my card, I downloaded the installer as follows.

curl http://us.download.nvidia.com/XFree86/Linux-x86_64/440.82/NVIDIA-Linux-x86_64-440.82.run -o NVIDIA-Linux-x86_64-440.82.run
chmod u+x NVIDIA-Linux-x86_64-440.82.run
sudo ./NVIDIA-Linux-x86_64-440.82.run
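The installer URL embeds the driver version twice, so switching versions only means changing one value. A sketch, assuming other Linux x86_64 releases follow the same URL pattern as 440.82 (check the download page first); nvidia_driver_url and download_nvidia_driver are hypothetical names.

```shell
# Build the installer URL for a driver version, following the pattern above.
nvidia_driver_url() {
    local version="$1"
    printf 'http://us.download.nvidia.com/XFree86/Linux-x86_64/%s/NVIDIA-Linux-x86_64-%s.run\n' \
        "$version" "$version"
}

# Download the installer under its own name and make it executable.
# curl -f fails on HTTP errors instead of saving an error page; -L follows redirects.
download_nvidia_driver() {
    local version="$1" file="NVIDIA-Linux-x86_64-$1.run"
    curl -fL "$(nvidia_driver_url "$version")" -o "$file" && chmod u+x "$file"
}
```

Usage:

download_nvidia_driver 440.82 && sudo ./NVIDIA-Linux-x86_64-440.82.run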

CUDA

$ sudo apt install nvidia-cuda-toolkit
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
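To check the toolkit version in a script (for example, before and after the upgrade below), the release number can be cut out of this output. A sketch using sed; cuda_release is a hypothetical name, and it assumes the "release X.Y," wording shown above.

```shell
# Print the CUDA release (e.g. "10.1") from `nvcc -V` output on stdin.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}
```

Usage:

nvcc -V | cuda_release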

Reinstall CUDA

I upgraded it from 10.1 to 10.2.

sudo apt purge nvidia-cuda-toolkit
# Running the NVIDIA driver installer again returned an error, so restart the server

# Switch the default GCC to version 7 (CUDA 10.2 supports GCC up to 8; Ubuntu 20.04 ships GCC 9)
sudo apt -y install gcc-7 g++-7
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 7
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-7 7

# Install with the runfile installer
curl http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run -o cuda_10.2.89_440.33.01_linux.run
chmod u+x cuda_10.2.89_440.33.01_linux.run
sudo sh cuda_10.2.89_440.33.01_linux.run

(Accept the EULA.)
(Check all boxes and install.)

===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-10.2/
Samples:  Installed in /home/atlex/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-10.2/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.2/lib64, or, add /usr/local/cuda-10.2/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.2/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.2/doc/pdf for detailed information on setting up CUDA.
Logfile is /var/log/cuda-installer.log

The driver was automatically downgraded to 440.33.01 (the version bundled with the installer) because I checked all boxes.

Add to .bashrc:

export PATH=/usr/local/cuda-10.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
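Each time .bashrc is sourced again, the PATH line above prepends the CUDA directory once more. A sketch of a guard that prepends only when the directory is missing; it assumes bash (indirect expansion and printf -v), and path_prepend is a hypothetical name.

```shell
# Prepend DIR to the colon-separated variable named VAR, unless DIR is
# already one of its entries. Works when VAR is unset or empty.
path_prepend() {
    local var="$1" dir="$2"
    local current="${!var}"
    case ":${current}:" in
        *":${dir}:"*) ;;   # already present: leave unchanged
        *) printf -v "$var" '%s' "${dir}${current:+:${current}}" ;;
    esac
    export "$var"
}
```

Usage in .bashrc, after the function definition:

path_prepend PATH /usr/local/cuda-10.2/bin
path_prepend LD_LIBRARY_PATH /usr/local/cuda-10.2/lib64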

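Also add the directory itself to /etc/ld.so.conf (one directory per line; an include line only pulls in other config files), then run ldconfig as root.

/usr/local/cuda-10.2/lib64
sudo ldconfig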
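On Ubuntu, /etc/ld.so.conf already includes /etc/ld.so.conf.d/*.conf, so a drop-in file is an alternative to editing /etc/ld.so.conf itself. A minimal sketch; write_ld_conf and the file name cuda-10-2 are hypothetical choices.

```shell
# Register LIBDIR with the dynamic linker via CONFDIR/NAME.conf
# (one directory per line, as ld.so.conf expects).
write_ld_conf() {
    local name="$1" libdir="$2" confdir="${3:-/etc/ld.so.conf.d}"
    printf '%s\n' "$libdir" > "${confdir}/${name}.conf"
}
```

Usage, as root; ldconfig -p lists the linker cache, so the CUDA runtime should appear there afterwards:

write_ld_conf cuda-10-2 /usr/local/cuda-10.2/lib64 && ldconfig
ldconfig -p | grep libcudart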

Failed attempt

As of 2020/05/15, the following does not work. The package targets Ubuntu 18.04 (there is no 20.04 package), so it was likely to fail, but I tried anyway.

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=deblocal

$ sudo apt purge nvidia-cuda-toolkit
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
$ sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
$ sudo apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get -y install cuda
Errors were encountered while processing:
 /tmp/apt-dpkg-install-1I6AA4/082-libcublas10_10.2.2.89-1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

# None of the following fixed it; the dependencies stayed broken.
$ sudo apt --fix-broken install
$ sudo dpkg --remove --force-remove-reinstreq cuda
$ sudo apt update

Run a CUDA sample

git clone https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples/Samples/simpleCUBLAS
vim Makefile
# Change the location of the CUDA Toolkit:
# CUDA_PATH ?= /usr/lib/nvidia-cuda-toolkit
make
./simpleCUBLAS
GPU Device 0: "Maxwell" with compute capability 5.2

simpleCUBLAS test running..
simpleCUBLAS test passed.
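Because the Makefile assigns CUDA_PATH with ?=, the value can also be given on the make command line instead of editing the file. A sketch that picks the first toolkit directory containing an executable bin/nvcc; find_cuda_path is a hypothetical name, and the default candidates are the two toolkit locations used in this post (assuming nvcc sits in each one's bin/ directory).

```shell
# Print the first candidate directory that contains an executable bin/nvcc.
# Returns 1 if none matches.
find_cuda_path() {
    local dir
    [ "$#" -gt 0 ] || set -- /usr/local/cuda-10.2 /usr/lib/nvidia-cuda-toolkit
    for dir in "$@"; do
        if [ -x "$dir/bin/nvcc" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

Usage, inside cuda-samples/Samples/simpleCUBLAS:

make CUDA_PATH="$(find_cuda_path)"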

For NLP: CuPy and spaCy

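sudo apt install liblzma-dev
pip install cupy-cuda101 # CuPy build for CUDA 10.1
pip install cupy-cuda100 # CuPy build for CUDA 10.0
pip install -U "spacy[cuda102]==2.2.4" # spaCy with the CUDA 10.2 extra
pip install --no-cache-dir -r file # remaining packages from a requirements file
# In Python, spacy.require_gpu() returns True
sudo ufw allow 8080/tcp # Jupyter Notebook port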
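The CuPy wheels are named after the CUDA release (cupy-cuda100, cupy-cuda101, ...), and installing more than one of them in the same environment is likely to conflict, so it helps to derive the name from the toolkit actually in use. A sketch; cupy_package is a hypothetical name and assumes the cupy-cudaXYZ naming seen above.

```shell
# Map a CUDA release such as "10.2" to the CuPy wheel name "cupy-cuda102".
cupy_package() {
    printf 'cupy-cuda%s\n' "$(printf '%s' "$1" | tr -d '.')"
}
```

Usage, then a GPU check from the shell:

pip install "$(cupy_package 10.2)"
python -c "import spacy; print(spacy.require_gpu())"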