Nvidia Drivers: x2.xlarge

Systems Requirements:

  • CUDA-capable GPU
  • Supported version of Linux with gcc compiler and toolchain
  • NVIDIA CUDA Toolkit

Pre-installation Actions

  1. Verify the system has a CUDA-capable GPU.

    ~# lspci | grep -i nvidia
    d8:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)

    To install lspci, run:

    ~# apt-get install pciutils

    If you do not see any settings, update the PCI hardware database that Linux maintains by entering update-pciids (generally found in /sbin or /usr/sbin) at the command line and rerun the previous lspci command.

  2. Verify the system is running a supported version of Linux.

    ~# uname -m && cat /etc/*release
    DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
    VERSION="16.04.5 LTS (Xenial Xerus)"
    PRETTY_NAME="Ubuntu 16.04.5 LTS"
  3. Verify the system has the correct kernel headers and development packages installed.

    Below is the command to check the current kernel version the system is running.

    ~# uname -r

    The development packages and kernel headers are needed for the CUDA driver. It should match the running version of the kernel at the time of the driver installation, also whenever the driver is rebuilt. (Reference: CUDA Installation Guide Linux - 2.4 Verify System has Correct Kernel Headers and Development Packages Installed)

    Note: If system update is performed and changes made to the linux kernel version being used, make sure to re-run the commands below to ensure you have the kernel headers installed. Otherwise, the CUDA Driver will fail to work with the new kernel. (Reference: CUDA Installation Guide Linux - 2.4 Verify System has Correct Kernel Headers and Development Packages Installed).

    To install the development packages and kernel headers for the currently running kernel:

    ~# apt-get install build-essential
    ~# apt-get install linux-headers-$(uname -r)
  4. Verify the system has gcc installed.

    ~# gcc --version

Installation of Nvidia driver and CUDA from the official repo

  1. Install official repository.

    Browse the repos depending on your operating system.

    Note: At the time of writing, the latest version is 10.0.130-1, but you may user other versions. The commands bellow will assume version 10.0.130-1 and Ubuntu 16.04.

    Download the cudo repo package:

    ~# wget

    Fetch the gpg keys for the repo:

    ~# apt-key adv --fetch-keys

    Then install:

    ~# dpkg -i cuda-repo-ubuntu1604_10.0.130-1_amd64.deb

    To check that the installation was successful, navigate to the /etc/source.list.d and check if there is repo list for CUDA.

    ~# ls -l /etc/apt/sources.list.d
    total 4
    -rw-r--r-- 1 root root 80 Nov 13 22:48 cuda.list
    ~# cat /etc/apt/sources.list.d/cuda.list

    Update apt cache,

    ~# apt-get update
  2. Install the nvidia driver and the CUDA toolkit.

    apt-get install --no-install-recommends cuda
  3. Reboot machine.

  4. Once rebooted, double check if the nvidia driver has been loaded automatically.

    ~# lsmod | grep -i nvidia
    nvidia_uvm            790528  0
    nvidia_drm             45056  0
    nvidia_modeset       1040384  1 nvidia_drm
    nvidia              16592896  2 nvidia_modeset,nvidia_uvm
    ipmi_msghandler        49152  3 ipmi_ssif,nvidia,ipmi_si
    drm_kms_helper        155648  1 nvidia_drm
    drm                   364544  3 drm_kms_helper,nvidia_drm
  5. Check nvidia gpu and driver version.

    ~# nvidia-debugdump --list
    Found 1 NVIDIA devices
            Device ID:              0
            Device name:            Tesla P4
            GPU internal ID:        0421818006236
    ~# cat /proc/driver/nvidia/version
    NVRM version: NVIDIA UNIX x86_64 Kernel Module  410.79  Thu Nov 15 10:41:04 CST 2018
    GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11)
    ~# nvidia-smi
    | NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |   0  Tesla P4            Off  | 00000000:D8:00.0 Off |                    0 |
    | N/A   37C    P0    23W /  75W |      0MiB /  7611MiB |      0%      Default |
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |  No running processes found                                                 |
  6. Check CUDA version.

    ~# nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driverCopyright (c) 2005-2018 NVIDIA CorporationBuilt on Sat_Aug_25_21:08:01_CDT_2018Cuda compilation tools, release 10.0, V10.0.130

    Note: For Ubuntu 18.04 image for x2 system, the nouveau driver is integrated and loaded on boot automatically. Because of this it interferes with the loading of the official Nvidia driver (installed from above instruction).

    To unload nouveau, run the following command:

    ~# rmmod nouveau

    However, this will not prevent nouveau from loading on-boot.

Was it helpful?