👁️ Overview

This article shows how to manually enable and manage Multi-Instance GPU (MIG) on an NVIDIA GPU.

📘 Instructions

  1. Update and Upgrade

    apt update
    apt upgrade
  2. Install an NVIDIA driver version that supports MIG
    A100 and A30 GPUs are supported starting with CUDA 11/R450 drivers.

    apt install nvidia-driver-525-server

Version 525 is the version that has been tested and confirmed working.
Use apt search nvidia-driver to find later versions.
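
Before enabling MIG, it can help to confirm that the installed driver meets the R450 minimum mentioned above. The following is a minimal sketch and not part of the original procedure: the function name driver_supports_mig is hypothetical, and it assumes that nvidia-smi is on the PATH and that comparing the major version number is sufficient.

```shell
# Hypothetical helper: succeed if a driver version string has major >= 450.
driver_supports_mig() {
    major="${1%%.*}"                      # "525.85.12" -> "525"
    case "$major" in
        ''|*[!0-9]*) return 1 ;;          # reject empty or non-numeric input
    esac
    [ "$major" -ge 450 ]
}

# Only query the GPU when nvidia-smi is actually available.
if command -v nvidia-smi >/dev/null 2>&1; then
    # All GPUs share one driver, so the first line is enough.
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if driver_supports_mig "$installed"; then
        echo "Driver $installed meets the R450 minimum for MIG"
    else
        echo "Driver '$installed' is missing or older than R450" >&2
    fi
fi
```

The version check is kept in its own function so it can be reused with a version string obtained some other way, for example from a package listing.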

  3. Enable MIG on the GPU

    sudo nvidia-smi -i 0 -mig 1
  4. Reset the GPU to complete enabling MIG

    sudo nvidia-smi --gpu-reset
  5. Decide which MIG instance sizes you want to use.
    nvidia-smi mig -lgip lists the available GPU instance profiles and their IDs, which are needed for the create step.
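
After the enable and reset steps, the MIG mode can be checked before any instances are created. This is a minimal sketch under stated assumptions: it targets GPU index 0 as in the steps above, the helper name mig_mode_status and its status labels are hypothetical, and it relies on the mig.mode.current and mig.mode.pending query fields of nvidia-smi.

```shell
# Hypothetical helper: classify a "current, pending" MIG mode pair.
mig_mode_status() {
    case "$1" in
        "Enabled, Enabled")   echo "active" ;;
        "Disabled, Enabled")  echo "pending-reset" ;;    # enabled, but a GPU reset is still needed
        "Enabled, Disabled")  echo "pending-disable" ;;
        "Disabled, Disabled") echo "disabled" ;;
        *)                    echo "unknown" ;;          # e.g. a GPU without MIG support
    esac
}

# Only query the GPU when nvidia-smi is actually available.
if command -v nvidia-smi >/dev/null 2>&1; then
    modes="$(nvidia-smi -i 0 --query-gpu=mig.mode.current,mig.mode.pending --format=csv,noheader)" || modes=""
    echo "GPU 0 MIG mode: $(mig_mode_status "$modes")"
fi
```

A result of pending-reset suggests that the reset step has not yet taken effect.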

Warning: The example below is for illustration only. Decide for yourself how you want to deploy MIG.


Example:

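I want two GPU instances with 10 GB of memory each.

nvidia-smi mig -cgi 15,15 -C

This creates two GPU instances from profile ID 15. The -C flag also creates the default compute instance in each one. (Always passing -C is recommended unless you know that you need a specific compute instance layout.) Profile IDs differ between GPU models, so confirm the ID against the nvidia-smi mig -lgip output for your GPU.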
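
When the number of instances varies, the comma-separated list passed to -cgi can be generated instead of typed by hand. This is a small sketch only: the helper name cgi_list is hypothetical, and the command is printed rather than executed so that it can be reviewed first.

```shell
# Hypothetical helper: repeat a profile ID COUNT times, comma-separated.
cgi_list() {
    profile="$1"
    count="$2"
    list=""
    i=0
    while [ "$i" -lt "$count" ]; do
        list="${list:+$list,}$profile"    # add a comma only between entries
        i=$((i + 1))
    done
    echo "$list"
}

# Print the create command for two instances of profile ID 15.
echo "nvidia-smi mig -cgi $(cgi_list 15 2) -C"
```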

nvidia-smi mig -lgi lists the GPU instances that you have created.

nvidia-smi -L lists the MIG device UUIDs used for binding.

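You now have two GPU instances on which you can run CUDA applications.

To target an instance, set CUDA_VISIBLE_DEVICES to its MIG UUID when launching the program. The command below runs the application on the first instance and then, if that run succeeds, on the second. Replace && with & to run both at the same time:

CUDA_VISIBLE_DEVICES=MIG-d0c17ef4-4e20-5117-a3b8-5ff7e8424826 ./Application && CUDA_VISIBLE_DEVICES=MIG-221491dd-4f47-55cc-b5d6-c771c5015bd2 ./Application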
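
Instead of copying UUIDs by hand, they can be read from the nvidia-smi -L output and used to start one process per instance. This is a minimal sketch under stated assumptions: the helper name mig_uuids is hypothetical, ./Application stands for your own CUDA program, and the pattern assumes the MIG-<UUID> format shown above (older drivers may print a different format).

```shell
# Hypothetical helper: extract MIG device UUIDs from `nvidia-smi -L` output on stdin.
mig_uuids() {
    grep -oE 'MIG-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}'
}

# Launch only when both nvidia-smi and the program are present.
if command -v nvidia-smi >/dev/null 2>&1 && [ -x ./Application ]; then
    for uuid in $(nvidia-smi -L | mig_uuids); do
        # Each process sees only its own MIG instance.
        CUDA_VISIBLE_DEVICES="$uuid" ./Application &
    done
    wait    # block until every background process has finished
fi
```

Running the processes in the background with & lets the instances work concurrently, while wait keeps the script alive until all of them exit.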

It is still possible to run applications on the GPU UUID as well as the MIG UUIDs.


🔖 References

All information is taken from the NVIDIA MIG Documentation

📚  Related articles

MIG-Kubernetes - CloudKB - Confluence (atlassian.net)
Ansible for MIG - CloudKB - Confluence (atlassian.net)
