Manual MIG

Overview

This article shows how to manually enable and manage Multi-Instance GPU (MIG) on an NVIDIA GPU.

 Instructions

  1. Update the package index, then upgrade installed packages

    sudo apt update
    sudo apt upgrade

  2. Install an NVIDIA driver version that supports MIG
    A100 and A30 GPUs are supported starting with CUDA 11/R450 drivers.

    sudo apt install nvidia-driver-525-server

Version 525 has been tested and is known to work.
Use apt search nvidia-driver to look for later versions.
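
Before moving on, it can help to confirm that the installed driver is new enough. The sketch below is one possible check and not part of the tested procedure. driver_supports_mig is a hypothetical helper name, and it compares only the major branch number against the R450 minimum stated above, so it ignores minor versions.

```shell
# Hypothetical helper: succeeds when the driver's major version is at
# least 450 (the R450 branch named above). Only the major number is checked.
driver_supports_mig() {
    major=${1%%.*}                      # "525.147.05" -> "525"
    [ "$major" -ge 450 ] 2>/dev/null    # non-numeric input counts as failure
}

# Usage on a host with the driver loaded (not executed here):
#   ver=$(nvidia-smi -i 0 --query-gpu=driver_version --format=csv,noheader)
#   driver_supports_mig "$ver" && echo "driver $ver is new enough for MIG"
```

The function takes the version string as an argument, so it can be tried without a GPU present.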

  1. Enable MIG on the GPU

    sudo nvidia-smi -i 0 -mig 1
  2. Reset the GPU to complete enabling MIG

    sudo nvidia-smi --gpu-reset

  3. Decide which MIG instance sizes you want to use.
    nvidia-smi mig -lgip lists the available MIG profiles and their IDs, which you need for the create step.
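
To see whether the MIG mode change has taken effect or is still waiting on a reset, the current and pending modes can be compared. This is a minimal sketch: mig_change_pending is a hypothetical helper name, and it assumes the "current, pending" CSV line that nvidia-smi prints for the mig.mode.current and mig.mode.pending query fields.

```shell
# Hypothetical helper: reads one line of the form "current, pending" on
# stdin and succeeds when the two values differ (a reset is still needed).
mig_change_pending() {
    IFS=', ' read -r current pending || return 2
    [ "$current" != "$pending" ]
}

# Usage on a MIG-capable host (not executed here):
#   nvidia-smi -i 0 --query-gpu=mig.mode.current,mig.mode.pending \
#       --format=csv,noheader | mig_change_pending \
#       && echo "MIG mode change pending: reset the GPU"
```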

Warning: The example below is for illustration only. Decide for yourself how you want to deploy MIG.



Example:

Goal: two GPU instances with 10 GB of memory each.

nvidia-smi mig -cgi 15,15 -C

This creates two GPU instances from profile ID 15. The -C flag also creates a default compute instance in each one. (Always pass -C unless you know you need a specific compute instance layout.) Profile IDs differ between GPU models, so confirm the ID with nvidia-smi mig -lgip first.

The ID used at creation is a profile ID. To perform actions on the instances you created, reference their instance IDs instead.

nvidia-smi mig -lgi lists the GPU instances you have created, along with their instance IDs.

nvidia-smi -L lists the MIG device UUIDs, which you use to bind applications to an instance.

You now have two GPU instances on which you can run CUDA applications.
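
For scripting, the MIG UUIDs can be pulled out of the nvidia-smi -L listing. The filter below is a sketch under one assumption: the driver prints UUIDs in the MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx form used in this article. list_mig_uuids is a hypothetical helper name.

```shell
# Hypothetical helper: reads "nvidia-smi -L" output on stdin and prints
# one MIG UUID per line. Assumes UUIDs look like "MIG-" followed by a
# 36-character hexadecimal UUID, as in the examples in this article.
list_mig_uuids() {
    grep -o 'MIG-[0-9a-fA-F-]\{36\}'
}

# Usage on a host with MIG instances created (not executed here):
#   nvidia-smi -L | list_mig_uuids
```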

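To run an application on a specific instance, set CUDA_VISIBLE_DEVICES to that instance's MIG UUID. The following command runs ./Application on the first instance and then, if that succeeds, on the second: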

CUDA_VISIBLE_DEVICES=MIG-d0c17ef4-4e20-5117-a3b8-5ff7e8424826 ./Application && CUDA_VISIBLE_DEVICES=MIG-221491dd-4f47-55cc-b5d6-c771c5015bd2 ./Application

It is still possible to run applications on the GPU UUID as well as the MIG UUIDs.
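
The command above joins the two runs with &&, so they execute one after the other. If the goal is to use both instances at the same time, each run can be started in the background instead. This is a sketch, not the tested procedure: run_on_each_mig is a hypothetical helper name, and it expects one MIG UUID per line on stdin.

```shell
# Hypothetical helper: reads one MIG UUID per line on stdin and starts
# the given command once per UUID, all in parallel, then waits for them.
run_on_each_mig() {
    while IFS= read -r uuid; do
        [ -n "$uuid" ] || continue      # skip blank lines
        # Each copy sees only its own MIG instance.
        CUDA_VISIBLE_DEVICES="$uuid" "$@" </dev/null &
    done
    wait                                # block until every copy has exited
}

# Usage (not executed here), with the UUIDs from this article:
#   printf '%s\n' \
#       MIG-d0c17ef4-4e20-5117-a3b8-5ff7e8424826 \
#       MIG-221491dd-4f47-55cc-b5d6-c771c5015bd2 | run_on_each_mig ./Application
```

Output from the parallel runs may interleave, so redirect each run to its own log file if the order matters.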


Deleting MIG Instances:

Some other helpful commands for managing MIG instances:

You can reset your MIG configuration by resetting the GPU:

sudo nvidia-smi --gpu-reset

Alternatively, delete your compute instances first:

nvidia-smi mig -dci

Then delete your GPU instances:

nvidia-smi mig -dgi
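
The order matters: compute instances have to be removed before the GPU instances that contain them. The sketch below prints the teardown commands for one GPU so they can be reviewed before running. mig_teardown_cmds is a hypothetical helper name, and the final -mig 0 step, which turns MIG mode off again, is an addition that is not part of the procedure above.

```shell
# Hypothetical helper: prints (does not run) the teardown commands for
# one GPU index, in order: compute instances, GPU instances, MIG off.
mig_teardown_cmds() {
    gpu=${1:-0}                                      # GPU index, default 0
    printf 'sudo nvidia-smi mig -i %s -dci\n' "$gpu"
    printf 'sudo nvidia-smi mig -i %s -dgi\n' "$gpu"
    printf 'sudo nvidia-smi -i %s -mig 0\n' "$gpu"   # optional: disable MIG
}

# Usage: review the output, then run the commands by hand.
#   mig_teardown_cmds 0
```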

For finer control, refer to the NVIDIA Multi-Instance GPU User Guide (NVIDIA Data Center GPU Driver Documentation).

References

All information is taken from the NVIDIA MIG Documentation

 Related articles

MIG-Kubernetes - CloudKB - Confluence (atlassian.net)
Ansible for MIG - CloudKB - Confluence (atlassian.net)
