👁️ Overview

This article shows how to manually enable MIG (Multi-Instance GPU) on an NVIDIA GPU and how to create, use, and delete MIG instances.

📘 Instructions

  1. Update the package index, then upgrade the installed packages

    apt update
    apt upgrade
  2. Install an NVIDIA driver version that supports MIG
    A100 and A30 GPUs are supported starting with CUDA 11/R450 drivers.

    apt install nvidia-driver-525-server

Version 525 is the version that has been tested and is known to work.
Use apt search nvidia-driver to find later versions.
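
The R450 minimum mentioned above can be checked against the installed driver before going further. This is a minimal sketch that only compares the major version number; the function name driver_supports_mig is illustrative and not part of any NVIDIA tool.

```shell
# Hypothetical helper: succeed if the driver's major version is at least
# the minimum (450 by default, matching the R450 requirement).
driver_supports_mig() {
    version="$1"              # e.g. "525.105.17"
    minimum="${2:-450}"
    major="${version%%.*}"    # keep everything before the first dot
    case "$major" in
        ''|*[!0-9]*) return 2 ;;   # not a number: cannot decide
    esac
    [ "$major" -ge "$minimum" ]
}

# Usage on a machine with the driver loaded (first GPU only):
# installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
# driver_supports_mig "$installed" && echo "driver $installed is new enough for MIG"
```

A return code of 2 means the version string could not be parsed, which keeps "too old" and "unknown" apart.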

  3. Enable MIG on the GPU (-i 0 selects GPU 0)

    sudo nvidia-smi -i 0 -mig 1
  4. Reset the GPU to complete enabling MIG

    sudo nvidia-smi --gpu-reset
  5. Decide what size MIG instances you want to use.
    nvidia-smi mig -lgip lists the available MIG profiles and their IDs, which you need for the create step.
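
Before creating instances, it can help to confirm that MIG mode is actually active on the GPU. The sketch below interprets the output of the mig.mode.current and mig.mode.pending query fields of nvidia-smi. The function name mig_mode_status and the three status words it prints are illustrative assumptions, not part of nvidia-smi.

```shell
# Hypothetical helper: interpret the "current, pending" MIG mode pair printed by
#   nvidia-smi -i 0 --query-gpu=mig.mode.current,mig.mode.pending --format=csv,noheader
# which looks like "Enabled, Enabled" or "Disabled, Enabled".
mig_mode_status() {
    line="$1"
    current="${line%%,*}"      # text before the first comma
    pending="${line#*,}"       # text after the first comma
    pending="${pending# }"     # drop the space that follows the comma
    if [ "$current" = "Enabled" ]; then
        echo "active"
    elif [ "$pending" = "Enabled" ]; then
        echo "pending-reset"   # enabled, but the GPU still needs a reset
    else
        echo "disabled"
    fi
}
```

For example: mig_mode_status "$(nvidia-smi -i 0 --query-gpu=mig.mode.current,mig.mode.pending --format=csv,noheader)"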

Warning: The example below is for illustration only. Decide for yourself how you want to deploy MIG.



Example:

I want two GPU instances with 10GB memory each.

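sudo nvidia-smi mig -cgi 15,15 -C

This creates two GPU instances from profile ID 15. The -C flag also creates a default compute instance in each of them. (Running with -C is recommended unless you know you need a specific compute instance layout.) Profile IDs differ between GPU models, so confirm the ID against the output of nvidia-smi mig -lgip first.

nvidia-smi mig -lgi shows the GPU instances that you have created.

nvidia-smi -L lists the MIG device UUIDs, which you need for binding applications to an instance.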
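
When the UUIDs are needed in a script, they can be pulled out of the nvidia-smi -L listing. This is a minimal sketch that assumes each UUID appears as MIG-... inside parentheses, as in the UUIDs used in this article; the function name list_mig_uuids is illustrative.

```shell
# Hypothetical helper: print one MIG device UUID per line, reading the
# output of `nvidia-smi -L` from stdin. Lines for the parent GPU carry a
# "GPU-..." UUID and are skipped because they do not contain "MIG-".
list_mig_uuids() {
    grep -o 'MIG-[^)]*'
}

# Usage:
# nvidia-smi -L | list_mig_uuids
```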

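You now have two GPU instances on which you can run CUDA applications.

To run an application on a specific instance, set CUDA_VISIBLE_DEVICES to that instance's MIG UUID:

CUDA_VISIBLE_DEVICES=MIG-d0c17ef4-4e20-5117-a3b8-5ff7e8424826 ./Application && CUDA_VISIBLE_DEVICES=MIG-221491dd-4f47-55cc-b5d6-c771c5015bd2 ./Application

Because the two commands are joined with &&, the second application starts only after the first one has exited successfully.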

It is still possible to run applications on the GPU UUID as well as the MIG UUIDs.
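
To keep several MIG instances busy at the same time, each process can be started in the background with its own CUDA_VISIBLE_DEVICES value. The following is a sketch under that assumption; run_on_each_mig is an illustrative name, and ./Application stands for whatever program you want to run.

```shell
# Hypothetical helper: launch a command once per MIG UUID, each pinned to
# its own device via CUDA_VISIBLE_DEVICES, then wait for all of them.
# Usage: run_on_each_mig "<uuid> <uuid> ..." command [args...]
run_on_each_mig() {
    uuids="$1"
    shift
    for uuid in $uuids; do
        CUDA_VISIBLE_DEVICES="$uuid" "$@" &   # background, so the instances run in parallel
    done
    wait                                       # block until every started process has exited
}

# Usage:
# run_on_each_mig "MIG-d0c17ef4-4e20-5117-a3b8-5ff7e8424826 MIG-221491dd-4f47-55cc-b5d6-c771c5015bd2" ./Application
```

The UUID list is passed as one space-separated string so that the function works in plain POSIX sh without arrays.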


Deleting MIG Instances:

The following commands reset or remove a MIG configuration:

You can reset your MIG configuration by resetting the GPU:

sudo nvidia-smi --gpu-reset

You will have to enable MIG again if you reset the GPU (see "Enable MIG on the GPU" above).

Alternatively, you can delete your compute instances:

sudo nvidia-smi mig -dci

Then delete your GPU instances:

sudo nvidia-smi mig -dgi

Run without further arguments, these two commands delete all compute instances and all GPU instances.
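
The order matters: compute instances are removed first, then the GPU instances that contained them. A minimal sketch that wraps both steps is shown below. The function name teardown_mig and the MIG_RUN variable are illustrative assumptions; setting MIG_RUN=echo prints the commands instead of running them.

```shell
# Hypothetical helper: remove all MIG instances in order.
# MIG_RUN is an assumed hook: leave it unset to run through sudo, or set it
# to "echo" for a dry run that only prints the commands.
teardown_mig() {
    runner="${MIG_RUN:-sudo}"
    # Design choice of this sketch: if deleting compute instances reports an
    # error, stop and leave the GPU instances untouched.
    $runner nvidia-smi mig -dci || return 1
    $runner nvidia-smi mig -dgi
}

# Dry run:
# MIG_RUN=echo teardown_mig
```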

For more specific control, such as deleting individual instances, refer to the NVIDIA Multi-Instance GPU User Guide in the NVIDIA Data Center GPU Driver Documentation.

🔖 References

All information is taken from the NVIDIA MIG Documentation

📚  Related articles

MIG-Kubernetes
Ansible for MIG
