Manual MIG
Overview
This article shows how to manually enable and manage Multi-Instance GPU (MIG) on an NVIDIA GPU.
Instructions
Update and Upgrade
sudo apt update
sudo apt upgrade
Get an NVIDIA driver version that supports MIG
A100 and A30 GPUs are supported starting with CUDA 11/R450 drivers.
sudo apt install nvidia-driver-525-server
525 is the version that has been tested and is known to work.
apt search nvidia-driver
can be used to search for later versions.
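The driver requirement above can be checked with a small script. This is a minimal sketch, not NVIDIA tooling: the function name driver_supports_mig and the variable MIN_BRANCH are made up for illustration, and the check compares only the major driver branch against R450.

```shell
#!/bin/sh
# Sketch: check that a driver version string is on branch R450 or later.
# MIN_BRANCH and driver_supports_mig are illustrative names (assumptions).
MIN_BRANCH=450

driver_supports_mig() {
    # $1 is a driver version string such as "525.105.17".
    dsm_major="${1%%.*}"            # keep the part before the first dot
    case "$dsm_major" in
        ''|*[!0-9]*) return 2 ;;    # empty or not a number
    esac
    [ "$dsm_major" -ge "$MIN_BRANCH" ]
}

# Only query the driver when nvidia-smi is present on this host.
if command -v nvidia-smi >/dev/null 2>&1; then
    dsm_version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
    if driver_supports_mig "$dsm_version"; then
        echo "driver $dsm_version is new enough for MIG"
    else
        echo "driver '$dsm_version' is missing or older than R$MIN_BRANCH" >&2
    fi
fi
```

The GPU model must still support MIG. The script checks only the driver branch.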
Enable MIG on the GPU
sudo nvidia-smi -i 0 -mig 1
Reset the GPU to finish enabling MIG
sudo nvidia-smi --gpu-reset
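To confirm that MIG mode took effect after the reset, you can query the current and pending MIG mode. The sketch below assumes GPU index 0, as in the command above. The helper name mig_mode_state and the labels it prints are illustrative, not part of nvidia-smi.

```shell
#!/bin/sh
# Sketch: interpret the "current, pending" MIG mode pair reported by nvidia-smi.
# mig_mode_state and its output labels are illustrative names (assumptions).

mig_mode_state() {
    # $1 is one CSV line, e.g. "Enabled, Enabled" (current, pending).
    mms_current="$(printf '%s' "$1" | cut -d, -f1 | tr -d ' ')"
    mms_pending="$(printf '%s' "$1" | cut -d, -f2 | tr -d ' ')"
    if [ "$mms_current" = "Enabled" ]; then
        echo "active"
    elif [ "$mms_pending" = "Enabled" ]; then
        echo "pending-reset"      # enabled, but a GPU reset is still needed
    else
        echo "not-enabled"        # disabled, or the GPU reports no MIG support
    fi
}

# Only query the GPU when nvidia-smi is present on this host.
if command -v nvidia-smi >/dev/null 2>&1; then
    mms_line="$(nvidia-smi -i 0 --query-gpu=mig.mode.current,mig.mode.pending --format=csv,noheader 2>/dev/null || true)"
    echo "GPU 0 MIG mode: $(mig_mode_state "$mms_line")"
fi
```

A result of pending-reset suggests the reset has not completed, for example because another process still holds the GPU.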
Next, decide which MIG instance sizes you want to use.
nvidia-smi mig -lgip
This shows the available MIG profiles and their corresponding IDs for the create step.
Warning: The example below is for illustration only. Decide for yourself how you want to deploy MIG; the available profiles and their IDs differ between GPU models.
Example:
To create two GPU instances with 10 GB of memory each:
nvidia-smi mig -cgi 15,15 -C
This creates two GPU instances with profile ID 15. The -C flag also creates a default compute instance in each of them. (Always using -C is recommended unless you know specifically what you need.)
The ID used at creation is a Profile ID. To perform actions on the instances you created, you need to reference their Instance IDs.
nvidia-smi mig -lgi
This shows the GPU instances that you’ve created.
nvidia-smi -L
This gives you the MIG instance UUIDs for binding.
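If you need the UUIDs in a script, they can be filtered out of the listing. The sketch below assumes the MIG-<uuid> format shown in this article. The function name list_mig_uuids is made up for illustration.

```shell
#!/bin/sh
# Sketch: print only the MIG device UUIDs found in `nvidia-smi -L` output.
# list_mig_uuids is an illustrative name (assumption), not an nvidia-smi feature.

list_mig_uuids() {
    # Reads the listing on stdin and prints one MIG UUID per line.
    # GPU UUIDs start with "GPU-" and are therefore not matched.
    grep -o 'MIG-[0-9a-f]\{8\}-[0-9a-f]\{4\}-[0-9a-f]\{4\}-[0-9a-f]\{4\}-[0-9a-f]\{12\}'
}

# Only query the GPU when nvidia-smi is present on this host.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L 2>/dev/null | list_mig_uuids || echo "no MIG devices found" >&2
fi
```

Each printed line can be used directly as a CUDA_VISIBLE_DEVICES value.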
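You now have two GPU instances on which you can run CUDA applications.
To do this, run the program with CUDA_VISIBLE_DEVICES set to a MIG UUID. The command below runs the application on each instance in turn, because && starts the second run only after the first has exited successfully: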
CUDA_VISIBLE_DEVICES=MIG-d0c17ef4-4e20-5117-a3b8-5ff7e8424826 ./Application && CUDA_VISIBLE_DEVICES=MIG-221491dd-4f47-55cc-b5d6-c771c5015bd2 ./Application
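To use both instances at the same time, each run can be started in the background instead. This is a minimal sketch: run_on_each_mig is a made-up helper name, and ./Application stands for whatever CUDA program you want to run.

```shell
#!/bin/sh
# Sketch: start one copy of a program per MIG UUID, all at the same time.
# run_on_each_mig is an illustrative helper name (assumption).

run_on_each_mig() {
    # $1 is the program to start; the remaining arguments are MIG UUIDs.
    roem_app="$1"
    shift
    for roem_uuid in "$@"; do
        # Each copy sees only the MIG device named in CUDA_VISIBLE_DEVICES.
        CUDA_VISIBLE_DEVICES="$roem_uuid" "$roem_app" &
    done
    wait    # block until every background run has exited
}

# Example call, using the UUIDs from this article:
# run_on_each_mig ./Application \
#     MIG-d0c17ef4-4e20-5117-a3b8-5ff7e8424826 \
#     MIG-221491dd-4f47-55cc-b5d6-c771c5015bd2
```

The helper does not collect the exit status of each run. Add per-process handling if you need to detect failures.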
It is still possible to run applications on the GPU UUID as well as the MIG UUIDs.
Deleting MIG Instances:
Some other helpful commands to do various operations:
You can reset your MIG configuration by resetting the GPU
sudo nvidia-smi --gpu-reset
You will have to enable MIG again if you reset the GPU (see "Enable MIG on the GPU" above).
Alternatively you can delete your compute instances
nvidia-smi mig -dci
Then your GPU instances
nvidia-smi mig -dgi
Run without further options, these two commands delete all compute instances and all GPU instances.
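To remove a single GPU instance rather than everything, the same commands can be limited with the -gi option, which takes an Instance ID as listed by nvidia-smi mig -lgi. The sketch below is hedged: the helper names run_cmd and delete_gpu_instance and the DRY_RUN variable are made up for illustration, and Instance ID 1 is only a placeholder. By default it prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: delete one GPU instance and the compute instances inside it.
# run_cmd, delete_gpu_instance and DRY_RUN are illustrative names (assumptions).
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = run them

run_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

delete_gpu_instance() {
    # $1 is a GPU Instance ID (not a Profile ID).
    # Compute instances must be removed before their GPU instance.
    run_cmd sudo nvidia-smi mig -dci -gi "$1"
    run_cmd sudo nvidia-smi mig -dgi -gi "$1"
}

# Placeholder Instance ID; check `nvidia-smi mig -lgi` for the real one.
delete_gpu_instance 1
```

Set DRY_RUN=0 once the printed commands look right for your system.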
For more specific control, refer to these docs: NVIDIA Multi-Instance GPU User Guide :: NVIDIA Data Center GPU Driver Documentation
References
All information is taken from the NVIDIA MIG Documentation
Related articles
MIG-Kubernetes - CloudKB - Confluence (atlassian.net)
Ansible for MIG - CloudKB - Confluence (atlassian.net)