Ansible for MIG
Overview
This doc is a quick how-to for using Ansible to configure A100 GPUs for MIG (Multi-Instance GPU) usage.
Refer to MIG-Kubernetes - CloudKB - Confluence (atlassian.net) for more information.
Instructions
Spin up an instance with an A100
Note: this is only for A100 GPUs.
Run the following steps from an existing machine that has Ansible and SSH public keys set up:
    git clone https://github.com/NVIDIA/deepops.git
    cd deepops
    # Configure env, this will prompt for sudo - feel free to inspect script
    ./scripts/setup.sh
Update the config/inventory file to add a single GPU host:

    [all]
    gpu01 ansible_host=172.16.x.y

    # Bottom of the file
    [all:vars]
    # SSH user: ubuntu or rocky
    ansible_user=ubuntu
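Before going further, it can be worth confirming that Ansible can actually reach the new host. The sketch below wraps Ansible's built-in ping module. `check_inventory` is a hypothetical helper name, and it assumes you are in the deepops directory with Ansible on the PATH (for example after activating the environment under /opt/deepops/env).

```shell
# Hypothetical helper, not part of DeepOps.
# Assumes: run from the deepops directory, Ansible is on PATH.
check_inventory() {
    # $1: inventory file (defaults to config/inventory)
    inventory="${1:-config/inventory}"
    if ! command -v ansible >/dev/null 2>&1; then
        echo "ansible not found on PATH; try: source /opt/deepops/env/bin/activate" >&2
        return 1
    fi
    # The ping module checks SSH connectivity and a usable Python on each host.
    ansible all -i "$inventory" -m ping
}
```

Calling `check_inventory` with no arguments uses config/inventory. A reachable host reports `"ping": "pong"`.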
Select a profile in config/group_vars/all.yml, based on the available profiles in config/nvidia-mig-config.yml:

    mig_manager_profile: all-1g.10gb # To deploy 7x10gb slots
(If required, per-host profiles can be set in config/inventory instead:)

    [all]
    gpu01 ansible_host=172.16.x.y mig_manager_profile=all-1g.10gb
    gpu02 ansible_host=172.16.x.y mig_manager_profile=all-1g.20gb
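To see which profile names are available, and which one a given host will actually receive, something like the following may help. It is a sketch under assumptions: `list_mig_profiles` and `show_host_profile` are hypothetical helper names, and the awk parsing assumes config/nvidia-mig-config.yml follows the mig-parted layout, with profile names as two-space-indented keys under a top-level `mig-configs:` key.

```shell
# Hypothetical helpers, not part of DeepOps.

# Print the profile names defined in a MIG config file.
# Assumes the mig-parted layout: profiles are keys indented by two
# spaces under a top-level "mig-configs:" key.
list_mig_profiles() {
    # $1: MIG config file (defaults to config/nvidia-mig-config.yml)
    awk '
        /^mig-configs:/        { in_section = 1; next }
        in_section && /^[^ #]/ { in_section = 0 }   # next top-level key ends the section
        in_section && /^  [^ #-][^:]*:[ ]*$/ {
            sub(/^  /, ""); sub(/:[ ]*$/, ""); print
        }
    ' "${1:-config/nvidia-mig-config.yml}"
}

# Show the mig_manager_profile value Ansible resolves for one host.
show_host_profile() {
    # $1: host name from the inventory, e.g. gpu01
    if ! command -v ansible-inventory >/dev/null 2>&1; then
        echo "ansible-inventory not found on PATH" >&2
        return 1
    fi
    ansible-inventory -i config/inventory --host "$1" | grep mig_manager_profile
}
```

Run both from the deepops directory, for example `list_mig_profiles` followed by `show_host_profile gpu01`. The second should show the resolved value whether it was set per host or in config/group_vars/all.yml.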
Deploy the GPU Drivers and MIG with the selected profile:
    source /opt/deepops/env/bin/activate
    ansible-playbook playbooks/nvidia-software/nvidia-cuda.yml
    ansible-playbook playbooks/nvidia-software/nvidia-mig.yml --become
The second command in the deploy step (step 5) will likely take a while.
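Once both playbooks have finished, the result can be checked on the GPU host itself. This is a minimal sketch, assuming `nvidia-smi` was installed by the driver playbook. `verify_mig` is a hypothetical helper name.

```shell
# Hypothetical helper, not part of DeepOps. Run on the GPU host.
verify_mig() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; the driver install may not have completed" >&2
        return 1
    fi
    # Report whether MIG mode is currently enabled on each GPU.
    nvidia-smi --query-gpu=index,name,mig.mode.current --format=csv,noheader
    # List GPUs together with any MIG devices created on them.
    nvidia-smi -L
}
```

With the all-1g.10gb profile, `nvidia-smi -L` would be expected to list seven MIG devices under each A100.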
Related articles
MIG-Kubernetes - CloudKB - Confluence (atlassian.net)
Reviewer | Review period |
---|---|
Reviewed by @Ramzi Jalili May 16, 2024 | 6 Months |