Ansible for MIG

Overview

This document is a quick how-to for using Ansible to configure A100 GPUs for MIG (Multi-Instance GPU) usage.

Refer to MIG-Kubernetes - CloudKB - Confluence (atlassian.net) for more information.

Instructions

  1. Spin up an instance with an A100

Note: this procedure applies only to A100 GPUs.

  2. Run the following from an existing machine that has Ansible installed and SSH public-key access to the instance:

    git clone https://github.com/NVIDIA/deepops.git
    cd deepops
    # Configure the environment; this prompts for sudo, so feel free to inspect the script first
    ./scripts/setup.sh

  3. Update the config/inventory file to add a single GPU host:

    [all]
    gpu01 ansible_host=172.16.x.y

    # Bottom of the file
    [all:vars]
    # ubuntu, or rocky
    ansible_user=ubuntu

  4. Select a profile in config/group_vars/all.yml from the profiles available in config/nvidia-mig-config.yml:

    mig_manager_profile: all-1g.10gb # To deploy 7x10gb slots

    If required, per-host profiles can be set in config/inventory instead:

    [all]
    gpu01 ansible_host=172.16.x.y mig_manager_profile=all-1g.10gb
    gpu02 ansible_host=172.16.x.y mig_manager_profile=all-1g.20gb

  5. Deploy the GPU drivers and MIG with the selected profile:

    source /opt/deepops/env/bin/activate
    ansible-playbook playbooks/nvidia-software/nvidia-cuda.yml
    ansible-playbook playbooks/nvidia-software/nvidia-mig.yml --become

The second command in step 5 will likely take a while.
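
Once the playbooks finish, it can be worth confirming that the MIG instances were actually created. The sketch below is not part of DeepOps: count_mig_devices is a hypothetical helper that counts MIG entries in the output of nvidia-smi -L, on the assumption that each MIG device is listed as an indented line beginning with "MIG".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DeepOps): count MIG devices in
# `nvidia-smi -L` output supplied on stdin.
# Assumption: each MIG device appears as an indented line starting with "MIG".
count_mig_devices() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are no matches, so "|| true" keeps the function safe under "set -e".
    grep -c '^[[:space:]]*MIG ' || true
}

# Example (run manually against the host from the inventory):
#   ssh ubuntu@172.16.x.y nvidia-smi -L | count_mig_devices
```

With the all-1g.10gb profile from the steps above, the count would be expected to be 7 per GPU; a count of 0 suggests MIG mode was not enabled or the profile was not applied.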

Related articles

MIG-Kubernetes - CloudKB - Confluence (atlassian.net)

Reviewer: Reviewed by @Ramzi Jalili, May 16, 2024

Review period: 6 months