- Either run sudo bash mig_easy_setup.sh to create the MIG partitions immediately, or use mig_flags.sh with the --create flag if the node has recently been reset and the old MIG partitions have already been deleted. Both paths are sketched below.
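Concretely (the script names and the --create flag come from the step above; sudo for mig_flags.sh is an assumption, since MIG changes require root):

```bash
# Full setup: create seven equal MIG partitions in one step
sudo bash mig_easy_setup.sh

# Node already reset and old partitions deleted: create the new ones only
sudo bash mig_flags.sh --create
```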
- Confirm that MIG partitioning was successful, e.g. by listing the MIG devices with nvidia-smi as shown below.
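One way to check, assuming standard NVIDIA tooling (these are stock nvidia-smi commands, not part of the scripts in this repo):

```bash
# List GPUs and their MIG devices; a partitioned GPU should show 7 MIG instances
nvidia-smi -L

# List the created GPU instances directly
sudo nvidia-smi mig -lgi
```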
- Run setup_mig_cpu_affinity.sh to create cgroups that split the CPUs in the same 1/7 resource ratio and bind each CPU set to one of the MIG partitions.
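To confirm the cgroups exist (the /sys/fs/cgroup/mig path matches the sanity checks later in this section):

```bash
# Expect one directory per partition: mig0 through mig6
ls /sys/fs/cgroup/mig/
```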
- Verify that the CPU/GPU partitioning is working with the following scripts and checks:
- Run the mig_easy_setup.sh script to create seven equal partitions of the GPU on a node. Requires sudo access; to check that sudo works, run sudo -v.
- Run the setup_mig_cpu_affinity.sh script with sudo to create 7 CPU partitions, each tied to one of the 7 MIG partitions created earlier. MIG must already be set up for this script to run.
- Basic usage of mig_launcher.sh:

```bash
./mig_launcher.sh <mig_instance_number> <your_command>
```

Examples:

```bash
./mig_launcher.sh 0 python train.py        # Run on MIG 0
./mig_launcher.sh 1 python inference.py    # Run on MIG 1
./mig_launcher.sh -d 2 python job.py       # Run on MIG 2 in the background
./mig_launcher.sh -v 3 python debug.py     # Run on MIG 3 with verbose output
```

- Check which cgroup version the node is using:

```bash
stat -fc %T /sys/fs/cgroup/
```

If the output is cgroup2fs, cgroup v2 is in use; if it is tmpfs, cgroup v1 is in use.
- After running setup_mig_cpu_affinity.sh, confirm the CPU assignment of each MIG cgroup:

```bash
for i in {0..6}; do
  echo "MIG $i CPUs: $(cat /sys/fs/cgroup/mig/mig$i/cpuset.cpus)"
done
```

- If a Python program is running in htop (press t inside htop to see the CPU tree) and needs to be killed, use kill $PID or sudo killall python3, e.g. as sketched below.
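A hypothetical cleanup sequence (the PID 12345 is a placeholder; read the real one from htop or pgrep):

```bash
# List running python3 processes with their PIDs and command lines
pgrep -af python3

# Kill one process by PID (replace 12345 with the actual PID)
kill 12345

# Or kill every python3 process on the node
sudo killall python3
```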
- You can also use this as a sanity check:

```bash
cd /sys/fs/cgroup/mig/mig1
echo -n "cpus: " && cat cpuset.cpus
echo -n "cpus.effective: " && cat cpuset.cpus.effective
echo -n "mems: " && cat cpuset.mems
echo -n "mems.effective: " && cat cpuset.mems.effective
```