NVIDIA Multi-Instance GPU (MIG) configuration on OpenShift

Red Hat Demo Platform (RHDP) catalog option:

MLOps Demo: Data Science & Edge Practice

The MIG demo is currently a work in progress on RHDP and will likely NOT work.

Error message

error launching instance: You have requested more vCPU capacity than your
current vCPU limit of 64 allows for the instance bucket that the specified
instance type belongs to.

AWS Notes

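The AWS instance type p4d.24xlarge is currently available only in availability zone us-east-2b. It has 96 vCPUs, which is more than the vCPU limit of 64 reported in the error message above.

If your cluster does not have a machine set in us-east-2b, you will probably not be able to request this GPU type.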
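
The comparison behind that error can be sketched in a few lines of shell. This is only an illustration of the arithmetic: `vcpus_fit_limit` is a hypothetical helper (it is not part of `scripts/bootstrap.sh`), and the two numbers are taken from the notes above.

```shell
# Sketch: compare an instance type's vCPU count with an account vCPU limit.
# vcpus_fit_limit is a hypothetical helper, not part of scripts/bootstrap.sh.
vcpus_fit_limit() {
  needed="$1"   # vCPUs required by the instance type
  limit="$2"    # current vCPU limit for the instance bucket
  if [ "$needed" -le "$limit" ]; then
    echo "ok: ${needed} vCPU fits within the limit of ${limit}"
  else
    echo "blocked: ${needed} vCPU exceeds the limit of ${limit}"
    return 1
  fi
}

# p4d.24xlarge (96 vCPU) against the limit of 64 from the error message
vcpus_fit_limit 96 64 || true
```

With these values the helper reports the request as blocked, which matches the launch error; a higher limit on the AWS account would be needed before the instance can start.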

Prerequisites

NVIDIA GPU hardware that supports MIG:
- A100
- H100
- A30

Quickstart

Set up MIG in single mode.

Type: p4d.24xlarge = 8 x GPUs
Profile: 1g.5gb = 1 GPU compute slice and 5GB of memory per MIG instance
Resource: nvidia.com/gpu: 1

. scripts/bootstrap.sh

nvidia_setup_mig_config single all-1g.5gb
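
As a rough sanity check for the quickstart, the number of `nvidia.com/gpu` resources a node should advertise in single mode is the GPU count times the MIG instances per GPU. The sketch below assumes each GPU in a p4d.24xlarge is an A100 40GB that holds up to 7 instances of the 1g.5gb profile; `mig_single_total` is a hypothetical helper, not part of `scripts/bootstrap.sh`.

```shell
# Sketch: expected number of nvidia.com/gpu resources in MIG single mode.
# mig_single_total is a hypothetical helper, not part of scripts/bootstrap.sh.
mig_single_total() {
  gpus="$1"                # physical GPUs on the node
  instances_per_gpu="$2"   # MIG instances each GPU is split into
  echo $(( gpus * instances_per_gpu ))
}

# Assumption: each A100 40GB GPU holds up to 7 instances of profile 1g.5gb
mig_single_total 8 7
```

The result can be compared with the Allocatable section of `oc describe node <node>` once the MIG configuration has been applied.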

Nvidia MIG profiles

Set up a MIG profile

. scripts/bootstrap.sh

# set up MIG single mode (use one of the profiles)
# ex: nvidia.com/gpu: 1
nvidia_setup_mig_config single all-1g.5gb
nvidia_setup_mig_config single all-2g.10gb

# set up MIG mixed mode
# ex: nvidia.com/mig-2g.10gb: 1
nvidia_setup_mig_config mixed all-balanced
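
The resource names in the comments above are what a workload requests. A minimal sketch of a test Pod follows, written as a shell function that prints the manifest. `gpu_pod_manifest`, the pod name `mig-test`, and the default image are all hypothetical placeholders, and the sketch assumes `nvidia-smi` is available inside the container.

```shell
# Sketch: print a Pod manifest that requests one GPU resource.
# gpu_pod_manifest is a hypothetical helper; pod name and default image are placeholders.
gpu_pod_manifest() {
  resource="${1:-nvidia.com/gpu}"   # e.g. nvidia.com/mig-2g.10gb in mixed mode
  image="${2:-registry.example.com/cuda-sample:latest}"   # placeholder, replace with a real image
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: mig-test
spec:
  restartPolicy: Never
  containers:
    - name: mig-test
      image: ${image}
      command: ["nvidia-smi", "-L"]
      resources:
        limits:
          ${resource}: 1
EOF
}

# single mode:  gpu_pod_manifest nvidia.com/gpu | oc apply -f -
# mixed mode:   gpu_pod_manifest nvidia.com/mig-2g.10gb | oc apply -f -
gpu_pod_manifest nvidia.com/mig-2g.10gb
```

In single mode every MIG instance is exposed as `nvidia.com/gpu`, so the default argument is enough; in mixed mode the profile name becomes part of the resource name.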

Manually pick a MIG profile

# mode = single or mixed (set only one value)
MIG_CONFIG=all-1g.5gb
MIG_CONFIG=all-2g.10gb

# mode = mixed
MIG_CONFIG=all-balanced

Manually apply MIG partitioning profile(s) - Mixed

# add profile label
oc label node --overwrite \
  -l "node-role.kubernetes.io/gpu" \
  "nvidia.com/mig.config=${MIG_CONFIG}"

# remove profile label
oc label node --overwrite \
  -l "node-role.kubernetes.io/gpu" \
  "nvidia.com/mig.config-"
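
After the label is applied, the GPU operator's MIG manager repartitions the GPUs, which can take a while. One way to check progress is sketched below, assuming the MIG manager reports its status in the node label `nvidia.com/mig.config.state` (with `success` meaning the profile is active). `mig_state_summary` is a hypothetical helper that reads "node state" pairs from stdin.

```shell
# Sketch: summarize MIG manager state from "node state" pairs on stdin.
# mig_state_summary is a hypothetical helper, not part of scripts/bootstrap.sh.
mig_state_summary() {
  rc=0
  while read -r node state; do
    [ -n "$node" ] || continue          # skip blank lines
    if [ "$state" = "success" ]; then
      echo "${node}: ready"
    else
      echo "${node}: ${state:-unknown}"
      rc=1                              # at least one node is not ready
    fi
  done
  return "$rc"
}

# Assumed usage against a cluster:
#   oc get nodes -l node-role.kubernetes.io/gpu --no-headers \
#     -o custom-columns='NAME:.metadata.name,STATE:.metadata.labels.nvidia\.com/mig\.config\.state' \
#     | mig_state_summary

# Example with sample input (node names are made up)
printf '%s\n' 'gpu-node-1 success' 'gpu-node-2 pending' | mig_state_summary || true
```

The function returns a non-zero status while any node is still pending or has failed, so it can be used in a simple retry loop.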
