Description
Use your AWS Credits from the ARRC program
GPU provisioning on AWS - specifically the 48 GB NVIDIA L40S, 80 GB H100, and 141 GB H200 - compared to a 48 GB RTX A6000
https://docs.aws.amazon.com/dlami/latest/devguide/gpu.html
P5 - 8 x H100 (80 GB each)
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/p5-instances-started.html
Using the AWS Deep Learning Base GPU AMI (Ubuntu 20.04):
https://aws.amazon.com/releasenotes/aws-deep-learning-base-gpu-ami-ubuntu-20-04/
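A minimal boto3 launch sketch for a p5.48xlarge using the DL Base GPU AMI; the AMI ID, key pair, and subnet below are placeholders for illustration, not values from this account.

```python
# Minimal sketch: launch a p5.48xlarge with the Deep Learning Base GPU AMI.
# The AMI ID, key pair, and subnet ID are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # placeholder: DL Base GPU AMI ID for the region
    InstanceType="p5.48xlarge",           # 8 x H100
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",                # placeholder key pair
    SubnetId="subnet-0123456789abcdef0",  # placeholder subnet
    BlockDeviceMappings=[{
        "DeviceName": "/dev/sda1",
        "Ebs": {"VolumeSize": 500, "VolumeType": "gp3"},  # gp3 root volume, as in the failed launches
    }],
)
print(response["Instances"][0]["InstanceId"])
```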
https://shop.lambdalabs.com/deep-learning/servers/blade/customize
Instance launch failed
We currently do not have sufficient p5.48xlarge capacity in zones with support for 'gp3' volumes. Our system will be working on provisioning additional capacity.
P5e - 8 x H200 (141 GB each)
G6 - NVIDIA L4; G6e - NVIDIA L40S
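To see which Availability Zones in a region even offer these instance types (offered is not the same as having free capacity), a quick check with describe_instance_type_offerings; the region is an assumption.

```python
# Sketch: list the AZs in a region that offer each candidate instance type.
# This only shows where the type exists, not whether capacity is free right now.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

for itype in ["p5.48xlarge", "p5e.48xlarge", "g6e.48xlarge", "p4d.24xlarge"]:
    offerings = ec2.describe_instance_type_offerings(
        LocationType="availability-zone",
        Filters=[{"Name": "instance-type", "Values": [itype]}],
    )
    zones = sorted(o["Location"] for o in offerings["InstanceTypeOfferings"])
    print(f"{itype}: {zones if zones else 'not offered in this region'}")
```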
Quota increases
Evaluating the L40S, H100, and H200 for model sizes from 40-80 GB.
I am migrating from an on-prem RTX A6000 with 48 GB VRAM and 2 x RTX A4500 with 40 GB VRAM combined; rough VRAM arithmetic is sketched below.
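The arithmetic behind the 40-80 GB range, as a sanity check of which single cards a model's weights fit on; the byte-per-parameter figures are the usual rule of thumb for inference (weights only, no KV cache or activation overhead), and the example model sizes are illustrative.

```python
# Rule-of-thumb weights footprint: params * bytes_per_param (KV cache / activations ignored).
GPU_VRAM_GB = {"RTX A6000": 48, "L40S": 48, "H100": 80, "H200": 141}

def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

for params_b in (20, 40, 70):                      # illustrative model sizes (billions of params)
    for dtype, bpp in (("fp16/bf16", 2), ("int8", 1)):
        need = weights_gb(params_b, bpp)
        fits = [g for g, vram in GPU_VRAM_GB.items() if vram >= need]
        print(f"{params_b}B @ {dtype}: ~{need:.0f} GB weights -> fits on {fits or 'no single card'}")
```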
https://us-east-1.console.aws.amazon.com/servicequotas/home/services/ec2/quotas/L-417A185B
Requested quota increase from 0 to 96 (vCPUs) for Running On-Demand P instances.
Requested quota increase from 0 to 96 (vCPUs) for Running On-Demand DL instances.
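The same quota can be checked and requested through the Service Quotas API; L-417A185B is the quota code from the console link above (Running On-Demand P instances), while the DL-instances quota code would have to be looked up separately.

```python
# Sketch: check and request the "Running On-Demand P instances" vCPU quota (code L-417A185B
# from the console link above). The DL-instances quota is not shown here.
import boto3

sq = boto3.client("service-quotas", region_name="us-east-1")  # assumed region

current = sq.get_service_quota(ServiceCode="ec2", QuotaCode="L-417A185B")
print("current value:", current["Quota"]["Value"])

req = sq.request_service_quota_increase(
    ServiceCode="ec2",
    QuotaCode="L-417A185B",
    DesiredValue=96.0,   # 96 vCPUs, as requested above
)
print("request status:", req["RequestedQuota"]["Status"])
```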
Tested on another account that already has a quota of 692 for DL instances and 96 for P instances - same result, no capacity:
We currently do not have sufficient p4d.24xlarge capacity in zones with support for 'gp3' volumes. Our system will be working on provisioning additional capacity.
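Since both p5.48xlarge and p4d.24xlarge come back with insufficient capacity, one pragmatic workaround is to catch the InsufficientInstanceCapacity error and fall back through a list of acceptable types; the fallback order, AMI ID, and subnet below are illustrative assumptions.

```python
# Sketch: fall back through acceptable instance types when EC2 reports
# InsufficientInstanceCapacity. AMI ID and subnet ID are placeholders.
import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

CANDIDATES = ["p5.48xlarge", "p4d.24xlarge", "g6e.48xlarge"]  # assumed fallback order

def launch_first_available(ami_id: str, subnet_id: str):
    for itype in CANDIDATES:
        try:
            resp = ec2.run_instances(
                ImageId=ami_id, InstanceType=itype,
                MinCount=1, MaxCount=1, SubnetId=subnet_id,
            )
            return itype, resp["Instances"][0]["InstanceId"]
        except ClientError as err:
            if err.response["Error"]["Code"] == "InsufficientInstanceCapacity":
                print(f"{itype}: no capacity, trying next type")
                continue
            raise
    return None, None

# launch_first_available("ami-0123456789abcdef0", "subnet-0123456789abcdef0")
```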