Unbalanced GPU memory usage #94

Michaelsqj · 2022-09-23T02:01:14Z

Hi! I found that GPU memory consumption is highly unbalanced between GPU0 and the rest of the GPUs. Here's the command I used to train on ImageNet at resolution 128.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python train.py \
--outdir=/storage/guangrun/qijia_3d_model/stylegan-xl/finetune128/ \
--cfg=stylegan3-t \
--data=/datasets/guangrun/qijia_3d_model/imagenet/stylegan_xl/imagenet_sub_seg128.zip \
--gpus=8 \
--batch=32 \
--mirror=1 \
--snap 10 \
--batch-gpu 4 \
--kimg 10000 \
--cond True \
--superres \
--up_factor 2 \
--head_layers 7 \
--path_stem /scratch/local/ssd/guangrun/qijia_3d_model/stylegan_xl/imagenet64.pkl \
--resume /scratch/local/ssd/guangrun/qijia_3d_model/stylegan_xl/imagenet128.pkl

As you can see, GPU0 consumes much less memory than the rest of the GPUs. May I ask what causes this imbalance, and what the normal memory consumption is when training at 128 resolution with the settings above?

Michaelsqj · 2022-09-23T02:24:30Z

However, when I set batch-gpu=8, gpus=8, and batch=64, GPU memory consumption went down. That seems strange to me. Does anyone have a clue why this happens?
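
To make the comparison concrete, here is a small sketch of how the two configurations would split the batch across GPUs. It assumes the usual StyleGAN3-style convention that each GPU handles batch / gpus samples per step, processed in chunks of batch-gpu (gradient accumulation). I have not verified that this repo follows it exactly. The helper `per_gpu_split` is hypothetical and not part of train.py.

```python
def per_gpu_split(batch: int, gpus: int, batch_gpu: int) -> dict:
    """Hypothetical helper (not part of train.py).

    Assumes each GPU processes batch // gpus samples per training step,
    in chunks of batch_gpu samples (gradient accumulation).
    """
    if batch % (gpus * batch_gpu) != 0:
        raise ValueError("batch must be a multiple of gpus * batch_gpu")
    samples_per_gpu = batch // gpus
    accumulation_rounds = samples_per_gpu // batch_gpu
    return {
        "samples_per_gpu": samples_per_gpu,
        "accumulation_rounds": accumulation_rounds,
    }


# First run: --batch=32 --gpus=8 --batch-gpu 4
first = per_gpu_split(batch=32, gpus=8, batch_gpu=4)
# Second run: batch=64, gpus=8, batch-gpu=8
second = per_gpu_split(batch=64, gpus=8, batch_gpu=8)

print("first run: ", first)
print("second run:", second)
```

Under that assumption, both runs need a single accumulation round, and the second run pushes twice as many samples through each GPU at once. So this arithmetic alone would predict higher memory use in the second run, not lower, which is why the result looks odd to me.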
