However, when I set batch-gpu=8, gpus=8, and batch=64, GPU memory consumption went down, which seems odd. Does anyone have a clue why this happens?
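To make the comparison between the two settings concrete, here is a minimal sketch of how the three flags relate, assuming train.py follows the StyleGAN3 convention that `--batch` must be a multiple of `--gpus` times `--batch-gpu`, with any remaining factor handled by gradient accumulation. The function name `accumulation_rounds` is hypothetical and is not part of the repository.

```python
def accumulation_rounds(batch: int, gpus: int, batch_gpu: int) -> int:
    """Return how many forward/backward passes each GPU runs per optimizer step.

    Assumption: the total batch is split evenly across GPUs, and each GPU
    processes `batch_gpu` samples per pass (StyleGAN3-style accumulation).
    """
    if batch % (gpus * batch_gpu) != 0:
        raise ValueError("--batch must be a multiple of --gpus times --batch-gpu")
    return batch // (gpus * batch_gpu)


# The two configurations mentioned in this thread.
for batch, gpus, batch_gpu in [(32, 8, 4), (64, 8, 8)]:
    rounds = accumulation_rounds(batch, gpus, batch_gpu)
    print(f"batch={batch} gpus={gpus} batch_gpu={batch_gpu}: "
          f"{rounds} round(s) of {batch_gpu} samples per GPU")
```

Under this assumption both settings run a single pass per step, and the second one holds twice as many samples per GPU per pass. I would therefore expect activation memory to go up rather than down, which is why the drop is surprising.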
Hi! I found that GPU memory consumption is highly unbalanced between GPU0 and the rest of the GPUs. Here's the command I used to train on ImageNet at resolution 128.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py \
  --outdir=/storage/guangrun/qijia_3d_model/stylegan-xl/finetune128/ \
  --cfg=stylegan3-t \
  --data=/datasets/guangrun/qijia_3d_model/imagenet/stylegan_xl/imagenet_sub_seg128.zip \
  --gpus=8 \
  --batch=32 \
  --mirror=1 \
  --snap 10 \
  --batch-gpu 4 \
  --kimg 10000 \
  --cond True \
  --superres \
  --up_factor 2 \
  --head_layers 7 \
  --path_stem /scratch/local/ssd/guangrun/qijia_3d_model/stylegan_xl/imagenet64.pkl \
  --resume /scratch/local/ssd/guangrun/qijia_3d_model/stylegan_xl/imagenet128.pkl
As you can see, GPU0 consumes much less memory than the rest of the GPUs. May I ask what causes this imbalance, and what the normal memory consumption is when training at 128 resolution with the settings above?
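In case it helps anyone reproduce the numbers, here is a minimal sketch for logging per-GPU memory as text. It relies on `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits`, which reports used memory in MiB. The helper names `parse_memory_csv` and `query_gpu_memory` are hypothetical and are not part of the repository.

```python
import subprocess
from typing import Dict

# Real nvidia-smi query: one "index, used_MiB" line per visible GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text: str) -> Dict[int, int]:
    """Parse 'index, used' CSV lines into {gpu_index: used_MiB}."""
    usage = {}
    for line in text.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        usage[int(index)] = int(used)
    return usage


def query_gpu_memory() -> Dict[int, int]:
    """Run nvidia-smi and return used memory per GPU in MiB."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)
```

Calling `query_gpu_memory()` a few times while training runs gives one reading per GPU, which makes it easy to compare GPU0 against the others across different `--batch` and `--batch-gpu` settings.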