Hi,
I am trying to use `torchrun --nproc_per_node=8` to train SimCLR on ImageNet with 8 GPUs in parallel. I wrap the model with the following call:
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu], bucket_cap_mb=200)
The problem is that data loading becomes very slow every couple of batches, not just for the first batch of an epoch (which I understand is normal).
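For context, the surrounding setup looks roughly like the sketch below (simplified, not my exact script; `train_dataset`, `per_gpu_batch_size`, and `model` are placeholders):

```python
import os
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# torchrun launches 8 processes and sets LOCAL_RANK for each one
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = model.cuda(local_rank)
model = torch.nn.parallel.DistributedDataParallel(
    model, device_ids=[local_rank], bucket_cap_mb=200
)

# Each rank reads a disjoint shard of ImageNet via DistributedSampler
sampler = DistributedSampler(train_dataset, shuffle=True)
train_loader = DataLoader(
    train_dataset,
    batch_size=per_gpu_batch_size,
    sampler=sampler,
    num_workers=8,            # the j=8 workers mentioned below
    pin_memory=True,
    drop_last=True,
    persistent_workers=True,  # keep workers alive across epochs
)
```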
Here are the (data_time, batch_time) measurements when j=8 (the number of data-loading workers):
I also noticed that although the GPUs are at 100% utilization, their power usage is only around 80 W out of 350 W.
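To clarify what the two meters above refer to: data_time is the time each iteration spends waiting on the loader, and batch_time is the full iteration time. They are measured per iteration roughly like this (a simplified sketch of the usual timing pattern; `model`, `optimizer`, and the loss computation are placeholders for the actual SimCLR step):

```python
import time

end = time.time()
for images, _ in train_loader:
    data_time = time.time() - end       # time spent waiting on the data loader
    images = images.cuda(non_blocking=True)

    loss = model(images)                 # placeholder for the SimCLR forward/loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    batch_time = time.time() - end       # full iteration time, including data loading
    end = time.time()
```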