I'm running into the following error. Could you take a look?
(/home/ec2-user/SageMaker/env/test) sh-4.2$ python synthetic_train.py --num_res_blocks 3 --diffusion_steps 4000 --noise_schedule linear --lr 1e-4 --batch_size 20000 --task 1
Logging to /tmp/openai-2022-11-08-05-25-53-353605
args: Namespace(task=1, schedule_sampler='uniform', lr=0.0001, weight_decay=0.0, lr_anneal_steps=1000, batch_size=20000, microbatch=-1, ema_rate='0.9999', log_interval=10, save_interval=10000, resume_checkpoint='', use_fp16=False, fp16_scale_growth=0.001, num_channels=256, num_res_blocks=3, dropout=0.2, use_checkpoint=False, in_channels=2, learn_sigma=False, diffusion_steps=4000, noise_schedule='linear', timestep_respacing='', use_kl=False, predict_xstart=False, rescale_timesteps=False, rescale_learned_sigmas=False)
[W ProcessGroupGloo.cpp:694] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
Logging to /tmp/openai-2022-11-08-05-25-53-357098
creating 2d model and diffusion...
creating 2d data loader...
training 2d model...
Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/DDIB/ddib/synthetic_train.py", line 82, in <module>
    main()
  File "/home/ec2-user/SageMaker/DDIB/ddib/synthetic_train.py", line 40, in main
    TrainLoop(
  File "/home/ec2-user/SageMaker/DDIB/ddib/guided_diffusion/train_util.py", line 67, in __init__
    self._load_and_sync_parameters()
  File "/home/ec2-user/SageMaker/DDIB/ddib/guided_diffusion/train_util.py", line 122, in _load_and_sync_parameters
    dist_util.sync_params(self.model.parameters())
  File "/home/ec2-user/SageMaker/DDIB/ddib/guided_diffusion/dist_util.py", line 83, in sync_params
    dist.broadcast(p, 0)
  File "/home/ec2-user/SageMaker/env/test/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1408, in broadcast
    work.wait()
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
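For context, the failure happens inside `sync_params`, where `dist.broadcast(p, 0)` writes in place into each model parameter. Since the parameters are leaf tensors with `requires_grad=True`, some PyTorch versions reject that in-place write. Below is a minimal sketch of one commonly reported workaround (assuming the `guided_diffusion/dist_util.py` layout in the traceback; this is not necessarily the repo's official fix): broadcast `p.data`, which shares storage with the parameter but sits outside the autograd graph, and guard the loop with `torch.no_grad()`.

```python
# Sketch of a possible patch to guided_diffusion/dist_util.py (workaround,
# not a confirmed upstream fix).
import torch
import torch.distributed as dist


def sync_params(params):
    """Synchronize a sequence of tensors across ranks from rank 0."""
    for p in params:
        with torch.no_grad():
            # p.data is detached from autograd but shares storage with p,
            # so the in-place write done by broadcast() updates the
            # parameter values without tripping the leaf-variable check.
            dist.broadcast(p.data, 0)
```

The `GLOO_SOCKET_IFNAME` warning earlier in the log is a separate issue: gloo just falls back to the loopback interface, which should be fine for a single-machine run.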