RuntimeError: a leaf Variable that requires grad is being used in an in-place operation. #3

Open
jxzhangjhu opened this issue Nov 8, 2022 · 1 comment

Comments

@jxzhangjhu

I get the following error when running synthetic_train.py. Could you take a look?

(/home/ec2-user/SageMaker/env/test) sh-4.2$ python synthetic_train.py --num_res_blocks 3 --diffusion_steps 4000 --noise_schedule linear --lr 1e-4 --batch_size 20000 --task 1

Logging to /tmp/openai-2022-11-08-05-25-53-353605
args: Namespace(task=1, schedule_sampler='uniform', lr=0.0001, weight_decay=0.0, lr_anneal_steps=1000, batch_size=20000, microbatch=-1, ema_rate='0.9999', log_interval=10, save_interval=10000, resume_checkpoint='', use_fp16=False, fp16_scale_growth=0.001, num_channels=256, num_res_blocks=3, dropout=0.2, use_checkpoint=False, in_channels=2, learn_sigma=False, diffusion_steps=4000, noise_schedule='linear', timestep_respacing='', use_kl=False, predict_xstart=False, rescale_timesteps=False, rescale_learned_sigmas=False)
[W ProcessGroupGloo.cpp:694] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
Logging to /tmp/openai-2022-11-08-05-25-53-357098
creating 2d model and diffusion...
creating 2d data loader...
training 2d model...
Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/DDIB/ddib/synthetic_train.py", line 82, in <module>
    main()
  File "/home/ec2-user/SageMaker/DDIB/ddib/synthetic_train.py", line 40, in main
    TrainLoop(
  File "/home/ec2-user/SageMaker/DDIB/ddib/guided_diffusion/train_util.py", line 67, in __init__
    self._load_and_sync_parameters()
  File "/home/ec2-user/SageMaker/DDIB/ddib/guided_diffusion/train_util.py", line 122, in _load_and_sync_parameters
    dist_util.sync_params(self.model.parameters())
  File "/home/ec2-user/SageMaker/DDIB/ddib/guided_diffusion/dist_util.py", line 83, in sync_params
    dist.broadcast(p, 0)
  File "/home/ec2-user/SageMaker/env/test/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1408, in broadcast
    work.wait()
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

@jxzhangjhu
Author

I found a fix for this: in sync_params in guided_diffusion/dist_util.py, pass p.detach() to dist.broadcast instead of p.

openai/improved-diffusion@ff2c2a1
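
For anyone hitting the same error, here is a minimal sketch of the p.detach() change, plus a local reproduction that needs no process group. It is an illustration under assumptions, not the exact code from the repository or the linked commit: the body of sync_params is inferred from the traceback, and demo_inplace_on_leaf is a hypothetical helper that uses copy_ as a stand-in for the in-place write that dist.broadcast performs.

```python
import torch
import torch.distributed as dist


def sync_params(params):
    """Broadcast each parameter from rank 0 (sketch; assumes an initialized process group)."""
    for p in params:
        # p.detach() shares storage with p but is not tracked by autograd,
        # so the in-place write done by broadcast is allowed and still updates p.
        dist.broadcast(p.detach(), 0)


def demo_inplace_on_leaf():
    """Hypothetical helper: reproduce the error locally, with copy_ standing in for broadcast."""
    p = torch.nn.Parameter(torch.zeros(3))
    src = torch.ones(3)
    try:
        p.copy_(src)  # in-place write on a leaf tensor that requires grad
        raised = False
    except RuntimeError:
        raised = True
    p.detach().copy_(src)  # the same write through a detached view succeeds
    return raised, p.data.clone()
```

Wrapping the original dist.broadcast(p, 0) call in a torch.no_grad() block should be an equivalent alternative, since in-place operations on leaf tensors are permitted when gradient tracking is disabled.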
