
Error when training on my own dataset, did anyone have this problem before? #22

Closed
twilight0718 opened this issue May 31, 2022 · 2 comments

@twilight0718 commented May 31, 2022

```
[W python_anomaly_mode.cpp:104] Warning: Error detected in CudnnBatchNormBackward. Traceback of forward call that caused the error:
  File "run.py", line 144, in <module>
    train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank, device, opt, writer)
  File "/mnt/users/CVPR2022-DaGAN-master/train.py", line 66, in train
    losses_generator, generated = generator_full(x)
```

Meanwhile there's another problem as well:
```
Traceback (most recent call last):
  File "run.py", line 144, in <module>
    train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank, device, opt, writer)
  File "/mnt/users/CVPR2022-DaGAN-master/train.py", line 74, in train
    loss.backward()
  File "/home/anaconda3/envs/DaGAN/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32]] is at version 5; expected version 4 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```

It seems to be an in-place operation problem, but I couldn't find any in-place code anywhere.
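For context, a minimal sketch (illustrative code, not from the DaGAN repo) that reproduces this class of error: an op whose output is saved for the backward pass is then modified in place, which bumps the tensor's version counter and breaks `backward()`.

```python
import torch

# Illustrative only: reproduce the autograd version-counter error.
a = torch.randn(4, requires_grad=True)
b = torch.sigmoid(a)  # sigmoid saves its output for the backward pass
b.add_(1)             # in-place edit bumps b's version counter
b.sum().backward()    # RuntimeError: ... modified by an inplace operation
```

The `[W python_anomaly_mode.cpp:104]` warning above means anomaly detection (`torch.autograd.set_detect_anomaly(True)`) is already enabled, which is why the forward-call traceback points at `CudnnBatchNormBackward`; the 32-element tensor is consistent with a 32-channel BatchNorm parameter or running statistic.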

@harlanhong (Owner)

Hi, please use multiple GPUs to train the network. This problem occurs for some unknown reason when you train with only one GPU. I haven't been able to solve it; it may be caused by the PyTorch version.
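For reference, a sketch of what a multi-GPU launch might look like. The traceback shows `run.py` receiving `opt.local_rank`, which suggests a `torch.distributed.launch`-style entry point; the GPU indices, process count, and config path below are illustrative, so check the repo's README for the exact documented command.

```bash
# Illustrative launch only; flags other than --nproc_per_node are assumptions.
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 \
    run.py --config config/vox-adv-256.yaml
```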

@twilight0718 (Author)

Thanks a lot! The problem has been solved!
I'd really recommend noting this problem in the README file.
