
Error when training on my own dataset, did anyone have this problem before? #22

Closed
twilight0718 opened this issue May 31, 2022 · 2 comments

@twilight0718 commented May 31, 2022

```
[W python_anomaly_mode.cpp:104] Warning: Error detected in CudnnBatchNormBackward. Traceback of forward call that caused the error:
  File "run.py", line 144, in <module>
    train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank, device, opt, writer)
  File "/mnt/users/CVPR2022-DaGAN-master/train.py", line 66, in train
    losses_generator, generated = generator_full(x)
```

Meanwhile there's another problem as well:
```
Traceback (most recent call last):
  File "run.py", line 144, in <module>
    train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.local_rank, device, opt, writer)
  File "/mnt/users/CVPR2022-DaGAN-master/train.py", line 74, in train
    loss.backward()
  File "/home/anaconda3/envs/DaGAN/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32]] is at version 5; expected version 4 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```

It seems to be an in-place operation problem, but I couldn't find any in-place code anywhere.
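For context, a minimal sketch (illustrative code, not from the DaGAN repo) that reproduces this class of error: an op whose output is saved for the backward pass is then modified in place, which bumps the tensor's version counter and breaks `backward()`.

```python
import torch

# Illustrative only: reproduce the autograd version-counter error.
a = torch.randn(4, requires_grad=True)
b = torch.sigmoid(a)  # sigmoid saves its output for the backward pass
b.add_(1)             # in-place edit bumps b's version counter
b.sum().backward()    # RuntimeError: ... modified by an inplace operation
```

The `[W python_anomaly_mode.cpp:104]` warning above means anomaly detection (`torch.autograd.set_detect_anomaly(True)`) is already enabled, which is why the forward-call traceback points at `CudnnBatchNormBackward`; the 32-element tensor is consistent with a 32-channel BatchNorm parameter or running statistic.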

@harlanhong (Owner)

Hi, please use multiple GPUs to train the network. This problem occurs for some unknown reason when you train with only one GPU. I haven't been able to solve it; it may be caused by the PyTorch version.
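For reference, a sketch of what a multi-GPU launch might look like. The traceback shows `run.py` receiving `opt.local_rank`, which suggests a `torch.distributed.launch`-style entry point; the GPU indices, process count, and config path below are illustrative, so check the repo's README for the exact documented command.

```bash
# Illustrative launch only; flags other than --nproc_per_node are assumptions.
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 \
    run.py --config config/vox-adv-256.yaml
```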

@twilight0718 (Author)

Thanks a lot! The problem has been solved!
I'd really recommend noting this problem in the README file.
