Description
Hi CycleGAN developers,
I'm encountering the error below while training the maps example.
Training runs for a few hundred iterations and then crashes.
Are there any suggestions I could try? I appreciate your help in advance!
python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan
(epoch: 1, iters: 400, time: 0.772, data: 0.003) D_A: 0.268 G_A: 0.446 cycle_A: 1.283 idt_A: 0.335 D_B: 0.224 G_B: 0.270 cycle_B: 0.656 idt_B: 0.577
(epoch: 1, iters: 500, time: 0.303, data: 0.002) D_A: 0.233 G_A: 0.391 cycle_A: 2.048 idt_A: 0.318 D_B: 0.238 G_B: 0.284 cycle_B: 0.659 idt_B: 1.037
Traceback (most recent call last):
File "train.py", line 52, in <module>
model.optimize_parameters() # calculate loss functions, get gradients, update network weights
File "/data/yjwa/torchwork/pytorch-CycleGAN-and-pix2pix/models/cycle_gan_model.py", line 187, in optimize_parameters
self.backward_G() # calculate gradients for G_A and G_B
File "/data/yjwa/torchwork/pytorch-CycleGAN-and-pix2pix/models/cycle_gan_model.py", line 178, in backward_G
self.loss_G.backward()
File "/data/yjwa/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/data/yjwa/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemv(handle, op, m, n, &alpha, a, lda, x, incx, &beta, y, incy)
(gemv at /opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/cuda/CUDABlas.cpp:318)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7fad9ef93b5e in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xdb9fa7 (0x7fad9ff79fa7 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #2: at::native::(anonymous namespace)::slow_conv_transpose2d_acc_grad_parameters_cuda_template(at::Tensor const&, at::Tensor const&, at::Tensor&, at::Tensor&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, int) + 0xea6 (0x7fada186ce86 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #3: at::native::slow_conv_transpose2d_backward_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, at::Tensor const&, at::Tensor const&, std::array<bool, 3ul>) + 0x323 (0x7fada1871c93 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe1f64d (0x7fad9ffdf64d in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: <unknown function> + 0xe28007 (0x7fad9ffe8007 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0x29e286e (0x7fadc8b5786e in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xe23c87 (0x7fadc6f98c87 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: torch::autograd::generated::SlowConvTranspose2DBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x516 (0x7fadc87a0c46 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x2ae8215 (0x7fadc8c5d215 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x16f3 (0x7fadc8c5a513 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #11: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7fadc8c5b2f2 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #12: torch::autograd::Engine::thread_init(int) + 0x39 (0x7fadc8c53969 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7fadcbf9a558 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #14: <unknown function> + 0xc819d (0x7fadcea1119d in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #15: <unknown function> + 0x7e65 (0x7faded2f3e65 in /lib64/libpthread.so.0)
frame #16: clone + 0x6d (0x7faded01c88d in /lib64/libc.so.6)
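
Since CUDA kernels launch asynchronously, the Python traceback above may not point at the call that actually failed. One thing I could try is rerunning the same command with the `CUDA_LAUNCH_BLOCKING=1` environment variable, which makes launches synchronous so the error is reported where it occurs. Below is a minimal sketch of that rerun, not something I have verified fixes anything. The helper names `blocking_env` and `rerun` are hypothetical, and it assumes the script is started from the repository root with the same Python interpreter used for training.

```python
import os
import subprocess
import sys

# Same training command as in the report (assumption: run from the repo root).
TRAIN_CMD = [
    sys.executable, "train.py",
    "--dataroot", "./datasets/maps",
    "--name", "maps_cyclegan",
    "--model", "cycle_gan",
]


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # CUDA_LAUNCH_BLOCKING=1 makes each kernel launch block until it finishes,
    # so a failure surfaces at the call that caused it (training runs slower).
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun():
    """Hypothetical helper: rerun training and return the process exit code."""
    return subprocess.call(TRAIN_CMD, env=blocking_env())
```

Calling `rerun()` should reproduce the crash with a traceback that reflects the failing operation more directly, which I can then post here.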