RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemv(handle, op, m, n, &alpha, a, lda, x, incx, &beta, y, incy)

Hi CycleGAN developers,

I'm encountering the error below while running the training example.
Training runs for a while and then crashes (in the log below, shortly after iteration 500 of epoch 1).
Are there any suggestions I could try? Thanks in advance for your help!

`python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan`


(epoch: 1, iters: 400, time: 0.772, data: 0.003) D_A: 0.268 G_A: 0.446 cycle_A: 1.283 idt_A: 0.335 D_B: 0.224 G_B: 0.270 cycle_B: 0.656 idt_B: 0.577 
(epoch: 1, iters: 500, time: 0.303, data: 0.002) D_A: 0.233 G_A: 0.391 cycle_A: 2.048 idt_A: 0.318 D_B: 0.238 G_B: 0.284 cycle_B: 0.659 idt_B: 1.037 
Traceback (most recent call last):
  File "train.py", line 52, in <module>
    model.optimize_parameters()   # calculate loss functions, get gradients, update network weights
  File "/data/yjwa/torchwork/pytorch-CycleGAN-and-pix2pix/models/cycle_gan_model.py", line 187, in optimize_parameters
    self.backward_G()             # calculate gradients for G_A and G_B
  File "/data/yjwa/torchwork/pytorch-CycleGAN-and-pix2pix/models/cycle_gan_model.py", line 178, in backward_G
    self.loss_G.backward()
  File "/data/yjwa/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/data/yjwa/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemv(handle, op, m, n, &alpha, a, lda, x, incx, &beta, y, incy)` (gemv<float> at /opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/cuda/CUDABlas.cpp:318)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7fad9ef93b5e in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xdb9fa7 (0x7fad9ff79fa7 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #2: at::native::(anonymous namespace)::slow_conv_transpose2d_acc_grad_parameters_cuda_template(at::Tensor const&, at::Tensor const&, at::Tensor&, at::Tensor&, at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, int) + 0xea6 (0x7fada186ce86 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #3: at::native::slow_conv_transpose2d_backward_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, at::Tensor const&, at::Tensor const&, std::array<bool, 3ul>) + 0x323 (0x7fada1871c93 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe1f64d (0x7fad9ffdf64d in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: <unknown function> + 0xe28007 (0x7fad9ffe8007 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0x29e286e (0x7fadc8b5786e in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xe23c87 (0x7fadc6f98c87 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: torch::autograd::generated::SlowConvTranspose2DBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x516 (0x7fadc87a0c46 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x2ae8215 (0x7fadc8c5d215 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x16f3 (0x7fadc8c5a513 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #11: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7fadc8c5b2f2 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #12: torch::autograd::Engine::thread_init(int) + 0x39 (0x7fadc8c53969 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7fadcbf9a558 in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #14: <unknown function> + 0xc819d (0x7fadcea1119d in /data/yjwa/anaconda3/lib/python3.7/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #15: <unknown function> + 0x7e65 (0x7faded2f3e65 in /lib64/libpthread.so.0)
frame #16: clone + 0x6d (0x7faded01c88d in /lib64/libc.so.6)
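
A possible next diagnostic step, offered as a sketch and not verified against this crash: CUDA kernels are launched asynchronously, so the Python traceback above may not point at the operation that actually failed. Setting the `CUDA_LAUNCH_BLOCKING=1` environment variable makes launches synchronous, which should make the reported location more reliable. The helper name `build_debug_run` below is hypothetical; the arguments are copied from the command in this report.

```python
import os
import sys

# The training command from this report, split into argv form.
TRAIN_ARGS = [
    "train.py",
    "--dataroot", "./datasets/maps",
    "--name", "maps_cyclegan",
    "--model", "cycle_gan",
]


def build_debug_run(base_env=None):
    """Return (argv, env) for rerunning training with synchronous CUDA launches.

    Hypothetical helper for illustration; it only prepares the invocation
    and does not start training itself.
    """
    # Copy the environment so the caller's (or the process's) env is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so an error
    # is raised at the op that triggered it instead of at a later sync point.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    # Use the same interpreter that runs this script.
    argv = [sys.executable] + TRAIN_ARGS
    return argv, env
```

To use it, call `argv, env = build_debug_run()` from the repository root and pass both to `subprocess.run(argv, env=env)`. Training will be slower in this mode, so it is only meant for reproducing the crash, not for regular runs.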
