RuntimeError: CUDA error: no kernel image is available for execution on the device #12

Open
ammarasmro opened this issue Feb 28, 2021 · 3 comments

Comments

@ammarasmro

System:

  • OS: WSL2
  • GPU: GeForce RTX 3080

 python training/run_experiment.py --model_class=MLP --data_class=MNIST --max_epochs=5 --gpus=-1

I followed the setup steps as described, but ended up with this error:

RuntimeError: CUDA error: no kernel image is available for execution on the device

Complete output

GPU available: True, used: True
TPU available: None, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/torch/cuda/__init__.py:104: UserWarning:
GeForce RTX 3080 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.
If you want to use the GeForce RTX 3080 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))

  | Name          | Type     | Params
-------------------------------------------
0 | model         | MLP      | 936 K
1 | model.dropout | Dropout  | 0
2 | model.fc1     | Linear   | 803 K
3 | model.fc2     | Linear   | 131 K
4 | model.fc3     | Linear   | 1.3 K
5 | train_acc     | Accuracy | 0
6 | val_acc       | Accuracy | 0
7 | test_acc      | Accuracy | 0
-------------------------------------------
936 K     Trainable params
0         Non-trainable params
936 K     Total params
/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:49: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 20 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  warnings.warn(*args, **kwargs)
Validation sanity check: 0it [00:00, ?it/s]Traceback (most recent call last):
  File "training/run_experiment.py", line 90, in <module>
    main()
  File "training/run_experiment.py", line 85, in main
    trainer.fit(lit_model, datamodule=data)
  File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 473, in fit
    results = self.accelerator_backend.train()
  File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 66, in train
    results = self.train_or_test()
  File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 69, in train_or_test
    results = self.trainer.train()
  File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 495, in train
    self.run_sanity_check(self.get_model())
  File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 693, in run_sanity_check
    _, eval_results = self.run_evaluation(test_mode=False, max_batches=self.num_sanity_val_batches)
  File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 609, in run_evaluation
    output = self.evaluation_loop.evaluation_step(test_mode, batch, batch_idx, dataloader_idx)
  File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 178, in evaluation_step
    output = self.trainer.accelerator_backend.validation_step(args)
  File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 84, in validation_step
    return self._step(self.trainer.model.validation_step, args)
  File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 76, in _step
    output = model_step(*args)
  File "/mnt/c/Users/user/GitHub/fsdl-text-recognizer-2021-labs/lab1/text_recognizer/lit_models/base.py", line 58, in validation_step
    logits = self(x)
  File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/c/Users/user/GitHub/fsdl-text-recognizer-2021-labs/lab1/text_recognizer/lit_models/base.py", line 45, in forward
    return self.model(x)
  File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/c/Users/user/GitHub/fsdl-text-recognizer-2021-labs/lab1/text_recognizer/models/mlp.py", line 37, in forward
    x = self.fc1(x)
  File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/torch/nn/functional.py", line 1690, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: CUDA error: no kernel image is available for execution on the device
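
The warning near the top of the log already names the mismatch: the card reports capability sm_86, while the installed PyTorch build lists only sm_37, sm_50, sm_60, sm_70 and sm_75. As a rough sketch of that comparison, assuming the usual CUDA rule that a compiled kernel runs on a device of the same major version and an equal or higher minor version (PTX entries such as `compute_80` are ignored here, and this is not PyTorch's own check), something like the following could be run inside the conda environment. The helper names `parse_sm` and `build_supports_device` are made up for illustration.

```python
def parse_sm(arch):
    """Turn an architecture string such as 'sm_86' into (8, 6)."""
    digits = arch.split("_")[1]
    return int(digits[:-1]), int(digits[-1])


def build_supports_device(capability, arch_list):
    """Return True if any sm_XY entry can run on a device with `capability`.

    Simplified rule (an assumption, not PyTorch's implementation):
    same major version, and the build's minor version <= the device's.
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX entries like 'compute_80'
        arch_major, arch_minor = parse_sm(arch)
        if arch_major == major and arch_minor <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("device capability:", cap)
        print("build arch list:  ", archs)
        print("supported:        ", build_supports_device(cap, archs))
    else:
        print("torch with a visible CUDA device is required for the live check")
```

With the values from the log above, `build_supports_device((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75"])` is false because no entry has major version 8, which is consistent with the "no kernel image" error.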
@ammarasmro
Author

I'm currently working around it as follows:

  1. Change the cuda version in environment.yml
  2. Remove the cudnn line from environment.yml
  3. After setting the labs up, run this command: conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch

With that, lab1 passes. I'm not sure it completely solves the problem, though.
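
One quick way to gain a little more confidence after a reinstall like this is to run a single small operation on the GPU, since the original failure surfaced at the first matrix multiply. The sketch below is only a smoke test, not proof that every kernel the labs need is available; the function name `gpu_smoke_test` is hypothetical.

```python
def gpu_smoke_test():
    """Run one small matmul on the GPU and describe the outcome as a string."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"

    if not torch.cuda.is_available():
        return "torch is installed but sees no CUDA device"

    try:
        x = torch.randn(8, 8, device="cuda")
        # .item() copies the result back to the host, so the kernels must finish
        (x @ x).sum().item()
    except RuntimeError as err:
        return f"GPU kernel failed: {err}"

    return f"ok: torch {torch.__version__}, built for CUDA {torch.version.cuda}"


print(gpu_smoke_test())
```

If the build still lacks kernels for the card, the expectation is that the same `RuntimeError` appears in the returned message instead of the `ok:` line.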

@tranhoangkhuongvn

I made the following changes to get it working on my RTX 3090 + Ubuntu 20:

  • remove both the cuda and cudnn versions from environment.yml
  • after setting the labs up via make conda-update, run conda install -c anaconda cudatoolkit
  • finally, run conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch

@sunki-hong

sunki-hong commented Mar 11, 2021

Steps I used on an RTX 3070 + Ubuntu 18.04:

  • (if activated) conda deactivate

  • conda env remove -n fsdl-text-recognizer-2021

  • remove both the cuda and cudnn versions from environment.yml, as tranhoangkhuongvn mentioned

    • environment.yml will then look like this
    name: fsdl-text-recognizer-2021
    channels:
      - defaults
    dependencies:
      - python=3.6  # Google Colab is still on Python 3.6
      - pip
      - pip:
        - pip-tools

  • make conda-update

  • conda activate fsdl-text-recognizer-2021

  • make pip-tools

  • conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c conda-forge
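
For anyone who would rather script the environment.yml edit from the steps above than do it by hand, here is a minimal sketch that drops the cuda and cudnn entries from the file's text. The original file's exact contents are not shown in this thread, so the sample pins (`cudatoolkit=10.1`, `cudnn=7.6`) and the helper name `strip_cuda_pins` are assumptions for illustration only.

```python
def strip_cuda_pins(yaml_text, packages=("cudatoolkit", "cudnn")):
    """Return yaml_text without list entries that pin any of `packages`."""
    kept = []
    for line in yaml_text.splitlines():
        entry = line.strip()
        if entry.startswith("- "):
            # '- cudnn=7.6  # note' -> 'cudnn'
            name = entry[2:].split("#")[0].split("=")[0].strip()
            if name in packages:
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical starting point; the version numbers are placeholders.
SAMPLE = """\
name: fsdl-text-recognizer-2021
channels:
  - defaults
dependencies:
  - python=3.6  # Google Colab is still on Python 3.6
  - cudatoolkit=10.1
  - cudnn=7.6
  - pip
  - pip:
    - pip-tools
"""

print(strip_cuda_pins(SAMPLE))
```

This is plain line filtering rather than real YAML parsing, which keeps comments and indentation untouched; it would miss entries written in other forms, so check the result before running make conda-update.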
