Description
For the past few months I have been using CTranslate2 with CUDA 11 on a GTX 960M.
Recently I tried to update to CUDA 12, which still supports the GTX 960M, along with CTranslate2.
I expected the update to work, since the documentation still reports compatibility with Compute Capability 3.5 (https://opennmt.net/CTranslate2/hardware_support.html) and I had no major issues updating PyTorch.
Unfortunately this was not the case; I started receiving:

```
RuntimeError: parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device
```
Apparently this is a common issue, since I found related reports from other users of GTX 9xx GPUs:

- SYSTRAN/faster-whisper#806
- https://forums.developer.nvidia.com/t/runtimeerror-parallel-for-failed-cudaerrornokernelimagefordevice-no-kernel-image-is-available-for-execution-on-the-device/291404
- m-bain/whisperX#794
I tried to compile the code with Compute Capability 5.0, but this was not enough: some recently introduced code requires Compute Capability 5.3, which my GPU does not support. I therefore disabled that code and recompiled.
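For reference, this is roughly the build procedure I used, a sketch assuming the `CUDA_ARCH_LIST` CMake option still accepts an explicit architecture list (the exact option names and paths may differ in your checkout):

```shell
# Fetch the sources with submodules
git clone --recursive https://github.com/OpenNMT/CTranslate2.git
cd CTranslate2

# Configure with CUDA enabled, targeting Compute Capability 5.0 explicitly
# instead of the default architecture detection
mkdir build && cd build
cmake .. -DWITH_CUDA=ON -DCUDA_ARCH_LIST="5.0"

# Build and install the library, then the Python package
make -j"$(nproc)"
sudo make install
cd ../python && pip install .
```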
After that I was able to run through the quickstart using `cuda` instead of `cpu` as the device.
I was also able to run faster-whisper.
Would it be possible to reintroduce support for Compute Capability 5.0 in the distributed wheels?
If so, I would be happy to provide a pull request.