[Bug] Speaker Diarization Crashes on RTX 5060 (RuntimeError: CUDA error: no kernel image is available) #403

@Agamemnon22

When attempting to run transcription with Speaker Diarization enabled, the process crashes immediately during the pyannote pipeline execution. The transcription itself (Whisper) works if diarization is disabled, but enabling speaker separation causes a hard failure.

The PyTorch build bundled with Memo AI (and used by pyannote) appears to ship without compiled CUDA kernels for the NVIDIA 50-series (Blackwell) architecture, which would explain the "no kernel image is available" error in the log.
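
For anyone who wants to confirm this, here is a minimal Python sketch. It assumes the check can be run against the same PyTorch build that Memo AI bundles, which may not be directly reachable from the packaged main.exe. It compares the GPU's compute capability with the architectures the build was compiled for. The helper names `parse_arch` and `has_kernels_for` are hypothetical and exist only for this illustration; `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are existing PyTorch calls. The compatibility rule is a simplification: an `sm_XY` binary is treated as usable on a device with the same major version and an equal or higher minor version, and a `compute_XY` (PTX) entry as usable on any equal or newer device. I would expect an RTX 50-series GPU to report capability (12, 0), i.e. `sm_120`, but that is an assumption I have not verified on this machine.

```python
import re


def parse_arch(tag):
    """Split a tag such as 'sm_86' or 'compute_90' into (kind, (major, minor), suffix).

    Returns None for strings that do not look like an architecture tag.
    """
    m = re.fullmatch(r"(sm|compute)_(\d{2,})([a-z]?)", tag)
    if m is None:
        return None
    kind, digits, suffix = m.groups()
    # The last digit is the minor version, everything before it is the major.
    return kind, (int(digits[:-1]), int(digits[-1])), suffix


def has_kernels_for(capability, arch_list):
    """Simplified check: can a build with `arch_list` run on `capability`?

    `capability` is a (major, minor) tuple, `arch_list` a list of tags in the
    format returned by torch.cuda.get_arch_list().
    """
    for tag in arch_list:
        parsed = parse_arch(tag)
        if parsed is None:
            continue
        kind, arch, suffix = parsed
        if suffix:
            # Assumption: suffixed tags (e.g. 'sm_90a') only match exactly.
            if arch == capability:
                return True
            continue
        if kind == "sm" and arch[0] == capability[0] and arch[1] <= capability[1]:
            return True
        if kind == "compute" and arch <= capability:
            return True
    return False


def main():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available to this PyTorch build.")
        return
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
    print("device capability:", capability)
    print("compiled architectures:", arch_list)
    print("kernels available:", has_kernels_for(capability, arch_list))


if __name__ == "__main__":
    main()
```

If the last line prints `False`, the build has no kernels the GPU can execute, which would be consistent with the crash described here.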

To Reproduce

1. Open Memo AI on a machine with an NVIDIA GeForce RTX 5060.
2. Load a video/audio file.
3. Enable Speaker Diarization (set speakers to 2).
4. Start Transcription.
5. The process fails during the "Start speaker recognition" phase.

Expected Behavior
The software should diarize the audio using the GPU, as it does on 30-series and 40-series cards.

System Information

OS: Windows 11

Memo AI Version: v1.6.8

GPU: NVIDIA GeForce RTX 5060 Laptop GPU (8GB)

CPU: Intel Core Ultra 7 255H

Driver: Game Ready Driver (Latest)

Error Log (Relevant Snippet)
[info 20:40:37.726] Start pyannote: ... resources\transcribe.wav 0
[error 20:40:46.705] SyntaxError: Unexpected end of JSON input
...
[info 20:40:46.777] Command failed: ./main.exe ... --code
...
File "torch\nn\modules\rnn.py", line 209, in flatten_parameters
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
