Replies: 2 comments
-
A little bit of digging, and I'm going to assume it's because WhisperModel from faster_whisper is simply faster than the Whisper implementation bundled inside WhisperX? Secondarily, any thoughts on whether faster_whisper (as implemented here) is better than the HuggingFace implementation for long-form audio (files longer than 5 minutes)?
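For anyone wanting to check the speed claim themselves, here is a minimal sketch of timing faster_whisper's `WhisperModel`. The model size, `compute_type`, and audio path are placeholder choices, not anything prescribed by this project; note that `transcribe()` returns a lazy generator, so the list materialisation is what actually triggers decoding.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[[], Any]) -> Tuple[float, Any]:
    """Run fn once and return (elapsed_seconds, result)."""
    t0 = time.perf_counter()
    out = fn()
    return time.perf_counter() - t0, out


def benchmark_faster_whisper(audio_path: str) -> float:
    """Time one faster_whisper transcription of audio_path (a placeholder file)."""
    # Imported inside the function so the sketch loads without the package.
    from faster_whisper import WhisperModel

    model = WhisperModel("small", compute_type="int8")
    # transcribe() returns (segments, info); segments is a lazy generator,
    # so list() forces decoding and the timing covers real inference.
    elapsed, _segments = timed(lambda: list(model.transcribe(audio_path)[0]))
    return elapsed
```

Running the same file through the stock openai-whisper `transcribe()` with an identical `timed()` wrapper gives a like-for-like comparison.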
-
Right now it seems that whisperX is being used exclusively for the better alignment it provides via the Wav2Vec alignment model.
Is there a reason whisperX isn't also used to generate the initial transcription? (via whisperX.transcribe(), which seems to return the same result dict as whisper.transcribe())
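The point above can be sketched as an end-to-end whisperX pipeline, where the same result dict feeds straight into the alignment step. Function names follow the whisperX API at the time of this discussion (`load_model`, `load_align_model`, `align`); newer versions may differ, and the model size, device, and audio path are placeholder assumptions.

```python
def transcribe_and_align(audio_path: str, device: str = "cpu") -> dict:
    """Use whisperX for both steps: transcription and Wav2Vec alignment."""
    # Imported inside the function so the sketch loads without the package.
    import whisperx

    model = whisperx.load_model("small", device)
    # whisperX's transcribe() returns the same result-dict shape as
    # whisper.transcribe(): {"segments": [...], "language": ...}.
    result = model.transcribe(audio_path)
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    # align() refines the coarse segment timestamps to word level.
    return whisperx.align(result["segments"], align_model, metadata,
                          audio_path, device)
```

If that holds, a separate transcription backend would only be worth keeping for speed, which loops back to the faster_whisper question above.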