Getting the top few transcription results #478
Hi, does anyone know how to get a few possible transcription results instead of just one? If I transcribe audio of a spoken sentence, I would like to receive a few text sentences with the top few guesses of the transcription, allowing me to manually choose which sentence is the most correct instead of Whisper determining the best one. I guess it might be a little related to the word confidence scores mentioned in #284, but it would still be different.
Replies: 2 comments 7 replies
@shervinemami I managed to implement this functionality. To make it work, I had to modify the source code of whisper/transcribe.py:
To achieve this output, I also modified whisper/decoding.py by:
Hope this helps!
The best_of or beam_size option is designed to do something similar to this (whisper/whisper/decoding.py, lines 79 to 80 in 9f70a35), but these will select the one best candidate. Their difference is:
best_of selects multiple random samples, so it only makes sense with a nonzero temperature and will tend to generate more diverse (i.e. more likely to be wrong) samples.
beam_size selects the best candidates out of beam search, ranked by the likelihood. These candidates tend to be only slightly different.
So the easiest way would be to repeat the call to … (lines 160 to 166 in 9f70a35). This defaults to …
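One way to collect several candidate transcriptions without patching the library is to run the decoder once per temperature and keep the distinct texts. The following is only a minimal sketch of that idea, not the approach from either reply: `collect_candidates`, its `max_candidates` parameter, and the temperature schedule are hypothetical names and values chosen for illustration. It assumes the callable you pass in accepts a `temperature` keyword and returns a dict with a `"text"` key, which is how `model.transcribe` in the openai-whisper package behaves.

```python
from typing import Any, Callable, Dict, Iterable, List


def collect_candidates(
    transcribe_fn: Callable[..., Dict[str, Any]],
    audio: Any,
    temperatures: Iterable[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    max_candidates: int = 5,
) -> List[str]:
    """Call transcribe_fn once per temperature; return distinct texts in first-seen order.

    transcribe_fn is assumed (hypothetically) to accept a `temperature`
    keyword argument and to return a dict containing a "text" entry.
    """
    candidates: List[str] = []
    for temperature in temperatures:
        # A single float means each call decodes at exactly one temperature.
        result = transcribe_fn(audio, temperature=temperature)
        text = result["text"].strip()
        # Skip empty results and texts that were already produced.
        if text and text not in candidates:
            candidates.append(text)
        if len(candidates) >= max_candidates:
            break
    return candidates


# Possible usage, assuming the openai-whisper package is installed
# and "audio.wav" is a placeholder path:
#   import whisper
#   model = whisper.load_model("base")
#   for guess in collect_candidates(model.transcribe, "audio.wav"):
#       print(guess)
```

Decoding at temperature 0 is deterministic, so only the nonzero temperatures can contribute alternatives, and higher temperatures are more likely to produce wrong guesses. Each call still returns a single best result, so the candidates come back in the order they were produced, not ranked by likelihood.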