Testing short transcriptions speed issue #956
timwillhack
asked this question in Q&A
Hello, I was trying to swap faster-whisper in for openai-whisper in my project to get speed gains. I'm mostly transcribing small WAV files (1-5 seconds), and for some reason the openai version of whisper is running faster than faster-whisper.
I'm running on GPU (a 3080).
I didn't want to open an issue about this because I'm most likely missing something (like how the models compare between openai-whisper and faster-whisper).
Here is my sample code, which transcribes the same 2-second audio file in a loop for each version of whisper:
```python
# --- faster-whisper ---
from faster_whisper import WhisperModel  # , BatchedInferencePipeline

model_size = "base"
print("loading: " + model_size)
audio_model = WhisperModel(model_size, device="cuda", compute_type="float16")

for i in range(1, 10):
    start_timer()
    segments, info = audio_model.transcribe(
        "trans_cleanup_20240810_215604_822844.mp3",
        beam_size=5,
        language="en",
        temperature=0,
        compression_ratio_threshold=None,
        log_prob_threshold=None,
        no_speech_threshold=None,
    )
    print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
    # segments is a lazy generator: iterating it here forces the actual
    # decode to happen inside the timed region.
    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    end_timer()

# --- original openai-whisper ---
import torch
import whisper
from whisper import DecodingOptions

model = "base"
if args.model != "large" and not args.non_english:  # args comes from the surrounding project
    model = model + ".en"
print("loading: " + model)
audio_model = whisper.load_model(model)

decoding_options = DecodingOptions(temperature=0, language="en", fp16=torch.cuda.is_available())
transcribe_params = {
    "no_speech_threshold": None,
    "compression_ratio_threshold": None,
    "logprob_threshold": None,
}
all_params = {**vars(decoding_options), **transcribe_params}

for i in range(1, 10):
    start_timer()
    result = audio_model.transcribe("trans_cleanup_20240810_215604_822844.mp3", **all_params)
    end_timer()
```
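The `start_timer()`/`end_timer()` helpers aren't shown in the post; a minimal stand-in built on `time.perf_counter` (the names and the "default" label are my guesses to match the log format below, not the poster's actual code) could look like:

```python
import time

_t0 = None

def start_timer():
    # Record a high-resolution start timestamp.
    global _t0
    _t0 = time.perf_counter()

def end_timer(label="default"):
    # Print elapsed milliseconds in the same format as the logs below.
    elapsed_ms = (time.perf_counter() - _t0) * 1000.0
    print("Elapsed time for '%s': %.2f ms" % (label, elapsed_ms))
    return elapsed_ms
```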
This is what is output (top is faster-whisper):
```
loading: base.en
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 477.35 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 236.15 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 230.88 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 228.61 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 232.33 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 235.58 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 233.37 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 231.14 ms
Detected language 'en' with probability 1.000000
[0.00s -> 2.00s] I'm gonna fly to the moon
Elapsed time for 'default': 231.20 ms
loading: base.en
Elapsed time for 'default': 702.27 ms
Elapsed time for 'default': 180.99 ms
Elapsed time for 'default': 206.93 ms
Elapsed time for 'default': 184.45 ms
Elapsed time for 'default': 182.56 ms
Elapsed time for 'default': 181.67 ms
Elapsed time for 'default': 191.69 ms
Elapsed time for 'default': 182.79 ms
Elapsed time for 'default': 179.49 ms
```
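To make the comparison concrete, the steady-state averages can be computed from the timings above, dropping each engine's first run as warm-up (model load / CUDA kernel initialization typically dominates the first call). The numbers below are copied directly from the log:

```python
# Per-run timings (ms) transcribed from the log above.
faster_whisper_ms = [477.35, 236.15, 230.88, 228.61, 232.33,
                     235.58, 233.37, 231.14, 231.20]
openai_whisper_ms = [702.27, 180.99, 206.93, 184.45, 182.56,
                     181.67, 191.69, 182.79, 179.49]

def steady_state_mean(times):
    # Drop the first (warm-up) run and average the rest.
    rest = times[1:]
    return sum(rest) / len(rest)

print("faster-whisper: %.2f ms" % steady_state_mean(faster_whisper_ms))
print("openai-whisper: %.2f ms" % steady_state_mean(openai_whisper_ms))
```

This works out to roughly 232 ms per clip for faster-whisper versus roughly 186 ms for openai-whisper, matching the poster's observation that the original whisper is faster on these short clips.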