Ways to transcribe real-time/detect the end of speaking in indefinite file? #151
Comments
@kingcharlezz, you may want to have a look at this project:
I use faster-whisper on a real-time livestream (so infinite duration) and it works great. (Actually more than great: I can run two large faster-whisper models simultaneously and get both transcription and translation, it's so fast!) For the VAD, you can pass in vad_filter=True, and by default it will look for 2-second silences to break on (min_silence_duration_ms = 2000). Also check out the non-VAD no_speech_threshold and log_prob_threshold options. For the more specific VAD options from vad.py, you pass exactly the same names to faster-whisper and the values get passed through to the VAD.
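For anyone who wants to see the shape of this, here is a minimal sketch, not the exact code I run. It assumes the VAD settings are forwarded through the vad_parameters argument of WhisperModel.transcribe(), as shown in the faster-whisper README. The helper names build_transcribe_kwargs and transcribe_file, and the "large-v2" model size, are made up for illustration.

```python
def build_transcribe_kwargs(min_silence_ms=2000, use_vad=True):
    """Collect keyword arguments for WhisperModel.transcribe().

    Hypothetical helper: only the argument names vad_filter,
    vad_parameters and min_silence_duration_ms come from faster-whisper.
    """
    kwargs = {"vad_filter": use_vad}
    if use_vad:
        # Assumption: VAD options are passed as a dict via vad_parameters
        # and forwarded to the VAD under the same names.
        kwargs["vad_parameters"] = {"min_silence_duration_ms": min_silence_ms}
    return kwargs


def transcribe_file(path, model_size="large-v2", **overrides):
    """Transcribe one audio file and return (start, end, text) tuples."""
    # Imported lazily so the helper above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size)
    segments, _info = model.transcribe(path, **build_transcribe_kwargs(**overrides))
    # segments is a generator; transcription runs as it is consumed.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Something like transcribe_file("call.wav", min_silence_ms=500) would then split on shorter pauses than the 2-second default, which may suit phone calls better. The file name and the 500 ms value are placeholders, so tune them for your audio.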
For livestreams the biggest bottleneck, in my opinion, after the VAD, is noise reduction. I pipe the live audio through OBS with the NVIDIA noise reduction filter before sending it to faster-whisper. It makes a night-and-day difference in Whisper performance on audio with lots of background music or noise. For phone calls you can probably get away without doing that, though.
Appreciate this! It seems to accomplish what I need it to do. Thanks for the in-depth responses.
Hi @JonathanFly, could you please give more info on how you run this "real time livestream" setup with infinite duration?
I threw it up here: https://github.com/JonathanFly/faster-whisper-livestream-translator I kind of left it in a not-great state, but you can get the idea. It's a messy fork of https://github.com/fortypercnt/stream-translator
Hello all. I am working on a project pertaining to ASR in phone calls. After being dissatisfied with some of the commercial options, I wanted to try this. Is there a built-in way to know when the other party is not talking, or something like whisper.cpp's stream function? I have seen mention of VAD in the docs, but I am not sure how to elegantly integrate it into my project. Any comments are appreciated.
Thanks.