
Transcribe desktop audio/computer audio in real-time and locally (Streaming ASR), using TorchAudio and Emformer-RNNT model for inference, PyAudio for reading stream, Tkinter for GUI.


icynic/desktop-live-caption


desktop-live-caption

Transcribe desktop audio/computer audio in real-time and locally (Streaming ASR), using TorchAudio and Emformer-RNNT model.

  • Reimplemented stream reader using PyAudio rather than torchaudio.io. Thus no need for ffmpeg.
  • Asynchronous data transfer via a thread-safe queue.
  • Minimalist GUI to display transcriptions, built with Tkinter.

There are great models like OpenAI's Whisper to transcribe audio files, but they can't do it in real-time.

There are some apps which transcribe microphone audio, but they can't transcribe desktop audio.

This project, however, transcribes any video or audio which is playing on your computer in real-time.

But this project isn't perfect. There is only one available model, EMFORMER_RNNT_BASE_LIBRISPEECH, which only supports English, and its accuracy isn't satisfying.

Still, it demonstrates one way to transcribe desktop audio locally and in real-time.
