Lantianyou/whisper-playground

whisper-playground

Transform your audio files into text with one simple click.

Speak into a recorder, get a well-written essay.

whisper-playground is meant to be a web UI for OpenAI Whisper. It provides a comfortable, easy-to-use GUI that helps people with little technical background take advantage of today's AI progress in speech recognition.

Use cases: transcribe recorded audio files for a 10x productivity gain in the legal profession, or read a podcast instead of listening to it.

Long-term goal: an annotation tool that doubles as a fine-tuning tool. Today, fine-tuning is a high-level, GPU-intensive task that needs an AI scientist. But fine-tuning is the only way to deploy AI. AGI is a movie star; a fine-tuned model is your girlfriend.

For example, say I have 20 audio files that discuss the internals of a company called FooBar. On the first transcription, I annotate the words "foo bar" as "FooBar". The transcriptions of the remaining 19 files should remember that correction.
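
The simplest form of this "remember my correction" behaviour can be sketched as a glossary applied to every later transcript. This is only an illustration: it is plain text substitution, not fine-tuning, and `apply_glossary` and the `glossary` dictionary are hypothetical names that do not come from this repo.

```python
import re


def apply_glossary(text: str, glossary: dict[str, str]) -> str:
    """Replace each annotated phrase in `text` with its corrected form.

    `glossary` maps what the transcriber wrote ("foo bar") to what the
    user annotated it as ("FooBar"). Both names are hypothetical.
    """
    for heard, corrected in glossary.items():
        # Match whole words, ignore case, and tolerate extra whitespace
        # between the words of a multi-word phrase.
        words = [re.escape(w) for w in heard.split()]
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        # A function replacement avoids backslash handling in `corrected`.
        text = re.sub(pattern, lambda m: corrected, text, flags=re.IGNORECASE)
    return text


# Corrections collected while editing the first transcript.
glossary = {"foo bar": "FooBar"}

# Applied to a later transcript.
print(apply_glossary("Welcome to foo bar. Foo  Bar ships on Friday.", glossary))
```

A glossary like this only fixes spellings the user has already seen; teaching the model itself is the long-term goal described above.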

Problems and solutions

  • Speech to text is either very expensive or embedded in other software.

    • Xunfei and Tencent are very expensive, with convoluted pricing
    • We can be 100x cheaper, if not more, and still be profitable
    • Meeting apps require you to use the app in advance. You cannot import existing files, say mp3 files.
  • Remove pauses and noise

  • Label the speakers in a conversation

    • The output file is a plain srt with only timestamps and text: no indication of who is saying what, no line segmentation, and an unusual file format that is inconvenient for people who are not tech-savvy
  • Untapped market

    • No Whisper web UI repo has even 200 stars; in comparison, multiple ChatGPT UIs have tens of thousands of stars
  • Big market

    • Can be used in legal cases: recorded audio is widely used as legal evidence, but no judge or lawyer has time to listen to it
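
To make the srt problem above concrete, here is a minimal sketch that parses srt text and merges cues into readable paragraphs, starting a new paragraph after a long pause. The names `Cue`, `parse_srt`, `to_paragraphs` and the 1.5 second threshold are assumptions for illustration, not part of this repo. Speaker labels are not handled, because a plain srt carries no speaker information.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the beginning of the audio
    end: float
    text: str


# Matches an srt timestamp such as 00:01:02,500
TIME = re.compile(r"(\d+):(\d\d):(\d\d)[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    h, m, s, ms = map(int, TIME.match(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt: str) -> list[Cue]:
    """Parse srt text into cues, skipping blocks that look malformed."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        start, end = lines[1].split("-->")
        cues.append(Cue(to_seconds(start), to_seconds(end), " ".join(lines[2:])))
    return cues


def to_paragraphs(cues: list[Cue], max_gap: float = 1.5) -> list[str]:
    """Start a new paragraph when the pause between cues exceeds max_gap."""
    paragraphs, current, last_end = [], [], None
    for cue in cues:
        if last_end is not None and cue.start - last_end > max_gap:
            paragraphs.append(" ".join(current))
            current = []
        current.append(cue.text)
        last_end = cue.end
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Calling `to_paragraphs(parse_srt(text))` on the contents of an srt file returns a list of paragraph strings that can be shown in an editor instead of raw timestamps.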

Added Features

  • Human editing

    • Rich text editor for corrections, e.g. errors caused by accents, characters with multiple pronunciations (多音字), etc.
    • Download in a common file format such as Microsoft Word or PDF, or display nicely as Markdown
  • Audio editing

    • Convert between all kinds of audio formats using ffmpeg
    • Split audio files
    • Remove background noise and let the user download the result; basically a web UI for ffmpeg
  • Text augmentation

    • Use ChatGPT to rewrite the conversation from ordinary speech into a written essay (spoken language to written language)
    • Summary, translation, style shifts (e.g. Xiaohongshu style), or other GPT capabilities
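
The audio editing features above map onto a few ffmpeg invocations. The sketch below only builds the argument lists, so it can be read and tested without ffmpeg installed. The function names, file names, 16 kHz mono target, and 600 second segment length are assumptions for illustration; `afftdn` is one of ffmpeg's built-in denoising filters and is used here as a stand-in, not as the method this project has chosen.

```python
def convert_cmd(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Convert any input ffmpeg can read to mono audio at `sample_rate` Hz.

    The output format is inferred by ffmpeg from the extension of `dst`.
    """
    return ["ffmpeg", "-y", "-i", src, "-ar", str(sample_rate), "-ac", "1", dst]


def split_cmd(src: str, pattern: str, seconds: int = 600) -> list[str]:
    """Split `src` into chunks of roughly `seconds` each without re-encoding.

    `pattern` is an output template such as "part_%03d.mp3".
    """
    return [
        "ffmpeg", "-y", "-i", src,
        "-f", "segment", "-segment_time", str(seconds),
        "-c", "copy", pattern,
    ]


def denoise_cmd(src: str, dst: str) -> list[str]:
    """Reduce background noise with ffmpeg's FFT-based `afftdn` filter."""
    return ["ffmpeg", "-y", "-i", src, "-af", "afftdn", dst]


# To actually run a command (requires ffmpeg on PATH):
#   import subprocess
#   subprocess.run(convert_cmd("meeting.m4a", "meeting.wav"), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with user-uploaded file names.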

Advantages

  • Pricing: a simple one-time payment per use, with payments below 1 yuan accepted and no sign-up required
  • Performant, elegant, and easy to use; built by ByteDance engineers
  • SEO