Transform your audio file into text, with one simple click.
Record verbally, get well-documented essays.
Whisper is meant to be a Web UI for the OpenAI Whisper. whisper-playground provides a comfortable, easy-to-use GUI to help people who has little technology background leverage today's AI development in speech recognization.
Use case: transcribe recorded audio file to 10x productivities for legal professions && read a podcast
Long term goal: an annotation as fine-tune tool. Fine tune is such a high-level, AI scientist needed, GPU intensive task. But fine tuning is the only way to deploy AI. AGI is movie star, fine tuned one is your girl friend.
For example, I have 20 audios talking about the internal stuff of a company called FooBar. On the first transcription, I annotate the word "foo bar" as "FooBar". The rest of the 19 audios should remember that
-
Speech to text super expensive or embedded in other software.
- xunfei, tencent, super expensive, convoluted expensive
- we can be 100x cheaper and still profitable, if not more
- meeting apps, you have to use the app in advance. Cannot import files, say mp3 files.
-
Remove the pause and noise
-
Label the speechers for speaker in conversation
- The output file is a simple srt, only timestamp and text, no who is saying what, no line segmentation, weird file format, inconvinient for people not tech savvy
-
Blank market
- no 200 star repo for WebUI, in comparison, multiple ChatGPT UI for tens of thousands of stars
-
Big market
- Can be used in legal case, recorded audio is wildly used as legal evidence, but no judge or lawyer has time to listen
-
Human editting
- Rich text editor for correction, can correct accent, 多音字 etc
- Download as common-use file like Microsoft Word, pdf or display beautifully in markdown
-
Audio editting
- Transform all kinds of audio file using ffmpeg
- Split audio files
- Remove background noise, and download for user, basically a WebUI for ffmpeg
-
Text augmentation
- Use ChatGPT to frame the conversation from ordinary speech to written down essays. 口语到书面语
- Summary, translation, style shifts (小红书) or other GPT capabilities
- Pricing: Just charge fees, one time payment, will accept below 1 yuan, no sign up allowed
- Performant, elegant and easy-to-use, built by ByteDance engineers
- SEO