whisper-playground

Transform your audio file into text, with one simple click.

Record your speech, get well-written essays.

whisper-playground is meant to be a web UI for OpenAI Whisper. It provides a comfortable, easy-to-use GUI that helps people with little technical background benefit from today's AI advances in speech recognition.

Use cases: transcribing recorded audio files to make legal professionals 10x more productive, and reading a podcast instead of listening to it.

Long-term goal: an annotation tool that doubles as a fine-tuning tool. Fine-tuning is a high-level, GPU-intensive task that today requires an AI scientist. But fine-tuning is the only way to deploy AI. AGI is a movie star; a fine-tuned model is your girlfriend.

For example, say I have 20 recordings that discuss the internal affairs of a company called FooBar. In the first transcription, I annotate the words "foo bar" as "FooBar". The transcriptions of the remaining 19 recordings should remember that correction.
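The FooBar example can be approximated without any fine-tuning by keeping a glossary of corrections and applying it to each later transcript. The following is a minimal sketch under that assumption; `apply_glossary` and the glossary format are hypothetical names for illustration, not part of this repository, and this is plain text post-processing rather than the fine-tuning described above.

```python
import re
from typing import Dict


def apply_glossary(text: str, glossary: Dict[str, str]) -> str:
    """Replace every remembered phrase with its corrected form.

    Matching is case-insensitive and limited to whole words, so
    "Foo Bar" is corrected but "foo bars" is left alone.
    """
    # Apply longer phrases first so they win over shorter ones they contain.
    for phrase in sorted(glossary, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        replacement = glossary[phrase]
        # A function replacement avoids backslash interpretation in the text.
        text = pattern.sub(lambda _match, r=replacement: r, text)
    return text


# Corrections collected while annotating the first transcript (assumed format).
glossary = {"foo bar": "FooBar"}

# Applied to a later transcript.
corrected = apply_glossary("Welcome to the Foo Bar quarterly review.", glossary)
print(corrected)
```

A glossary like this only fixes spellings the user has already corrected once; it does not change what the speech model hears.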

Problems and solutions

Speech-to-text is either very expensive or embedded in other software.
- Xunfei and Tencent: very expensive and convoluted
- We can be 100x cheaper, if not more, and still be profitable
- Meeting apps: you have to be using the app in advance. You cannot import existing files, such as MP3 files.
Remove pauses and noise
Label the speakers in a conversation
- The output file is a plain SRT with only timestamps and text: no indication of who is saying what, no paragraph segmentation, and an unfamiliar file format that is inconvenient for people who are not tech-savvy
Untapped market
- No Whisper web UI repo has even 200 stars; in comparison, multiple ChatGPT UIs have tens of thousands of stars
Big market
- Can be used in legal cases: recorded audio is widely used as legal evidence, but no judge or lawyer has time to listen to it
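To illustrate the SRT problem listed above, here is a minimal sketch that parses SRT text into timed cues and merges them into readable paragraphs, starting a new paragraph whenever the speaker pauses. The function names and the 2-second pause threshold are assumptions made for this example, not the project's implementation. Speaker labels are not covered, since they would need a separate diarization step.

```python
import re
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)

TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:01:02,500 to seconds."""
    match = TIMESTAMP.match(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt: str) -> List[Cue]:
    """Split SRT text into cues; blocks without a timing line are skipped."""
    cues: List[Cue] = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue
        start, end = lines[timing].split("-->")
        text = " ".join(line.strip() for line in lines[timing + 1:])
        cues.append((to_seconds(start), to_seconds(end), text))
    return cues


def to_paragraphs(cues: List[Cue], max_gap: float = 2.0) -> List[str]:
    """Join consecutive cues; a silence longer than max_gap starts a new paragraph."""
    paragraphs: List[str] = []
    current: List[str] = []
    last_end = None
    for start, end, text in cues:
        if current and last_end is not None and start - last_end > max_gap:
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
        last_end = end
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


sample = """1
00:00:00,000 --> 00:00:02,000
Hello everyone.

2
00:00:02,500 --> 00:00:04,000
Let's begin.

3
00:00:10,000 --> 00:00:12,000
Next topic.
"""

for paragraph in to_paragraphs(parse_srt(sample)):
    print(paragraph)
```

Splitting on pauses is only a heuristic for paragraph breaks; a human editor would still review the result.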

Added Features

Human editing
- Rich text editor for corrections, for example errors caused by accents or polyphonic characters (多音字)
- Download in common formats such as Microsoft Word or PDF, or display beautifully in Markdown
Audio editing
- Convert all kinds of audio files using ffmpeg
- Split audio files
- Remove background noise and let the user download the result; basically a web UI for ffmpeg
Text augmentation
- Use ChatGPT to rewrite a conversation from ordinary speech into a written essay (spoken language to written language)
- Summaries, translation, style shifts (for example, Xiaohongshu style), or other GPT capabilities
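The audio editing features listed above map onto a handful of ffmpeg invocations. The sketch below builds those command lines in Python; the helper names, the 16 kHz mono target, the 600-second segment length, and the choice of the `afftdn` denoise filter are assumptions for illustration, not the project's actual settings. It requires the `ffmpeg` binary on the PATH only when `run` is called.

```python
import subprocess
from typing import List


def convert_cmd(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Convert any audio file ffmpeg can read to mono at the given sample rate."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", str(sample_rate), dst]


def split_cmd(src: str, pattern: str, seconds: int = 600) -> List[str]:
    """Split a file into fixed-length segments without re-encoding.

    `pattern` is an ffmpeg output pattern such as "part_%03d.mp3".
    """
    return [
        "ffmpeg", "-y", "-i", src,
        "-f", "segment", "-segment_time", str(seconds),
        "-c", "copy", pattern,
    ]


def denoise_cmd(src: str, dst: str) -> List[str]:
    """Reduce background noise with ffmpeg's FFT-based afftdn filter."""
    return ["ffmpeg", "-y", "-i", src, "-af", "afftdn", dst]


def run(cmd: List[str]) -> None:
    """Execute a command, raising CalledProcessError if ffmpeg fails."""
    subprocess.run(cmd, check=True)


# Print the command that would be run; call run(...) to actually execute it.
print(" ".join(convert_cmd("meeting.mp3", "meeting.wav")))
```

Building each command as a list rather than a shell string avoids quoting problems with file names that contain spaces.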

Advantages

Pricing: simple usage fees, paid one time per job; payments below 1 yuan are accepted, and no sign-up is required
Performant, elegant and easy-to-use, built by ByteDance engineers
SEO

Lantianyou/whisper-playground
