
Pyroghy/audio-qc-api


Audio QC API

AI-powered audio processing: transcribe speech, remove noise, and trim audio automatically.

What You Need

Must have:

  • Linux computer (Ubuntu/Debian recommended)
  • Docker installed

Nice to have:

  • NVIDIA GPU (roughly 5-10x faster processing)

Setup

Step 1: Install Docker

sudo apt update
sudo apt install docker.io docker-compose
sudo usermod -aG docker "$USER"

Log out and back in after this step so the docker group membership takes effect.
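Before moving on, a small check like the one below can confirm that Step 1 worked. This is a sketch and is not part of this repo. The function name `check_docker_setup` is made up for illustration. It only checks two things: whether the `docker` command is on your PATH, and whether your current session belongs to the `docker` group.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repo): sanity-checks Step 1.
check_docker_setup() {
  # Is the docker CLI installed and on PATH?
  if command -v docker >/dev/null 2>&1; then
    echo "docker CLI: found"
  else
    echo "docker CLI: not found (repeat Step 1)"
  fi
  # Group membership only shows up in sessions started after usermod
  if id -nG | tr ' ' '\n' | grep -qx docker; then
    echo "docker group: yes"
  else
    echo "docker group: no (log out and back in)"
  fi
}

check_docker_setup
```

If both lines look good, run `docker run --rm hello-world` for a quick end-to-end test of the Docker install.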

Step 2: GPU Support (Optional but Recommended)

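sudo apt install nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

This assumes the NVIDIA driver is already installed. If apt can't find nvidia-container-toolkit, add NVIDIA's package repository first by following NVIDIA's Container Toolkit install guide.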
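To see whether Docker can actually use the GPU, you can run a rough heuristic check. This is a sketch and is not part of this repo. The function name `check_gpu_setup` is hypothetical. It assumes that `nvidia-smi` being present means the driver is installed, and that `docker info` mentions an `nvidia` runtime once Step 2 has been applied.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repo): rough GPU readiness check.
check_gpu_setup() {
  # nvidia-smi ships with the NVIDIA driver; without it there is no GPU path
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "gpu: no NVIDIA driver found, the API will run on CPU"
    return 0
  fi
  # After nvidia-ctk configures Docker, 'docker info' should list an nvidia runtime
  if docker info 2>/dev/null | grep -qi nvidia; then
    echo "gpu: NVIDIA runtime registered with Docker"
  else
    echo "gpu: driver found but Docker has no NVIDIA runtime (re-check Step 2)"
  fi
}

check_gpu_setup
```

For a definitive test, NVIDIA's guide runs a sample container: `sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi`.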

Step 3: Run the API

git clone <your-repo-url>
cd audio-qc-api
mkdir models workspace temp
docker-compose up --build

That's it! Once the build finishes, the API will be running at http://localhost:8000.
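The first `--build` can take a while, so scripts may want to wait until the server answers before sending files. The sketch below does this by polling. It assumes the `/docs` page described under "How to Use" returns a success status once the API is up. The helper name `wait_for_api` and the 300-second default limit are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL once per second until it responds or we give up.
# Usage: wait_for_api [url] [max_seconds]
wait_for_api() {
  url="${1:-http://localhost:8000/docs}"
  limit="${2:-300}"
  elapsed=0
  while [ "$elapsed" -lt "$limit" ]; do
    # -f: treat HTTP errors as failure; -s: quiet; short per-request timeout
    if curl -fs -o /dev/null --max-time 2 "$url"; then
      echo "API is up at $url"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "API did not respond within ${limit}s" >&2
  return 1
}
```

For example, start the stack in the background with `docker-compose up --build -d`, then call `wait_for_api` before sending any requests.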

What It Does

  • Transcribe - Converts speech to text with timestamps
  • Denoise - Removes background noise from audio
  • Clean - Does everything above, plus trims the audio down to just the speech

How to Use

Go to http://localhost:8000/docs in your browser for an easy web interface.

Or use these commands:

# Just transcribe
curl -X POST "http://localhost:8000/transcribe/" \
  -F "file=@your-audio.wav" \
  -F "expected_text=what you think it says"

# Clean everything (recommended)
curl -X POST "http://localhost:8000/clean/" \
  -F "file=@your-audio.wav" \
  -F "expected_text=what you think it says" \
  -o cleaned-audio.wav
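To clean a whole folder, you can wrap the `/clean/` request above in a loop. This is a sketch built on several assumptions:

  • The helper name `clean_dir` is hypothetical.
  • Output is saved as WAV, as the `-o cleaned-audio.wav` example suggests.
  • Each file's expected text is read from a same-named `.txt` sidecar file. This is a made-up convention, not something the API defines.
  • When no sidecar exists, the sketch omits `expected_text`. Whether the API accepts a request without it is not documented here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send every WAV/MP3/FLAC in a folder to /clean/.
# Usage: clean_dir <input_dir> <output_dir> [api_base_url]
clean_dir() {
  in_dir="$1"
  out_dir="$2"
  api="${3:-http://localhost:8000}"
  mkdir -p "$out_dir"
  for f in "$in_dir"/*.wav "$in_dir"/*.mp3 "$in_dir"/*.flac; do
    [ -e "$f" ] || continue              # skip glob patterns that matched nothing
    base=$(basename "$f")
    out="$out_dir/${base%.*}.cleaned.wav"
    txt="${f%.*}.txt"                    # assumed sidecar holding the expected text
    if [ -f "$txt" ]; then
      # curl's "<file" form syntax uses the file's contents as the field value
      set -- -F "file=@$f" -F "expected_text=<$txt"
    else
      set -- -F "file=@$f"
    fi
    if curl -fs -X POST "$api/clean/" "$@" -o "$out"; then
      echo "ok: $f -> $out"
    else
      echo "failed: $f" >&2
      rm -f "$out"                       # don't leave partial or error output behind
    fi
  done
}
```

For example, `clean_dir ./recordings ./cleaned` writes `talk.cleaned.wav` for `recordings/talk.mp3`.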

Important Notes

  • The first run is slow because it downloads about 2 GB of AI models
  • Accepts WAV, MP3, and FLAC files
  • A GPU makes processing about 5-10x faster
  • Without a GPU it still works, just more slowly
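Since only WAV, MP3, and FLAC are listed, a script can check the file extension before uploading. The sketch below does this and is not part of the API. The name `is_supported_audio` is hypothetical. It only looks at the extension (case-insensitively), not at the actual file contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the filename has a .wav, .mp3, or .flac extension.
is_supported_audio() {
  # Reject names with no extension at all
  case "$1" in
    *.*) ;;
    *) return 1 ;;
  esac
  # Lowercase the extension so .WAV and .wav are treated the same
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    wav|mp3|flac) return 0 ;;
    *) return 1 ;;
  esac
}
```

For other formats, converting first with ffmpeg (for example `ffmpeg -i memo.m4a memo.wav`) produces a WAV file the API accepts.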
