AI-powered audio processing: transcribe speech, remove noise, and trim audio automatically.
Must have:
- Linux computer (Ubuntu/Debian recommended)
- Docker installed
Nice to have:
- NVIDIA GPU (makes it way faster)
Step 1: Install Docker
sudo apt update
sudo apt install docker.io docker-compose
sudo usermod -aG docker $USER
Log out and back in after this step
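To confirm that Step 1 worked, a small check along these lines may help. This is only a sketch: `check_docker_setup` is an illustrative helper name, not something the project ships, and it assumes a POSIX-style shell.

```shell
# Report whether the tools from Step 1 are on PATH and whether the
# current user is in the docker group (illustrative helper).
check_docker_setup() {
    rc=0
    for tool in docker docker-compose; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool found"
        else
            echo "missing: $tool"
            rc=1
        fi
    done
    # Group membership only shows up in new login sessions.
    if id -nG | tr ' ' '\n' | grep -qx docker; then
        echo "ok: user is in the docker group"
    else
        echo "missing: user is not in the docker group (log out and back in?)"
        rc=1
    fi
    return "$rc"
}
```

Run `check_docker_setup` in a fresh terminal; it returns a non-zero status if anything is missing.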
Step 2: GPU Support (Optional but Recommended)
# If apt cannot find this package, add NVIDIA's apt repository first
sudo apt install nvidia-container-toolkit
sudo systemctl restart docker
Step 3: Run the API
git clone <your-repo-url>
cd audio-api
mkdir models workspace temp
docker-compose up --build
That's it! The API will be running at http://localhost:8000
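Because the first start can take a while, it can be handy to wait until the API actually answers before sending audio. The sketch below polls the docs page until it responds. `wait_for_api`, the 300-second default, and the one-second polling interval are assumptions for illustration, not part of the project.

```shell
# Poll a URL until it answers or the timeout (roughly, in seconds) runs out.
# wait_for_api is an illustrative helper, not part of the project.
wait_for_api() {
    url=${1:-http://localhost:8000/docs}
    timeout=${2:-300}
    waited=0
    # -s: quiet, -f: treat HTTP errors as failures, -o: discard the body
    until curl -sf --max-time 5 -o /dev/null "$url"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "API not reachable at $url after about ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "API is up at $url"
}
```

From a second terminal, run `wait_for_api` with no arguments to use the defaults, or pass a URL and a timeout, for example `wait_for_api http://localhost:8000/docs 600`.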
- Transcribe - Converts speech to text with timestamps
- Denoise - Removes background noise from audio
- Clean - Transcribes, denoises, and trims the audio to just the speech
Go to http://localhost:8000/docs in your browser for an easy web interface.
Or use these commands:
# Just transcribe
curl -X POST "http://localhost:8000/transcribe/" \
-F "file=@your-audio.wav" \
-F "expected_text=what you think it says"
# Clean everything (recommended)
curl -X POST "http://localhost:8000/clean/" \
-F "file=@your-audio.wav" \
-F "expected_text=what you think it says" \
-o cleaned-audio.wav
- First time will be slow (downloads AI models ~2GB)
- Works with WAV, MP3, FLAC files
- GPU makes it 5-10x faster
- Without GPU it still works, just slower
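For more than a handful of files, the clean command can be wrapped in a loop. The following is a rough sketch, not a project script: `clean_dir` is an illustrative name, and it assumes each recording sits next to a `.txt` file holding its expected text (for example `take1.wav` and `take1.txt`). It sends the same `file` and `expected_text` fields as the single-file command.

```shell
# Send every .wav in a directory to the /clean/ endpoint.
# Assumption: each recording has a matching .txt file with its expected text.
# clean_dir is an illustrative helper, not part of the project.
clean_dir() {
    in_dir=$1
    out_dir=$2
    api=${3:-http://localhost:8000}
    mkdir -p "$out_dir"
    for wav in "$in_dir"/*.wav; do
        [ -e "$wav" ] || continue          # directory had no .wav files
        name=$(basename "$wav" .wav)
        text_file="$in_dir/$name.txt"
        if [ ! -f "$text_file" ]; then
            echo "skip: $wav (no $text_file)" >&2
            continue
        fi
        # -f makes curl fail on HTTP errors; --form-string sends the text literally
        if curl -sf -X POST "$api/clean/" \
            -F "file=@$wav" \
            --form-string "expected_text=$(cat "$text_file")" \
            -o "$out_dir/$name.wav"; then
            echo "cleaned: $out_dir/$name.wav"
        else
            echo "failed: $wav" >&2
        fi
    done
}
```

For example, `clean_dir ./recordings ./cleaned` writes one cleaned file per input into `./cleaned` and reports skipped or failed files on stderr.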