Self-hosted REST API (with a frontend) that removes ads from audio/video files using OpenAI's Whisper and LLMs
A transcript is generated through the Fireworks AI API, which runs Whisper (specifically Whisper-v3-large-turbo), an open-source ASR model, and returns the full transcription along with word-level timestamps. The full transcription is then sent to an LLM (Gemini 2.0 Flash), which extracts the advertisement segments. The start_time and end_time of each segment are used to build an FFmpeg command that removes those segments from the original audio file, and the cleaned audio file is returned to the user.
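The last step of that pipeline, turning a list of ad segments into an FFmpeg command, can be sketched roughly as follows. This is an illustrative sketch, not the project's actual code: the function names are hypothetical, segments are assumed to be (start_time, end_time) pairs in seconds, and only the audio stream is handled. It uses FFmpeg's aselect filter to drop the unwanted time ranges and asetpts to rebuild continuous timestamps.

```python
def merge_segments(segments):
    """Sort ad segments and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(segments):
        if end <= start:
            continue  # ignore empty or inverted segments
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def build_ffmpeg_command(input_path, output_path, segments):
    """Build an FFmpeg argument list that drops the given time ranges (audio only)."""
    merged = merge_segments(segments)
    if not merged:
        # Nothing to cut: copy the streams unchanged.
        return ["ffmpeg", "-y", "-i", input_path, "-c", "copy", output_path]
    # Select every sample whose timestamp t is NOT inside an ad segment.
    drops = "+".join(f"between(t,{s:.3f},{e:.3f})" for s, e in merged)
    # asetpts regenerates timestamps so the kept audio plays back without gaps.
    audio_filter = f"aselect='not({drops})',asetpts=N/SR/TB"
    return ["ffmpeg", "-y", "-i", input_path, "-af", audio_filter, output_path]


if __name__ == "__main__":
    # Hypothetical ad segments, in seconds.
    print(build_ffmpeg_command("audio.mp3", "audio_edited.mp3", [(40, 50), (10, 20)]))
```

The resulting list can be passed to subprocess.run without a shell, which avoids having to escape the filter expression a second time.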
Whisper is billed at $0.0009 per audio minute (charged per second), and Gemini 2.0 Flash is billed at $0.40 per million output tokens ($0.0000004 per token), so processing an hour-long podcast costs around $0.11.
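To estimate costs for files of other lengths, the two quoted rates can be plugged into a small calculator. This sketch covers only those two rates, so it is not expected to reproduce the full per-episode figure; the function names and the example token count are hypothetical.

```python
WHISPER_USD_PER_MINUTE = 0.0009                  # transcription rate quoted above
GEMINI_USD_PER_OUTPUT_TOKEN = 0.40 / 1_000_000   # output-token rate quoted above


def transcription_cost(audio_seconds):
    """Per-second billing: pro-rate the per-minute price."""
    return audio_seconds * WHISPER_USD_PER_MINUTE / 60


def llm_output_cost(output_tokens):
    """Cost of the LLM's output tokens only."""
    return output_tokens * GEMINI_USD_PER_OUTPUT_TOKEN


if __name__ == "__main__":
    one_hour = 60 * 60
    print(f"Transcription, 1 hour: ${transcription_cost(one_hour):.4f}")
    # 2,000 output tokens is a made-up figure for illustration.
    print(f"LLM output, 2,000 tokens: ${llm_output_cost(2000):.6f}")
```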
Clone the repository and navigate to the directory:
git clone https://github.com/nocdn/ad-segment-trimmer.git
cd ad-segment-trimmer/
Copy the .env.example file to .env:
cp .env.example .env
- Make sure you have a Gemini API key, set as an environment variable called GEMINI_API_KEY in the .env file.
- Make sure you have a Fireworks AI API key, set as an environment variable called FIREWORKS_API_KEY in the .env file.
- Set any rate limits you want in the .env file (optional).
Build and run the Docker image:
docker compose up -d --build
(the -d flag runs the container in detached mode, and the --build flag rebuilds the image if there are any changes)
There should now be a frontend running on port 6030, and the API running on port 7070.
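A quick way to confirm that both services came up is to try opening a TCP connection to each port. This is a minimal sketch that assumes the containers are reachable on localhost; the helper name is hypothetical.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports taken from the text above.
    for name, port in (("frontend", 6030), ("API", 7070)):
        state = "up" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

An open port only shows that something is listening; it does not confirm that the API keys are valid.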
To access the API, you can use the following curl command:
curl -F "file=@audio.mp3" -OJ http://localhost:7070/process
(replace audio.mp3 with the path to your audio file; the -OJ flags save the response under the filename returned by the server, which carries the _edited suffix)
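The same request can be made from Python using only the standard library. This is a rough sketch mirroring the curl command: it assumes the endpoint accepts a multipart upload in a field named file and returns the cleaned audio as the response body. The function names and the fallback output filename are hypothetical.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_URL = "http://localhost:7070/process"  # endpoint from the curl example


def build_multipart(field, filename, data):
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def process_file(path, url=API_URL):
    """Upload an audio file and save the cleaned result next to it."""
    src = Path(path)
    body, content_type = build_multipart("file", src.name, src.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        # Prefer the server-provided filename; the fallback name is an assumption.
        name = response.headers.get_filename() or f"{src.stem}_edited{src.suffix}"
        out = src.with_name(Path(name).name)
        out.write_bytes(response.read())
    return out
```

Calling process_file("audio.mp3") with the API running should write the edited file into the same directory as the input.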
- Python 3.10+
Clone the repository and navigate to the directory:
git clone https://github.com/nocdn/ad-segment-trimmer.git
cd ad-segment-trimmer/
Fill out the .env file by copying the .env.example file:
cp .env.example .env
- Make sure you have a Gemini API key, set as an environment variable called GEMINI_API_KEY in the .env file.
- Make sure you have a Fireworks AI API key, set as an environment variable called FIREWORKS_API_KEY in the .env file.
- Set any rate limits you want in the .env file (optional).
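Before starting the backend, it can help to confirm that both required keys are actually present in .env. This is a minimal sketch that assumes the file uses plain KEY=VALUE lines; the helper names are hypothetical, and the rate-limit settings are not checked because their names are not listed here.

```python
from pathlib import Path

REQUIRED_KEYS = ("GEMINI_API_KEY", "FIREWORKS_API_KEY")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text() if env_file.exists() else ""
    print("Missing keys:", missing_keys(text) or "none")
```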
Install the backend dependencies:
cd backend
pip install -r requirements.txt
Run the backend:
python app.py
Install the frontend dependencies (from the repository root, in a separate terminal):
cd frontend
npm install
Run the frontend:
npm run dev
This project is licensed under the MIT License - see the LICENSE file for details.