Recreate any song using only mouth-made sounds through an end-to-end modular audio pipeline.
I’ve always been a huge fan of acapella, and I wondered: is it possible to recreate any song using only mouth-made sounds? Well, this project answers that question.
This project builds a pipeline that:
- Ingests hundreds of acapella “instrumental” tracks
- Slices them into short segments
- Embeds each slice with a pretrained audio model
- Indexes those embeddings with FAISS
- Converts any input song by matching its instrumental sections to your acapella library, time-stretching and stitching together a brand-new acapella “instrumental,” then layering back the original vocals (a rough stitching sketch follows this list)
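To make the last step concrete, here is a minimal sketch of how matched slices could be time-stretched to the target segment duration and joined with short linear cross-fades. It assumes librosa and NumPy; the function names and parameters are illustrative, not the project's actual module API.

```python
import numpy as np
import librosa

SR = 16000  # assumed working sample rate

def stretch_to_duration(slice_audio: np.ndarray, target_sec: float, sr: int = SR) -> np.ndarray:
    """Time-stretch a matched slice so it covers the target segment duration."""
    current_sec = len(slice_audio) / sr
    rate = current_sec / target_sec  # rate > 1 shortens the slice, rate < 1 lengthens it
    return librosa.effects.time_stretch(slice_audio, rate=rate)

def crossfade_concat(chunks: list[np.ndarray], fade_sec: float = 0.02, sr: int = SR) -> np.ndarray:
    """Concatenate chunks with a short linear cross-fade to avoid clicks and pops."""
    fade = int(fade_sec * sr)
    out = chunks[0]
    for nxt in chunks[1:]:
        n = min(fade, len(out), len(nxt))
        ramp = np.linspace(0.0, 1.0, n)
        overlap = out[-n:] * (1.0 - ramp) + nxt[:n] * ramp  # fade out old, fade in new
        out = np.concatenate([out[:-n], overlap, nxt[n:]])
    return out
```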
- Modular pipeline: clear separation of data ingestion, embedding, matching, synthesis, and orchestration
- Pretrained embeddings: uses Wav2Vec2 for robust audio features (a minimal extraction sketch follows this list)
- Fast nearest-neighbor search with FAISS
- Time-stretching & cross-fade assembly to avoid pops and align durations
- CLI tools for ingestion, index building/updating, and conversion
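As a rough illustration of the embedding step, the sketch below mean-pools Wav2Vec2 hidden states into one fixed-size vector per slice using Hugging Face transformers. The checkpoint name and the mean-pooling choice are assumptions, not necessarily what extractor.py does.

```python
import numpy as np
import torch
import librosa
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

MODEL_NAME = "facebook/wav2vec2-base-960h"  # assumed checkpoint
processor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_NAME)
model = Wav2Vec2Model.from_pretrained(MODEL_NAME).eval()

def embed_slice(path: str) -> np.ndarray:
    """Return one fixed-size embedding for an audio slice (mean-pooled hidden states)."""
    audio, _ = librosa.load(path, sr=16000, mono=True)  # Wav2Vec2 expects 16 kHz mono
    inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, frames, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()    # (768,)
```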
audiorecast/
├── README.md
├── pyproject.toml
├── src/
│   └── audiorecast/
│       ├── data/
│       │   ├── downloader.py      # YouTube → WAV
│       │   ├── stem_splitter.py   # Demucs wrapper & 4→2 stem merge
│       │   └── segmenter.py       # Chop stems into fixed windows
│       ├── embedding/
│       │   └── extractor.py       # Wav2Vec2 feature extractor
│       ├── matching/
│       │   ├── index_builder.py   # Build embeddings.npy + FAISS index
│       │   └── matcher.py         # Query FAISS for top-k matches
│       ├── synthesis/
│       │   ├── synthesizer.py     # Time-stretch pads slices
│       │   └── assembler.py       # Cross-fade & concatenate
│       └── pipeline.py            # End-to-end conversion logic
├── scripts/
│   ├── ingest_acapellas.py        # Download, split, slice
│   ├── build_faiss_index.py       # Full rebuild
│   └── convert_song.py            # Given input → acapella-style output
└── data/
    ├── ingest/                    # .txt playlists of YouTube URLs
    ├── raw/                       # Downloaded WAVs + stems + slices
    └── embeddings/                # embeddings.npy, paths.jsonl, .faiss
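Tied to index_builder.py and matcher.py above, here is a hedged sketch of the FAISS side: build an exact L2 index over the slice embeddings, then query it for the top-k nearest slices. The flat index type, the .faiss file name, and the "path" key in paths.jsonl are assumptions for illustration.

```python
import json
import numpy as np
import faiss

# Assumed artifacts, following the data/embeddings/ layout above.
embeddings = np.load("data/embeddings/embeddings.npy").astype("float32")  # (n_slices, dim)

index = faiss.IndexFlatL2(embeddings.shape[1])  # exact L2 nearest-neighbor search
index.add(embeddings)
faiss.write_index(index, "data/embeddings/slices.faiss")  # illustrative file name

def top_k(query_vec: np.ndarray, k: int = 5):
    """Return (distances, row ids) of the k nearest acapella slices."""
    distances, ids = index.search(query_vec.reshape(1, -1).astype("float32"), k)
    return distances[0], ids[0]

# Row ids map back to slice files via paths.jsonl (one JSON record per line; key assumed).
with open("data/embeddings/paths.jsonl") as f:
    slice_paths = [json.loads(line)["path"] for line in f]
```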
Input Song: 🎧 Stereo_Hearts_acapella.mp3
Output (acapella-recast): 🎤 Stereo_Hearts_acapella.wav
Spectrogram comparison:
Obviously, the results aren’t perfect yet. With a larger acapella database, better matching heuristics (re-ranking matches with cosine similarity + a timing loss), and future improvements in synthesis, the output quality should improve significantly.
This is just the starting point; there’s a lot of exciting room to grow.
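One way the re-ranking mentioned above could look, as a rough sketch: take the FAISS top-k candidates and re-score each one with cosine similarity minus a duration-mismatch penalty. The weighting and this particular definition of the timing loss are assumptions, not the implemented behavior.

```python
import numpy as np

def rerank(query_vec, candidate_vecs, query_dur, candidate_durs, timing_weight=0.5):
    """Re-score FAISS candidates: cosine similarity minus a simple duration-mismatch penalty."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = []
    for vec, dur in zip(candidate_vecs, candidate_durs):
        cos = float(np.dot(q, vec / np.linalg.norm(vec)))
        timing_loss = abs(dur - query_dur) / max(query_dur, 1e-6)  # relative duration mismatch
        scores.append(cos - timing_weight * timing_loss)
    return int(np.argmax(scores))  # index of the best candidate after re-ranking
```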
You can run this project directly via Docker:
Build locally:
docker build -t audiorecast .
Or set it up locally:
- Clone the repository:
git clone https://github.com/Codingisinmyblud/audiorecast.git && cd audiorecast
- Install Poetry (if you don't have it yet): https://python-poetry.org/docs/#installation
- Install project dependencies:
poetry install
This project is just the beginning. Here are some directions I plan to explore:
- Expand the acapella database for more diverse sounds
- Improve matching using music-specific audio embeddings (e.g. CLAP or MERT)
- Add beat-aware slicing and better tempo alignment (dynamic time warping, etc.; a rough DTW sketch follows this list)
- Explore smarter re-ranking (combine cosine similarity with timing constraints)
- Speed up pipeline using multiprocessing or batch inference
- Build a web interface for real-time previews and remixing
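For the tempo-alignment item, one possible direction (purely illustrative, not implemented) is aligning chroma features of a matched slice against the target segment with librosa's DTW, and using the warping path to drive local stretching:

```python
import numpy as np
import librosa

def dtw_warp_path(target: np.ndarray, candidate: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Frame-level warping path between a target segment and a matched slice via DTW on chroma."""
    X = librosa.feature.chroma_stft(y=target, sr=sr)
    Y = librosa.feature.chroma_stft(y=candidate, sr=sr)
    _, warp_path = librosa.sequence.dtw(X=X, Y=Y, metric="cosine")
    return warp_path  # pairs of (target_frame, candidate_frame) indices
```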
Have more ideas? Open an issue or a PR; contributions are welcome!
Pull requests welcome. For major changes, open an issue first.
- Fork the repo
- Create your feature branch (git checkout -b feature/foo)
- Commit your changes (git commit -m 'Add foo')
- Push and create a PR
This project is licensed under the MIT License. See LICENSE for more info.