This is an extremely simple tool for separating vocals from background music. It runs entirely locally through a web page and uses the 2stems/4stems/5stems models.
Drag and drop a song, or an audio/video file with background music, onto the local web page to split the vocals and music into separate WAV files. You can also choose to extract "piano sound," "bass sound," "drum sound," etc.
The tool automatically opens the local web page in your browser. The models are built in, so no external network connection is needed to download them.
Supports video (mp4/mov/mkv/avi/mpeg) and audio (mp3/wav) formats.
It takes just two mouse clicks: one to select the audio/video file and one to start processing.
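As a rough illustration of the supported-formats list, the sketch below classifies a file by its extension before it is uploaded. The helper name media_kind is hypothetical and is not part of this project; the extension sets are copied from the list above.

```python
from pathlib import Path

# Extensions copied from the supported-formats list in this README.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi", ".mpeg"}
AUDIO_EXTENSIONS = {".mp3", ".wav"}


def media_kind(filename: str) -> str:
    """Return "video", "audio", or "unsupported" based on the file extension.

    Hypothetical helper for illustration; the project may check files differently.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return "unsupported"
```

The check is case-insensitive, so a file named song.MP3 is treated the same as song.mp3.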
- Download the precompiled file from Releases on the right side.
- After downloading, unzip it to a certain location, such as E:/vocal-separate.
- Double-click start.exe and wait for the browser window to open automatically.
- Click on the upload area on the page and find the audio/video file you want to separate in the pop-up window, or drag the audio file directly to the upload area, and then click "Separate Now." Wait a moment, and each separated file and its playback control will be displayed at the bottom. Click to play.
- If the machine has an NVIDIA GPU and the CUDA environment is configured correctly, CUDA acceleration will be used automatically.
- Running from source requires Python 3.9 to 3.11.
- Create an empty directory, such as E:/vocal-separate, and open a cmd window in it by typing cmd in the address bar and pressing Enter. Then use git to pull the source code into the current directory:
    git clone git@github.com:jianchang512/vocal-separate.git .
- Create a virtual environment:
    python -m venv venv
- Activate the environment. On Windows, the command is
    %cd%/venv/scripts/activate
  and on Linux and Mac, the command is
    source ./venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- On Windows, unzip ffmpeg.7z and place ffmpeg.exe and ffprobe.exe in the project directory. On Linux and Mac, download the corresponding version of ffmpeg from the ffmpeg official website, unzip it, and place the ffmpeg and ffprobe binaries in the project root directory.
- Download the compressed model package and extract it into the pretrained_models folder in the project root directory. After extraction, pretrained_models will contain three folders: 2stems, 4stems, and 5stems.
- Execute
    python start.py
  and wait for the local browser window to open automatically.
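Before running start.py, the setup steps above can be sanity-checked with a small script. This is only a sketch under stated assumptions: the function name preflight is hypothetical, the model folder names are assumed to match the 2stems/4stems/5stems model names used elsewhere in this README, and the ffmpeg binaries are assumed to sit directly in the project root as described.

```python
import sys
from pathlib import Path

# Assumed folder names, taken from the model names used in this README.
EXPECTED_MODELS = ("2stems", "4stems", "5stems")


def preflight(project_dir: str, windows: bool = sys.platform.startswith("win")) -> list:
    """Return a list of human-readable problems; an empty list means ready."""
    root = Path(project_dir)
    problems = []
    # The README asks for Python 3.9 to 3.11.
    if not ((3, 9) <= sys.version_info[:2] <= (3, 11)):
        problems.append("Python 3.9 to 3.11 is required")
    # ffmpeg and ffprobe are expected in the project root (with .exe on Windows).
    suffix = ".exe" if windows else ""
    for tool in ("ffmpeg", "ffprobe"):
        if not (root / f"{tool}{suffix}").is_file():
            problems.append(f"{tool}{suffix} not found in {root}")
    # Each model should be its own folder under pretrained_models.
    for model in EXPECTED_MODELS:
        if not (root / "pretrained_models" / model).is_dir():
            problems.append(f"pretrained_models/{model} is missing")
    return problems


if __name__ == "__main__":
    for problem in preflight("."):
        print("WARNING:", problem)
```

Run it from the project root; no output means none of the checked items are missing.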
API URL: http://127.0.0.1:9999/api
Method: POST
Request params:
    file: the audio file
    model: model name, one of 2stems, 4stems, 5stems
Response: JSON
    code: int, 0 on success, >0 on error
    msg: str, error information
    data: List[str], URLs of all separated wav files, e.g. ['http://127.0.0.1:9999/static/files/2/accompaniment.wav']
    status_text: dict[str,str], description of each wav file name, {'accompaniment.wav': 'accompaniment audio', 'bass.wav': 'bass audio', 'drums.wav': 'drums audio', 'other.wav': 'other audio', 'piano.wav': 'piano audio', 'vocals.wav': 'vocals audio'}
import requests

# API URL of the locally running service
url = "http://127.0.0.1:9999/api"
data = {"model": "2stems"}
# Open the audio file in binary mode; the context manager closes it afterwards
with open("C:\\Users\\c1\\Videos\\2.wav", "rb") as f:
    response = requests.post(url, data=data, files={"file": f}, timeout=600)
print(response.json())
{'code': 0, 'data': ['http://127.0.0.1:9999/static/files/2/accompaniment.wav', 'http://127.0.0.1:9999/static/files/2/vocals.wav'], 'msg': 'ok', 'status_text': {'accompaniment.wav': 'accompaniment', 'bass.wav': 'bass', 'drums.wav': 'drums', 'other.wav': 'other', 'piano.wav': 'piano', 'vocals.wav': 'vocals'}}
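A response like the one above can be turned into a simple URL-to-description mapping. The following is a minimal sketch, assuming the response has the code, msg, data, and status_text fields described earlier; the function name summarize_result is hypothetical and not part of the project.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def summarize_result(result: dict) -> dict:
    """Map each returned wav URL to its description from status_text.

    Raises RuntimeError when the API reports an error (code other than 0).
    """
    if result.get("code", 1) != 0:
        raise RuntimeError(f"separation failed: {result.get('msg', 'unknown error')}")
    labels = result.get("status_text", {})
    summary = {}
    for url in result.get("data", []):
        # The last path segment of the URL is the file name, e.g. "vocals.wav".
        name = PurePosixPath(urlparse(url).path).name
        summary[url] = labels.get(name, name)
    return summary


# Sample data shaped like the example response in this README.
sample = {
    "code": 0,
    "msg": "ok",
    "data": [
        "http://127.0.0.1:9999/static/files/2/accompaniment.wav",
        "http://127.0.0.1:9999/static/files/2/vocals.wav",
    ],
    "status_text": {"accompaniment.wav": "accompaniment", "vocals.wav": "vocals"},
}
for url, label in summarize_result(sample).items():
    print(f"{label}: {url}")
```

Raising on a non-zero code keeps error handling in one place, so callers only ever see successful results.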
Install CUDA Toolkit
If your computer has an NVIDIA graphics card, upgrade the graphics card driver to the latest version, and then install the corresponding CUDA Toolkit 11.8 and cuDNN for CUDA 11.x.
After the installation is complete, press Win + R, enter cmd, and then press Enter. In the window that opens, enter nvcc --version and confirm that version information is displayed, similar to the picture.
Then enter nvidia-smi and confirm that there is output and that you can see the CUDA version number, similar to the picture.
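The two manual checks above can also be scripted. This is a minimal sketch using only the Python standard library; the helper name tool_output is hypothetical, and the script only reports whether nvcc --version and nvidia-smi run and produce output, not whether the versions match.

```python
import shutil
import subprocess


def tool_output(command: list):
    """Run a command and return its stdout, or None if it is missing or fails."""
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(command[0]) is None:
        return None
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=30, check=True
        )
    except (subprocess.SubprocessError, OSError):
        return None
    return completed.stdout


if __name__ == "__main__":
    for command in (["nvcc", "--version"], ["nvidia-smi"]):
        output = tool_output(command)
        status = "OK" if output else "not available"
        print(f"{' '.join(command)}: {status}")
```

If either command is reported as not available, the CUDA Toolkit or the driver is probably not installed or not on PATH.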
- For Chinese music or Chinese musical instruments, it is recommended to choose the 2stems model. Other models can separately extract files for "piano, bass, and drums."
- If the computer does not have an NVIDIA graphics card or has not configured the CUDA environment, do not choose the 4stems and 5stems models, especially when processing long-duration audio; otherwise, it may run out of memory.
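The two tips can be expressed as a small selection rule. This is only a sketch of the advice above, not logic taken from the project; the function name suggested_models and its parameters are hypothetical.

```python
ALL_MODELS = ("2stems", "4stems", "5stems")


def suggested_models(has_cuda: bool, chinese_music: bool = False) -> tuple:
    """Return the models the tips above suggest choosing from."""
    # Without CUDA, 4stems and 5stems may run out of memory, and for Chinese
    # music or instruments the 2stems model is the recommended choice.
    if chinese_music or not has_cuda:
        return ("2stems",)
    return ALL_MODELS
```

For example, suggested_models(has_cuda=False) leaves only 2stems, while a machine with a working CUDA setup can pick any of the three.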
This project mainly relies on other projects