Leo-Yuyang/sovits_tool

sovits-tool

This repo is a toolkit for preparing audio data to train AI voice models with Sovits-v4. If your recordings contain only the human voice, this is all you need to train your Sovits model. If you need to extract vocals from songs or music videos by removing the accompaniment, you will also need the UVR (Ultimate Vocal Remover, https://github.com/Anjok07/ultimatevocalremovergui) tool.

audioseg

The script audioseg.py splits every .wav file in the input folder at pauses, producing segments that contain only human speech. Each output .wav file is kept between 10 and 15 seconds long, meeting the training data requirements of Sovits-v4. It also writes a .json file recording the start and end time of each slice, in the following format:

{sample number: [cut start, cut end]}, with times written as H:M:S. For example:

{"0": ["0:0:0", "0:0:3"], "1": ["0:0:3", "0:0:10"], "2": ["0:0:10", "0:0:22"], "3": ["0:0:22", "0:0:32"]}
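The segmentation rule above (cut at pauses, keep each piece between 10 and 15 seconds, record the cuts as JSON) can be sketched as below. This is a minimal illustration, not audioseg.py's actual code: it assumes pause positions have already been detected and are given in seconds, and the names `choose_segments`, `to_hms`, and `segments_to_json` are hypothetical. The greedy rule (take the latest pause that keeps the segment within the limits, and skip ahead when no pause fits) is one simple choice among several.

```python
import json


def to_hms(seconds):
    """Format seconds as "H:M:S" without zero padding, like the example above."""
    h, rem = divmod(int(round(seconds)), 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m}:{s}"


def choose_segments(pauses, total, min_len=10.0, max_len=15.0):
    """Greedily pick cut points from pauses so each segment lasts min_len..max_len seconds.

    pauses: hypothetical pause positions in seconds (e.g. midpoints of detected silences).
    total: length of the recording in seconds.
    """
    candidates = sorted(p for p in pauses if 0 < p < total)
    segments, start = [], 0.0
    while True:
        # Pauses that would close a segment of acceptable length.
        ok = [p for p in candidates if min_len <= p - start <= max_len]
        if ok:
            end = ok[-1]  # prefer the longest acceptable segment
            segments.append((start, end))
            start = end
            continue
        later = [p for p in candidates if p > start]
        if not later:
            break
        start = later[0]  # no usable pause: drop the audio up to the next pause
    if min_len <= total - start <= max_len:
        segments.append((start, total))  # keep the tail only if it fits the limits
    return segments


def segments_to_json(segments):
    """Serialize segments as {sample number: [cut start, cut end]}."""
    return json.dumps({str(i): [to_hms(a), to_hms(b)] for i, (a, b) in enumerate(segments)})


if __name__ == "__main__":
    segs = choose_segments([3, 12, 14, 25, 38], total=45)
    print(segments_to_json(segs))
    # {"0": ["0:0:0", "0:0:14"], "1": ["0:0:14", "0:0:25"], "2": ["0:0:25", "0:0:38"]}
```

In this sketch, audio that cannot be closed by a pause within the length limits is simply dropped (here the final 7 seconds); a real script might instead force a cut or keep a shorter segment.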

The code was taken from /andrewphillipdoss. Thanks!

transform

This script provides format conversion tools for the training and inference stages of Sovits-v4, converting between audio formats such as mp3, wav, and m4a.
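A batch conversion step like this could look as follows, assuming pydub (already listed in the requirements) and an ffmpeg installation for decoding mp3 and m4a. The function names, the extension set, and the optional resampling arguments are illustrative assumptions, not the interface of the transform script.

```python
from pathlib import Path

# Input extensions this sketch handles; an assumption, not the script's actual list.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}


def plan_conversions(input_dir, output_dir, target_format="wav"):
    """Pair each supported audio file in input_dir with its output path in output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    pairs = []
    for src in sorted(input_dir.iterdir()):
        if src.is_file() and src.suffix.lower() in AUDIO_EXTENSIONS:
            pairs.append((src, output_dir / f"{src.stem}.{target_format}"))
    return pairs


def convert_all(input_dir, output_dir, target_format="wav", frame_rate=None, channels=None):
    """Convert every supported file; decoding mp3/m4a requires ffmpeg on PATH."""
    # Imported here so plan_conversions can be used without pydub installed.
    from pydub import AudioSegment

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in plan_conversions(input_dir, output_dir, target_format):
        audio = AudioSegment.from_file(str(src))  # input format is detected from the file
        if frame_rate is not None:
            audio = audio.set_frame_rate(frame_rate)  # resample
        if channels is not None:
            audio = audio.set_channels(channels)  # e.g. 1 for mono
        audio.export(str(dst), format=target_format)
```

For example, `convert_all("raw", "wav_out", frame_rate=44100, channels=1)` would write mono 44.1 kHz .wav files; check which sample rate your Sovits-v4 setup expects before picking one. Files that share a stem (such as `a.mp3` and `a.m4a`) map to the same output path, so later ones overwrite earlier ones.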

filter_short_wav

Filters out all .wav files in the specified folder that are shorter than 4 seconds.
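A minimal sketch of this filter using only the standard library `wave` module, which reads the duration from the file header. The function names and the `delete` flag are hypothetical; the original script may move or delete files differently. Note that `wave` only reads PCM .wav files, and the `*.wav` pattern is case-sensitive on most Linux filesystems.

```python
import wave
from pathlib import Path


def wav_duration(path):
    """Duration of a PCM .wav file in seconds, computed from its header."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def filter_short_wavs(folder, min_seconds=4.0, delete=False):
    """Return the .wav files in folder shorter than min_seconds; optionally delete them."""
    short = []
    for path in sorted(Path(folder).glob("*.wav")):
        if wav_duration(path) < min_seconds:
            short.append(path)
            if delete:
                path.unlink()  # remove the file from disk
    return short
```

Calling it first with `delete=False` lets you review which files would be removed before committing to the deletion.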

Requirements

Python 3.11.0
numpy (1.24.1)
scipy (1.10.0)
tqdm (4.64.1)
uuid (Python standard library)
loguru
moviepy
pydub

Usage

To run a script, edit the input_dir and output_dir paths inside its code, then run the script.

❗Note: for your audio file to be cut into samples, it must contain periods of silence.

Depending on the noise level of your audio, the algorithm may miss silence windows, resulting in missed cuts. Make sure your audio is free from unwanted noise and that its silences are clearly defined. To better match your audio, you can adjust min_silence_length (how long a pause must last to count as silence), silence_threshold (the level below which audio counts as silent), and step_duration (how far the detection window advances at each step).
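To show what these three parameters control, here is a sketch of a sliding-window energy detector built on numpy. It is an assumption about how such a detector typically works, not the code in audioseg.py: the peak normalization, the mean-squared-energy measure, the default values, and the function name `find_silences` are illustrative choices.

```python
import numpy as np


def find_silences(samples, sample_rate, min_silence_length=0.5,
                  silence_threshold=1e-4, step_duration=None):
    """Return (start_s, end_s) pairs for stretches that count as silence.

    min_silence_length: window length in seconds; shorter pauses are never reported.
    silence_threshold: mean energy (on peak-normalized audio) below which a window is silent.
    step_duration: seconds the window advances per step (default: min_silence_length / 10).
    """
    if step_duration is None:
        step_duration = min_silence_length / 10
    window = max(1, int(min_silence_length * sample_rate))
    step = max(1, int(step_duration * sample_rate))

    x = np.asarray(samples, dtype=np.float64)
    if x.ndim > 1:
        x = x.mean(axis=1)  # mix (n_samples, n_channels) audio down to mono
    peak = np.max(np.abs(x)) if x.size else 0.0
    if peak > 0:
        x = x / peak  # normalize so the threshold does not depend on bit depth

    silences = []
    for start in range(0, len(x) - window + 1, step):
        if np.mean(x[start:start + window] ** 2) < silence_threshold:
            end = start + window
            if silences and start <= silences[-1][1]:
                silences[-1][1] = end  # extend the overlapping silent stretch
            else:
                silences.append([start, end])
    return [(s / sample_rate, e / sample_rate) for s, e in silences]
```

With scipy (also in the requirements), `rate, data = scipy.io.wavfile.read(path)` returns arrays in the layout this function expects. In this sketch, raising `silence_threshold` lets quiet but noisy pauses count as silence, lowering `min_silence_length` yields more cut candidates, and a smaller `step_duration` locates silences more precisely at the cost of speed.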
