This repo is a MindSpore implementation of the GPT-SoVITS model, referencing the original implementation by RVC-BOSS.
- Zero-shot TTS: Input a 5-second vocal sample and experience instant text-to-speech conversion.
- Few-shot TTS: Fine-tune the model with just 1 minute of training data for improved voice similarity and realism.
- Cross-lingual Support: Inference in languages different from the training dataset, currently supporting English, Japanese, and Chinese.
- WebUI Tools (TODO): Integrated tools include automatic training set segmentation, Chinese ASR, and text labeling, assisting beginners in creating training datasets and GPT/SoVITS models.
- Python 3.9, MindSpore 2.2.3, CUDA 11.6 (CU116)
conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
bash install.sh
pip install -r requirements.txt
Install FFmpeg. Conda users:
conda install ffmpeg
Ubuntu/Debian users:
sudo apt install ffmpeg
sudo apt install libsox-dev
conda install -c conda-forge 'ffmpeg<7'
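After installation, a quick check that the core dependencies are usable can save time later. The snippet below is a minimal sketch: it only confirms that MindSpore imports and passes its built-in self-check and that ffmpeg is on the PATH; it does not validate the full CUDA setup.

```python
# Minimal post-install sanity check (a sketch; it verifies nothing beyond
# what mindspore.run_check() and a PATH lookup can tell you).
import shutil

import mindspore

mindspore.run_check()  # prints the installed MindSpore version and runs a small test op
assert shutil.which("ffmpeg"), "ffmpeg not found on PATH; install it as shown above"
```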
You can use the model conversion tool GPT_SoVITS/convert.py to transform PyTorch model weights into MindSpore model weights.
cd GPT-SoVITS-mindspore
python GPT_SoVITS/convert.py --g_path path_to_your_GPT_model \
    --s_path path_to_your_Sovits_model
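For orientation, a weight conversion of this kind typically loads the PyTorch state dict and re-saves each tensor as a MindSpore parameter. The sketch below is not the actual GPT_SoVITS/convert.py: real conversion also remaps model-specific parameter names, and the "state_dict"/"weight" wrapper keys checked here are assumptions.

```python
# Rough sketch of a PyTorch -> MindSpore checkpoint conversion.
# Not the repo's convert.py: parameter-name remapping is model-specific and
# omitted, and the "state_dict"/"weight" nesting keys are assumptions.
import torch
import mindspore as ms

def convert_checkpoint(pt_path: str, ms_path: str) -> None:
    ckpt = torch.load(pt_path, map_location="cpu")
    # Some checkpoints nest the actual weights under a wrapper key.
    for key in ("state_dict", "weight"):
        if isinstance(ckpt, dict) and isinstance(ckpt.get(key), dict):
            ckpt = ckpt[key]
            break
    params = [
        {"name": name, "data": ms.Tensor(t.detach().cpu().float().numpy())}
        for name, t in ckpt.items()
        if isinstance(t, torch.Tensor)
    ]
    ms.save_checkpoint(params, ms_path)  # writes a MindSpore .ckpt file

convert_checkpoint("path_to_your_GPT_model", "gpt_model_ms.ckpt")
```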
Download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models.
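A quick way to confirm the files ended up in the right place is to list the directory the inference code reads from. The path below is taken from the step above; no specific filenames are assumed.

```python
# List whatever is under GPT_SoVITS/pretrained_models to confirm the download step.
from pathlib import Path

pretrained_dir = Path("GPT_SoVITS/pretrained_models")
files = sorted(str(p.relative_to(pretrained_dir)) for p in pretrained_dir.rglob("*") if p.is_file())
assert files, f"No files found under {pretrained_dir}; download the pretrained models first."
print("\n".join(files))
```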
You can use a startup script:
cd GPT-SoVITS-mindspore
bash launch_webui.sh
Or directly launch the Python file:
cd GPT-SoVITS-mindspore
python GPT_SoVITS/inference_webui.py
Upload a clip as the reference audio (it must be 3-10 seconds long), then fill in the Text for reference audio, i.e. a transcript of what is said in the clip, and choose its language on the right.
The reference audio is very important: it determines the speed and the emotion of the output. Try different clips if you do not get the desired result.
Fill in the inference text, set the inference language, then click Start inference.
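If you would rather script this flow than click through the WebUI, the Gradio client can drive a running instance. The sketch below is hypothetical: the URL/port must match whatever address the WebUI prints at startup, and the actual endpoint names and argument order should be read from view_api() rather than guessed.

```python
# Hypothetical sketch: inspecting a running inference WebUI with gradio_client.
# The URL below is an assumption; use the address printed when the WebUI starts.
from gradio_client import Client

client = Client("http://127.0.0.1:9872")
client.view_api()  # prints the exposed endpoints and the parameters they expect
# client.predict(...) can then be called with the arguments reported above.
```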