powellsz/Realtime-Voice-Clone-Chinese: 🚀 AI voice cloning: clone a voice in 5 seconds and generate arbitrary speech in real time

This repository is a fork of Real-Time-Voice-Cloning, which supports only English.


Features

🌍 Chinese: supports Mandarin; tested with multiple datasets: aidatatang_200zh, magicdata

🤩 PyTorch: works with PyTorch; tested on version 1.9.0 (the latest as of August 2021) with Tesla T4 and GTX 2060 GPUs

🌍 Windows + Linux: tested on both Windows and Linux after fixing minor issues

🤩 Easy & Awesome: good results with only a newly trained synthesizer, by reusing the pretrained encoder and vocoder

DEMO VIDEO

Quick Start

1. Install Requirements

Follow the original repo to check that your environment is ready. **Python 3.7 or higher** is needed to run the toolbox.

Install PyTorch.
Install ffmpeg.
Run pip install -r requirements.txt to install the remaining necessary packages.
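
The prerequisites above can be checked before training with a small script. This is a minimal sketch, not part of the repository: the function name check_environment is hypothetical, and it only confirms that Python is recent enough, that a torch package can be found, and that an ffmpeg executable is on the PATH. It does not verify the packages listed in requirements.txt.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 7)):
    """Return a dict mapping each prerequisite name to True or False.

    Hypothetical helper: it only checks that the tools can be located,
    not that they work with this repository.
    """
    return {
        # The README asks for Python 3.7 or higher.
        "python": sys.version_info >= min_python,
        # find_spec returns None when the package is not installed.
        "torch": importlib.util.find_spec("torch") is not None,
        # shutil.which returns None when ffmpeg is not on the PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Any entry reported as MISSING points back to the corresponding install step in the list above.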

Note that we reuse the pretrained encoder and vocoder but not the synthesizer, since the original synthesizer model is incompatible with Chinese symbols. This means demo_cli does not work at the moment.

2. Train synthesizer with your dataset

Download the aidatatang_200zh or SLR68 dataset and unzip it; make sure you can access all .wav files in the train folder.
Preprocess the audio and the mel spectrograms: python synthesizer_preprocess_audio.py <datasets_root>. Pass --dataset {dataset} to select aidatatang_200zh or magicdata.
Preprocess the embeddings: python synthesizer_preprocess_embeds.py <datasets_root>/SV2TTS/synthesizer
Train the synthesizer: python synthesizer_train.py mandarin <datasets_root>/SV2TTS/synthesizer
Go to the next step once the attention line appears and the loss meets your needs; check the training folder synthesizer/saved_models/.
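
The steps above can be sketched as a short Python helper. This is an illustration under assumptions, not code from the repository: count_wavs and build_training_commands are hypothetical names, and the script names, the --dataset flag, the run name mandarin, and the SV2TTS/synthesizer path are taken from the commands listed above. The helper only builds the command lines; it does not run them.

```python
import sys
from pathlib import Path


def count_wavs(train_dir):
    """Count .wav files anywhere under train_dir (0 if the folder is missing)."""
    train_dir = Path(train_dir)
    if not train_dir.is_dir():
        return 0
    return sum(1 for _ in train_dir.rglob("*.wav"))


def build_training_commands(datasets_root, dataset=None, run_name="mandarin"):
    """Return the three commands from this section as argument lists.

    The order is: preprocess audio, preprocess embeddings, train.
    """
    synth_dir = str(Path(datasets_root) / "SV2TTS" / "synthesizer")
    preprocess_audio = [
        sys.executable,
        "synthesizer_preprocess_audio.py",
        str(datasets_root),
    ]
    # --dataset is optional; the README lists aidatatang_200zh and magicdata.
    if dataset is not None:
        preprocess_audio += ["--dataset", dataset]
    return [
        preprocess_audio,
        [sys.executable, "synthesizer_preprocess_embeds.py", synth_dir],
        [sys.executable, "synthesizer_train.py", run_name, synth_dir],
    ]


if __name__ == "__main__":
    # "datasets" is a placeholder for your own <datasets_root>.
    for cmd in build_training_commands("datasets", dataset="magicdata"):
        print(" ".join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root once count_wavs reports a non-zero number of files for your train folder.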

FYI, my attention line appeared after 18k steps, and the loss dropped below 0.4 after 50k steps. A link to my early trained model: Baidu Yun, code: aid4

3. Launch the Toolbox

You can then try the toolbox:

python demo_toolbox.py -d <datasets_root>
or
python demo_toolbox.py
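
The choice between the two invocations can also be made in code. The sketch below is an assumption-laden convenience, not part of the repository: toolbox_command is a hypothetical name, and it adds the -d option only when the given datasets folder actually exists.

```python
import sys
from pathlib import Path


def toolbox_command(datasets_root=None):
    """Build the demo_toolbox.py command, adding -d only for an existing folder."""
    cmd = [sys.executable, "demo_toolbox.py"]
    if datasets_root is not None and Path(datasets_root).is_dir():
        cmd += ["-d", str(datasets_root)]
    return cmd


if __name__ == "__main__":
    # "datasets" is a placeholder for your own <datasets_root>.
    print(" ".join(toolbox_command("datasets")))
```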

Good news 🤩: Chinese characters are supported

TODO

Add demo video
Add support for more datasets
Upload pretrained model
Support parallel Tacotron
Make it service-oriented and dockerize it
🙏 Feel free to add more

