Implementation of Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks (Kaneko & Kameoka, 2017).
This project implements voice conversion between a source and a target speaker using a CycleGAN, with no parallel training data required. Generated samples are available here.
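Parallel data is unnecessary because the cycle-consistency constraint replaces frame-aligned pairs: mapping an utterance to the other speaker and back must reconstruct the input. A minimal PyTorch-style sketch of the generator objective described in the paper (adversarial, cycle-consistency, and identity-mapping terms); the names `G_st`, `G_ts`, `D_s`, `D_t` and the helper itself are illustrative, not this repository's identifiers:

```python
import torch
import torch.nn.functional as F

def generator_loss(G_st, G_ts, D_s, D_t, x_s, x_t, lam_cyc=10.0, lam_id=5.0):
    """One training step's generator objective (illustrative sketch)."""
    fake_t = G_st(x_s)  # source -> target
    fake_s = G_ts(x_t)  # target -> source

    # Adversarial terms: each generator tries to fool its discriminator.
    pred_t = D_t(fake_t)
    pred_s = D_s(fake_s)
    adv = F.mse_loss(pred_t, torch.ones_like(pred_t)) \
        + F.mse_loss(pred_s, torch.ones_like(pred_s))

    # Cycle consistency: converting there and back must recover the input.
    # This is the constraint that removes the need for parallel corpora.
    cyc = F.l1_loss(G_ts(fake_t), x_s) + F.l1_loss(G_st(fake_s), x_t)

    # Identity mapping: a generator fed an utterance already in its target
    # domain should leave it unchanged; this helps preserve linguistic content.
    idt = F.l1_loss(G_st(x_t), x_t) + F.l1_loss(G_ts(x_s), x_s)

    return adv + lam_cyc * cyc + lam_id * idt
```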
The model largely preserves the source speaker's prosody, a consequence of the limited receptive field of its convolutional layers, while pitch is shifted separately by converting log(f0) statistics between speakers. Implementation choices were informed by this reference repository.
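Concretely, the pitch shift is the standard log-Gaussian normalized transformation between the two speakers' log(f0) distributions. A minimal NumPy sketch (the function name is illustrative; the mean/std arguments are the statistics passed to train.py below):

```python
import numpy as np

def convert_f0(f0, src_mean, src_std, tgt_mean, tgt_std):
    """Log-Gaussian normalized f0 conversion.

    f0 is a frame-level contour; zeros mark unvoiced frames and are kept.
    The statistics are the mean and std of log(f0) over voiced frames.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0
    out = np.zeros_like(f0)
    # Normalize log(f0) under the source distribution, then
    # de-normalize under the target distribution.
    out[voiced] = np.exp(
        (np.log(f0[voiced]) - src_mean) / src_std * tgt_std + tgt_mean
    )
    return out
```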
- Download training data
- Extract vcc2016_training and evaluation_all
- Set up Python environment:
```
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

- Generate f0 statistics for the source and target speakers (a sketch of how these values are derived appears after the training command below):

```
python3 src/preprocess.py --data_dir ./data/vcc2016_training/ \
--source_speaker SF1 \
--target_speaker TM3
```

- Train the model, supplying the log(f0) statistics from the previous step:

```
python3 src/train.py --resume_from_checkpoint False \
--checkpoint_dir SF1_TM3_checkpoints \
--source_speaker SF1 \
--target_speaker TM3 \
--source_logf0_mean 5.38879422525781 \
--source_logf0_std 0.2398814107162179 \
--target_logf0_mean 4.858265991213904 \
--target_logf0_std 0.23171982666578547
```
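The four --*_logf0_* values above are per-speaker statistics produced by the preprocessing step. A minimal sketch of how such statistics can be computed, assuming pyworld and soundfile are available; the function name, the per-speaker directory layout, and the choice of the Harvest extractor are assumptions, not necessarily what src/preprocess.py does:

```python
import glob
import numpy as np
import pyworld
import soundfile as sf

def logf0_statistics(wav_dir):
    """Mean and std of log(f0) over the voiced frames of every wav file."""
    log_f0s = []
    for path in sorted(glob.glob(f"{wav_dir}/*.wav")):
        wav, fs = sf.read(path)
        # Harvest returns the frame-level f0 contour; 0 marks unvoiced frames.
        f0, _ = pyworld.harvest(wav.astype(np.float64), fs)
        log_f0s.append(np.log(f0[f0 > 0]))  # keep voiced frames only
    all_log_f0 = np.concatenate(log_f0s)
    return all_log_f0.mean(), all_log_f0.std()

# Hypothetical layout: one folder of wavs per speaker.
mean, std = logf0_statistics("./data/vcc2016_training/SF1")
print(mean, std)  # values of the kind passed as --source_logf0_mean/std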
Both scripts support additional parameters; use --help for the full set of options:

- preprocess.py: configure the data directory and speakers
- train.py: set data directories, checkpoint paths, and f0 statistics