CycleGAN-VC

Implementation of Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks (Kaneko & Kameoka, 2017).

Overview

This project implements voice conversion between speakers using CycleGAN without requiring parallel data. Generated samples are available here.
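
The key training objective behind the parallel-data-free setup is CycleGAN's cycle-consistency constraint: a voice converted from speaker A to speaker B and back should reconstruct the original utterance. The sketch below illustrates that term in PyTorch-style code; the framework, generator names, and loss weight are assumptions for illustration, not this repository's exact implementation.

import torch

def cycle_consistency_loss(real_a, real_b, gen_a2b, gen_b2a, lambda_cyc=10.0):
    # Round-trip both directions: A -> B -> A and B -> A -> B.
    cycle_a = gen_b2a(gen_a2b(real_a))
    cycle_b = gen_a2b(gen_b2a(real_b))
    # L1 penalty on the reconstruction error, weighted by lambda_cyc.
    return lambda_cyc * (
        torch.mean(torch.abs(cycle_a - real_a))
        + torch.mean(torch.abs(cycle_b - real_b))
    )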

Implementation Notes

Because of the size of the convolutional receptive field, the model preserves the original speaker's prosody, while log(f0) conversion handles the pitch shift. Implementation choices were informed by this reference repository.
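
Concretely, pitch is mapped with a logarithm-Gaussian normalized transformation: the source log(f0) is shifted and scaled to match the target speaker's log(f0) mean and standard deviation. The helper below is a minimal sketch of that formula (the function name is hypothetical and may not match this repository's code):

import numpy as np

def convert_f0(f0, src_mean, src_std, tgt_mean, tgt_std):
    # f0: 1-D array of per-frame fundamental frequencies in Hz.
    # Unvoiced frames (f0 == 0) are left untouched.
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0
    converted = f0.copy()
    converted[voiced] = np.exp(
        (np.log(f0[voiced]) - src_mean) / src_std * tgt_std + tgt_mean
    )
    return converted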

Setup

  1. Download training data
  2. Extract vcc2016_training and evaluation_all
  3. Set up Python environment:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Usage

Preprocessing

Generate f0 statistics:

python3 src/preprocess.py --data_dir ./data/vcc2016_training/ \
                         --source_speaker SF1 \
                         --target_speaker TM3
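
The statistics produced here are the mean and standard deviation of log(f0) over the voiced frames of each speaker's recordings; they are the values passed to train.py as --source_logf0_mean, --source_logf0_std, --target_logf0_mean, and --target_logf0_std. The sketch below shows one way such statistics could be computed with pyworld's WORLD analysis; the actual src/preprocess.py may use different analysis settings or directory layout.

import glob
import numpy as np
import librosa
import pyworld

def logf0_stats(speaker_dir, sr=16000):
    # Collect log(f0) over the voiced frames of every recording.
    log_f0s = []
    for path in sorted(glob.glob(f"{speaker_dir}/*.wav")):
        wav, _ = librosa.load(path, sr=sr, mono=True)
        f0, _ = pyworld.harvest(wav.astype(np.float64), sr)
        log_f0s.append(np.log(f0[f0 > 0]))
    log_f0s = np.concatenate(log_f0s)
    return log_f0s.mean(), log_f0s.std()

# Hypothetical path, assuming per-speaker subdirectories:
# mean, std = logf0_stats("./data/vcc2016_training/SF1")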

Training

python3 src/train.py --resume_from_checkpoint False \
                     --checkpoint_dir SF1_TM3_checkpoints \
                     --source_speaker SF1 \
                     --target_speaker TM3 \
                     --source_logf0_mean 5.38879422525781 \
                     --source_logf0_std 0.2398814107162179 \
                     --target_logf0_mean 4.858265991213904 \
                     --target_logf0_std 0.23171982666578547

Additional Parameters

Both scripts support additional parameters; run them with --help for the full list:

  • preprocess.py: Configure data directory and speakers
  • train.py: Set data directories, checkpoints, and f0 statistics
