Note: The latest version of the code should be capable of loading the old checkpoints (under `ckpts/ismir23/`), but training the v1 vocoder is not guaranteed to work. We're working on it. If you want to use the old code base that was made for the ISMIR 2023 paper, please check out the `ismir23` tag or commit `6d323da`.
- Download the MPop600 dataset. The dataset is distributed on a download-by-request basis. Please contact its third author, Yi-Jhe Lee, to get the raw files.
- Run the following command to resample the dataset to 24 kHz wave files.
```
python scripts/resample_dir.py **/f1/ output_dir --sr 24000
```
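For reference, the resampling step itself amounts to the sketch below (assuming `torchaudio`; the exact behaviour of `scripts/resample_dir.py`, e.g. its glob handling, may differ):

```python
# Illustrative sketch only: resample every wav under in_dir to 24 kHz.
# scripts/resample_dir.py is the supported way to do this; paths here are examples.
from pathlib import Path

import torchaudio
import torchaudio.functional as AF


def resample_dir(in_dir: str, out_dir: str, sr: int = 24000) -> None:
    for wav_path in Path(in_dir).rglob("*.wav"):
        audio, orig_sr = torchaudio.load(str(wav_path))
        if orig_sr != sr:
            audio = AF.resample(audio, orig_sr, sr)
        out_path = Path(out_dir) / wav_path.relative_to(in_dir)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        torchaudio.save(str(out_path), audio, sr)


if __name__ == "__main__":
    resample_dir("f1", "output_dir")
```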
- Extract the fundamental frequency (F0). The F0 values will be saved as `.pv` files in the same directory as the original files, using a 5 ms hop size.
```
python scripts/wav2f0.py output_dir
```
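For reference, extracting F0 at a 5 ms hop and saving one value per frame could look like the sketch below, which uses `pyworld` as an assumption; the estimator and `.pv` format used by `scripts/wav2f0.py` may differ:

```python
# Illustrative sketch only: estimate F0 with a 5 ms hop and save it as a .pv file
# next to each wav. scripts/wav2f0.py may use a different estimator or format.
from pathlib import Path

import numpy as np
import pyworld as pw
import soundfile as sf


def wav_to_pv(wav_path: Path, frame_period_ms: float = 5.0) -> None:
    audio, sr = sf.read(wav_path)  # mono, float64 expected by pyworld
    f0, _ = pw.harvest(audio, sr, frame_period=frame_period_ms)
    np.savetxt(wav_path.with_suffix(".pv"), f0, fmt="%.3f")  # one F0 (Hz) per frame, 0 = unvoiced


for wav in Path("output_dir").rglob("*.wav"):
    wav_to_pv(wav)
```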
(The following command has not been tested yet. See the note above.)
```
python main.py fit --config config.yaml --dataset.wav_dir output_dir
```
```
python main.py test --config config.yaml --ckpt_path checkpoint.ckpt --data.duration 6 --data.overlap 0 --data.batch_size 16 --trainer.logger false
```
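The `fit` and `test` subcommands above are provided by PyTorch Lightning's `LightningCLI`. As a rough sketch of how such an entry point is wired (the class names below are placeholders, not the repo's actual modules):

```python
# Illustrative sketch of a LightningCLI entry point; the classes below are
# placeholders, not the repo's actual LightningModule/LightningDataModule.
from lightning.pytorch import LightningDataModule, LightningModule
from lightning.pytorch.cli import LightningCLI


class MyVocoder(LightningModule):  # hypothetical placeholder
    ...


class MyDataModule(LightningDataModule):  # hypothetical placeholder (wav_dir, duration, overlap, ...)
    ...


if __name__ == "__main__":
    # Provides the fit/test/predict subcommands and the --config,
    # --data.*, --trainer.* overrides used in the commands above.
    LightningCLI(MyVocoder, MyDataModule)
```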
First, store the synthesised waveforms in a directory.
```
python autoencode.py predict -c config.yaml --ckpt_path checkpoint.ckpt --trainer.logger false --seed_everything false --data.wav_dir output_dir --trainer.callbacks+=ltng.cli.MyPredictionWriter --trainer.callbacks.output_dir pred_dir
```
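For reference, a prediction-writer callback of this kind usually subclasses Lightning's `BasePredictionWriter` and writes each predicted waveform to `output_dir`. The sketch below is an assumption about the general pattern, not the actual implementation of `ltng.cli.MyPredictionWriter`:

```python
# Illustrative sketch only: a prediction writer that dumps predicted waveforms to disk.
# ltng.cli.MyPredictionWriter may organise files and metadata differently.
from pathlib import Path

import torchaudio
from lightning.pytorch.callbacks import BasePredictionWriter


class WavPredictionWriter(BasePredictionWriter):
    def __init__(self, output_dir: str, sample_rate: int = 24000):
        super().__init__(write_interval="batch")
        self.output_dir = Path(output_dir)
        self.sample_rate = sample_rate

    def write_on_batch_end(
        self, trainer, pl_module, prediction, batch_indices, batch, batch_idx, dataloader_idx
    ):
        self.output_dir.mkdir(parents=True, exist_ok=True)
        for idx, wav in zip(batch_indices, prediction):
            # wav is assumed to be a (channels, samples) tensor
            torchaudio.save(str(self.output_dir / f"{idx:03d}.wav"), wav.cpu(), self.sample_rate)
```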
Make a new directory and copy the first three files of `f1` (or `m1`), like the following:
```
mpop600-test-f1
├── f1
│   ├── 001.wav
│   ├── 002.wav
│   └── 003.wav
```
Please also put your synthesised wavs under `pred_dir/f1` to make them compatible with the script `fad.py`.
Then, run the following command to calculate the FAD score.
```
python fad.py mpop600-test-f1 pred_dir --model vggish
```
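Under the hood, `fadtk` fits a Gaussian to each set of embeddings (VGGish here) and computes the Fréchet distance between them. Assuming the embeddings are already extracted, the final distance step is just the following (a sketch using `numpy` and `scipy`):

```python
# Illustrative sketch of the Frechet distance between two sets of audio embeddings.
# fadtk also handles the embedding extraction; this shows only the final distance step.
import numpy as np
from scipy import linalg


def frechet_distance(emb_ref: np.ndarray, emb_eval: np.ndarray) -> float:
    # emb_*: (num_frames, embedding_dim) arrays, e.g. VGGish embeddings
    mu1, mu2 = emb_ref.mean(axis=0), emb_eval.mean(axis=0)
    sigma1 = np.cov(emb_ref, rowvar=False)
    sigma2 = np.cov(emb_eval, rowvar=False)
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```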
Due to incremental changes and improvements I made since 2023, the evaluation results (MSSTFT and MAE-f0) may differ slightly from the ones reported in the ISMIR paper.
Since we switched to `fadtk` for the FAD evaluation, we report the new FAD scores here.
| Model | FAD |
|---|---|
| DDSP | |
| SawSing | |
| GOLF | |
| PULF | |
| Model | FAD |
|---|---|
| DDSP | |
| SawSing | |
| GOLF | |
| PULF | |
```
python test_rtf.py config.yaml checkpoint.ckpt test.wav
```
Note: Please use the checkpoints with `*converted*` in the name for the latest version of the code.
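The real-time factor is essentially the wall-clock synthesis time divided by the duration of the synthesised audio. A minimal sketch of that measurement is below; `model` and `inputs` are placeholders, and the actual `test_rtf.py` builds them from the config and checkpoint above:

```python
# Illustrative sketch only: measure the real-time factor (RTF) of a vocoder forward pass.
# RTF < 1 means the model synthesises faster than real time.
import time

import torch


@torch.no_grad()
def measure_rtf(model, inputs, sample_rate: int = 24000, runs: int = 10) -> float:
    audio = model(inputs)  # warm-up so lazy initialisation does not skew the timing
    start = time.perf_counter()
    for _ in range(runs):
        audio = model(inputs)
    elapsed = (time.perf_counter() - start) / runs
    return elapsed / (audio.shape[-1] / sample_rate)
```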
Below are the commands to extract intermediate features from the vocoders for the ablation study. Please check this notebook for some analysis of the extracted features.
```
python harm_and_noise.py ckpts/ismir23/*/config.yaml ckpts/ismir23/*_f1/*_converted.ckpt dir/to/f1 result_dir
```
The above command will save the harmonic and noise components of the test set of `f1`, predicted by the vocoder with a frame size of 6 seconds and one second of overlap, in `result_dir`.
```
python biquads.py ckpts/ismir23/{glottal_d_*, pulse_*}/config.yaml ckpts/ismir23/{glottal_d_*, pulse_*}/*_converted.ckpt dir/to/{f1, m1} result.pt
```
The above command will save the biquad coefficients of the LPC from the test set of either `f1` or `m1`, predicted by the vocoder with a frame size of 6 seconds and no overlap, in `result.pt`, which can be loaded with PyTorch.
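For intuition, one common way to obtain biquad coefficients from the all-pole LPC polynomial is to factor it into second-order sections. Below is a sketch using `scipy`; `biquads.py` may derive or store the coefficients differently:

```python
# Illustrative sketch only: factor LPC denominator coefficients into biquad
# (second-order) sections. biquads.py may use a different factorisation or layout.
import numpy as np
from scipy.signal import tf2sos


def lpc_to_biquads(lpc_coeffs: np.ndarray) -> np.ndarray:
    # lpc_coeffs: [1, a_1, ..., a_P], the denominator of the all-pole filter 1 / A(z)
    sos = tf2sos([1.0], lpc_coeffs)  # rows of [b0, b1, b2, a0, a1, a2]
    return sos[:, 3:]  # keep only the denominator (biquad) coefficients
```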
- Compute MUSHRA scores given the rating file from GO Listen
- Time-domain l2 experiment
- TISMIR ablation study
If you find this code useful, please consider citing the following paper:
```bibtex
@inproceedings{ycy2023golf,
  title     = {Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables},
  author    = {Yu, Chin-Yun and Fazekas, György},
  booktitle = {Proc. International Society for Music Information Retrieval},
  year      = {2023},
  pages     = {667--675}
}
```