Code for my final project "GAN Mel Spectrogram Upscaling for Audio Super-Resolution" for cs395T - Deep Learning Seminar with Philipp Krähenbühl
Contains a modified version of BasicSR/ESRGAN and an iPython notebook to generate data.
Usage:
- Extract VCTK Dataset, adjust file path in notebook and generate data
- copy data to BasicSR/datasets/multispeaker
- modify BasicSR/options/train/ESRGAN/train_ESRGAN_MEL.yml/BasicSR/options/test/ESRGAN/test_ESRGAN_MEL.yml
- train using
python basicsr/train.py -opt options/train/ESRGAN/train_ESRGAN_MEL.yml - during training, you can listen to validataion outputs (follow code in notebook)
- batch generate upscaled audio files by putting 11025Hz files into BasicSR/inference_data and running
python basicsr/inference.py -opt options/test/ESRGAN/test_ESRGAN_MEL.yml
The samples directory contains audio samples randomly chosen from the test data. "_low_sr" files are the 11025Hz inputs to the model, "_label" files are the 44100Hz labels and "_pred" files are the network's predictions.