Description
Differentiable Digital Signal Processing (DDSP) is a technique proposed by the Google Magenta team [repo, papers].
It provides powerful generative methods that take advantage of the structure of sound based on Fourier analysis.
The paper has over 200 citations and is being applied to different tasks like speech synthesis and source separation.
TorchAudio is looking to support DDSP-based modeling methods by adding basic components.
We would like to hear and get help from the community on this work stream.
Initial Goal
To support DDSP-based generative modeling.
New Components / APIs
- Harmonic Synthesizer / Additive Synthesis
  - Differentiable harmonic synthesizer
    - Oscillator bank: Add oscillator_bank #2848
    - Harmonic overtones: Add extend_pitch #2863
    - ADSR envelope: Add adsr_envelope #2859
- Subtractive Synthesis
  - Filter design functions
    - windowed-sinc filter: Add sinc_impulse_response op #2875
    - frequency method: Add frequency_impulse_response #2879
    - filter function: Add filter_waveform #2928
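To make the additive synthesis items above concrete, here is a minimal sketch of how harmonic overtones and an oscillator bank fit together. It is written in plain Python so it runs without dependencies; `extend_pitch_sketch` and `oscillator_bank_sketch` are hypothetical names for illustration and are not the signatures proposed in the linked PRs. A real implementation would operate on tensors so that gradients can flow through it.

```python
import math


def extend_pitch_sketch(f0, num_harmonics):
    """Expand a per-sample fundamental track into [time][harmonic] frequencies."""
    return [[f * (k + 1) for k in range(num_harmonics)] for f in f0]


def oscillator_bank_sketch(freqs, amps, sample_rate):
    """Sum sinusoids whose frequency and amplitude may change every sample.

    freqs, amps: nested lists of shape [time][num_oscillators].
    Phase is accumulated per oscillator so frequency changes stay continuous.
    """
    nyquist = sample_rate / 2
    num_osc = len(freqs[0])
    phase = [0.0] * num_osc
    out = []
    for f_t, a_t in zip(freqs, amps):
        sample = 0.0
        for k in range(num_osc):
            # Oscillators at or above Nyquist are muted to avoid aliasing.
            if abs(f_t[k]) < nyquist:
                sample += a_t[k] * math.sin(phase[k])
            phase[k] += 2 * math.pi * f_t[k] / sample_rate
        out.append(sample)
    return out


# Illustrative values (assumptions): a 220 Hz tone with four harmonics
# whose amplitudes fall off as 1/k.
sample_rate = 8000
num_samples = 800
num_harmonics = 4
f0 = [220.0] * num_samples
freqs = extend_pitch_sketch(f0, num_harmonics)
amps = [[0.5 / (k + 1) for k in range(num_harmonics)] for _ in range(num_samples)]
waveform = oscillator_bank_sketch(freqs, amps, sample_rate)
```

An ADSR envelope would then multiply `amps` (or the final waveform) by a time-varying gain, which is why the three pieces are tracked as separate components.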
Components for DDSP Modeling from [1]
- Multi-scale mel spectrogram loss
  Used to train models. Compares the mel spectrograms of the original and synthesized waveforms at multiple scales. [reference impl]
- exp_sigmoid
  Modified sigmoid function used to ensure that model output is in the value range expected by synthesizers. [reference impl]
- [Nice to have] Dataset
  - NSynth
  - Solo Violin
- [Stretch] DDSP Model from [1]
  - AutoEncoder model for Timbre transfer
- [Stretch] Training Script and pre-trained model
  Script to train models for
  - LJSpeech
  - Solo violin
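As a rough illustration of the first two components in this list, the sketch below shows a scalar `exp_sigmoid` and a simplified multi-scale spectral loss. The `exp_sigmoid` formula and default values follow the DDSP reference implementation as I understand it; the loss is a deliberate simplification that uses linear-frequency magnitudes from a naive DFT rather than a mel filterbank, and the function names, FFT sizes, and hop size are assumptions chosen for illustration. It is plain Python, so it is not differentiable as written.

```python
import cmath
import math


def exp_sigmoid(x, exponent=10.0, max_value=2.0, threshold=1e-7):
    """Scaled sigmoid: max_value * sigmoid(x) ** log(exponent) + threshold."""
    sigmoid = 1.0 / (1.0 + math.exp(-x))
    return max_value * sigmoid ** math.log(exponent) + threshold


def magnitude_frames(signal, n_fft):
    """Hann-windowed magnitude spectra with a hop of n_fft // 4 (naive DFT)."""
    hop = n_fft // 4
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        frames.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)
        ])
    return frames


def multi_scale_spectral_loss(target, estimate, fft_sizes=(64, 32, 16), eps=1e-7):
    """Sum over scales of mean L1 distance on magnitudes and log-magnitudes."""
    total = 0.0
    for n_fft in fft_sizes:
        t_frames = magnitude_frames(target, n_fft)
        e_frames = magnitude_frames(estimate, n_fft)
        diffs = [
            abs(t - e) + abs(math.log(t + eps) - math.log(e + eps))
            for t_row, e_row in zip(t_frames, e_frames)
            for t, e in zip(t_row, e_row)
        ]
        total += sum(diffs) / len(diffs)
    return total


# Two short test tones at different frequencies (illustrative values).
tone_a = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(128)]
tone_b = [math.sin(2 * math.pi * 880 * n / 8000) for n in range(128)]
loss_same = multi_scale_spectral_loss(tone_a, tone_a)
loss_diff = multi_scale_spectral_loss(tone_a, tone_b)
```

Comparing at several FFT sizes trades time resolution against frequency resolution, so the loss penalizes errors that a single spectrogram scale might miss.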
Existing Components / Workstream
- convolve and fftconvolve will be the basis for filter application.
- RIR simulation can be used for reverb.
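To show how filter design and convolution combine for subtractive synthesis, here is a minimal sketch of a windowed-sinc low-pass filter applied by direct convolution. The function names, the Hann window choice, and the normalized-cutoff convention (1.0 = Nyquist) are assumptions for illustration and do not reflect the API of the linked PRs. In practice the explicit loop would be replaced by the `convolve` or `fftconvolve` functions mentioned above.

```python
import math


def sinc_lowpass_sketch(cutoff, window_size=63):
    """Hann-windowed sinc low-pass taps. cutoff is in (0, 1), where 1 = Nyquist.

    window_size is assumed to be odd so the filter is symmetric around n = 0.
    """
    half = window_size // 2
    taps = []
    for n in range(-half, half + 1):
        x = cutoff * n
        ideal = 1.0 if n == 0 else math.sin(math.pi * x) / (math.pi * x)
        hann = 0.5 + 0.5 * math.cos(math.pi * n / half)
        taps.append(cutoff * ideal * hann)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC


def convolve_full(signal, kernel):
    """Direct-form full convolution, output length len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# Illustrative values (assumptions): 8 kHz sample rate, 1 kHz cutoff.
sample_rate = 8000
lowpass = sinc_lowpass_sketch(cutoff=0.25)
low_tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(512)]
high_tone = [math.sin(2 * math.pi * 3000 * n / sample_rate) for n in range(512)]

# Keep only the steady-state part, where the kernel fully overlaps the input.
low_out = convolve_full(low_tone, lowpass)[63:512]
high_out = convolve_full(high_tone, lowpass)[63:512]
```

With these settings the 200 Hz tone passes nearly unchanged while the 3 kHz tone is strongly attenuated, which can be checked by comparing `rms(low_out)` and `rms(high_out)` against the RMS of the inputs.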
Existing Tutorials
References
and some random papers moto found interesting