DDSP components in TorchAudio #2835

Differentiable Digital Signal Processing (DDSP) is a technique proposed by the Google Magenta team. [[repo](https://github.com/magenta/ddsp), [papers](https://github.com/magenta/ddsp/tree/main/ddsp/training/gin/papers)].

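It provides generative methods that exploit the structure of sound by combining classical signal processing components grounded in Fourier analysis (oscillators, filters, reverb) with neural networks, while keeping the whole pipeline differentiable.
The paper has over 200 citations and has been applied to tasks such as speech synthesis and source separation.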

TorchAudio aims to support DDSP-based modeling by adding the basic building blocks.
We would like to hear from the community and welcome help on this workstream.

## Initial Goal
To support DDSP-based generative modeling. 

## New Components / APIs
1. Harmonic Synthesizer / Additive Synthesis
    - Differentiable harmonic synthesizer
        - [x] Oscillator bank #2848  
        - [x] Harmonic overtones #2863
    - [x] ADSR envelope #2859
1. Subtractive Synthesis
    - Filter design functions
        - [x] windowed-sinc filter #2875
        - [x] frequency method #2879
    - [x] filter function #2928
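For readers new to DDSP, the harmonic synthesizer items above can be illustrated with a minimal plain-PyTorch sketch. The function names (`oscillator_bank_sketch`, `harmonic_overtones_sketch`, `linear_adsr_sketch`) are hypothetical and are not the torchaudio APIs added in the linked PRs; the purely linear ADSR shape and muting partials above Nyquist are simplifying assumptions.

```python
import math

import torch


def oscillator_bank_sketch(freqs: torch.Tensor, amps: torch.Tensor, sample_rate: float) -> torch.Tensor:
    """Sum of sinusoids with per-sample frequency (Hz) and amplitude (hypothetical helper).

    freqs, amps: (num_samples, num_oscillators). Returns (num_samples,).
    """
    # Instantaneous phase is the running integral of frequency: a cumulative sum scaled by 2*pi/sample_rate.
    phases = 2 * math.pi * torch.cumsum(freqs, dim=0) / sample_rate
    # Mute partials at or above Nyquist so they do not alias.
    amps = torch.where(freqs < sample_rate / 2, amps, torch.zeros_like(amps))
    return (amps * torch.sin(phases)).sum(dim=-1)


def harmonic_overtones_sketch(f0: torch.Tensor, num_harmonics: int) -> torch.Tensor:
    """Expand a fundamental (num_samples,) into k * f0 for k = 1..num_harmonics (hypothetical helper)."""
    k = torch.arange(1, num_harmonics + 1, dtype=f0.dtype, device=f0.device)
    return f0.unsqueeze(-1) * k


def linear_adsr_sketch(num_samples: int, attack: float, decay: float, sustain: float, release: float) -> torch.Tensor:
    """Piecewise-linear ADSR envelope (hypothetical helper).

    attack, decay, release: segment lengths as fractions of num_samples.
    sustain: level held between the decay and release segments.
    """
    n_a, n_d, n_r = (int(frac * num_samples) for frac in (attack, decay, release))
    n_s = num_samples - n_a - n_d - n_r
    if n_s < 0:
        raise ValueError("attack + decay + release must not exceed 1")
    return torch.cat([
        torch.linspace(0.0, 1.0, n_a),       # rise from silence to peak
        torch.linspace(1.0, sustain, n_d),   # fall to the sustain level
        torch.full((n_s,), float(sustain)),  # hold
        torch.linspace(sustain, 0.0, n_r),   # fade out to silence
    ])


# Usage: one second of a 220 Hz tone with 8 harmonics, 1/k amplitude roll-off, and an ADSR envelope.
sample_rate = 16000
f0 = torch.full((sample_rate,), 220.0)
freqs = harmonic_overtones_sketch(f0, num_harmonics=8)
amps = 0.1 / torch.arange(1, 9, dtype=torch.float32)
env = linear_adsr_sketch(sample_rate, attack=0.05, decay=0.1, sustain=0.7, release=0.2)
waveform = oscillator_bank_sketch(freqs, amps * env.unsqueeze(-1), sample_rate)
```

Because every step is an ordinary differentiable torch op, gradients reach `amps` and `freqs`, which is what lets a network be trained to predict these synthesizer controls.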

## Components for DDSP Modeling from [1]
1. Multi-scale mel spectrogram loss
    Used to train models. Compares the mel spectrograms of the original and synthesized waveforms at multiple scales (FFT sizes). [[reference impl](https://github.com/magenta/ddsp/blob/9866d3aaa6d13e3b951d39d5979d3b29697067ea/ddsp/losses.py#L132)]
1. exp_sigmoid
    Modified sigmoid function used to keep the model output within the value range expected by the synthesizers. [[reference impl](https://github.com/magenta/ddsp/blob/7cb3c37f96a3e5b4a2b7e94fdcc801bfd556021b/ddsp/core.py#L386-L404)]
1. [Nice to have] Dataset
    - [NSynth](https://magenta.tensorflow.org/datasets/nsynth)
    - Solo Violin
1. [Stretch] DDSP Model from [1]
    - AutoEncoder model for Timbre transfer
1. [Stretch] Training Script and pre-trained model
    Script to train models for
    - LJSpeech
    - Solo violin

## Existing Components / Workstream
- [convolve](https://pytorch.org/audio/main/prototype.functional.html#convolve) and [fftconvolve](https://pytorch.org/audio/main/prototype.functional.html#fftconvolve) will be the basis for filter application.
- [RIR simulation](https://github.com/pytorch/audio/pull/2644) can be used for reverb.
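To illustrate how filter design and FFT-based convolution fit together for subtractive synthesis, here is a minimal sketch in plain PyTorch. `windowed_sinc_lowpass` and `fft_convolve` are hypothetical helpers, not the torchaudio functions from the linked PRs; `fft_convolve` is only a stand-in for the `fftconvolve` linked above, and the Hamming window is an assumed choice.

```python
import math

import torch


def windowed_sinc_lowpass(cutoff: float, num_taps: int) -> torch.Tensor:
    """Linear-phase FIR lowpass (hypothetical helper). cutoff is normalized so that 1.0 = Nyquist."""
    # Sample index centered on the middle tap.
    n = torch.arange(num_taps, dtype=torch.float64) - (num_taps - 1) / 2
    # Ideal lowpass response sin(pi * cutoff * n) / (pi * n), truncated to num_taps.
    h = cutoff * torch.sinc(cutoff * n)
    # Taper the truncation with a window to reduce stopband ripple.
    h = h * torch.hamming_window(num_taps, periodic=False, dtype=torch.float64)
    return h / h.sum()  # normalize to unit gain at DC


def fft_convolve(x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    """Full linear convolution along the last dimension via zero-padded real FFTs (hypothetical helper)."""
    n = x.shape[-1] + h.shape[-1] - 1  # zero-pad so circular convolution equals linear convolution
    return torch.fft.irfft(torch.fft.rfft(x, n=n) * torch.fft.rfft(h, n=n), n=n)


# Usage: lowpass at 1 kHz for a 16 kHz signal, then filter a 500 Hz + 4 kHz mixture.
sample_rate = 16000
h = windowed_sinc_lowpass(cutoff=1000 / (sample_rate / 2), num_taps=101)
t = torch.arange(sample_rate, dtype=torch.float64) / sample_rate
x = torch.sin(2 * math.pi * 500 * t) + torch.sin(2 * math.pi * 4000 * t)
y = fft_convolve(x, h)
```

The same `fft_convolve` step would also apply a long impulse response, such as a simulated RIR for reverb, where FFT convolution is much cheaper than direct convolution.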

## Existing Tutorials
- [Filter design](https://pytorch.org/audio/main/tutorials/filter_design_tutorial.html)
- [Subtractive synthesis](https://pytorch.org/audio/main/tutorials/subtractive_synthesis_tutorial.html)
- [Oscillator and ADSR envelope](https://pytorch.org/audio/main/tutorials/oscillator_tutorial.html)
- [Additive synthesis](https://pytorch.org/audio/main/tutorials/additive_synthesis_tutorial.html)

## References
1. [[2001.04643] DDSP: Differentiable Digital Signal Processing](https://arxiv.org/abs/2001.04643)

Other papers that moto found interesting:
1. [[2010.15084] Speech Synthesis and Control Using Differentiable DSP](https://arxiv.org/abs/2010.15084)
1. [[2202.00200] Differentiable Digital Signal Processing Mixture Model for Synthesis Parameter Extraction from Mixture of Harmonic Sounds](https://arxiv.org/abs/2202.00200)
1. [[2210.14476] Sinusoidal Frequency Estimation by Gradient Descent](https://arxiv.org/abs/2210.14476)
