[BC] Standardization of Transforms/Functionals #152

jamarshon · 2019-07-18T15:53:08Z

tl;dr Signals will be (channels, time), i.e. (c, n). Spectrograms will be (channels, freq, time), i.e. (c, f, t), and the other transformations will follow a similar format, with channel as the first dimension and time as the last dimension, e.g. (c, n_mels, t) or (c, n_mfcc, t).

Removal of Legacy Transforms

- Compose: From #110, "we wouldn't need it because once things are based on Layers people can simply build a nn.Sequential()."
- Aliases and deprecated modules: SPECTROGRAM, F2M, MEL. If we are going to start breaking things in this new release, it makes sense to break as much as possible now so that future stable versions do not carry technical debt.
- BLC2CBL: no input or output of this library uses the BLC or CBL format. (Going to keep LC2CL, mainly because all the inputs are (c, n), so it is convenient for users to switch to this format.)
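A minimal sketch of the nn.Sequential replacement for Compose mentioned above. `ChannelsFirst` and `PeakNormalize` are hypothetical modules written only for this illustration (the first plays the role of an LC2CL-style transpose); they are not torchaudio classes.

```python
import torch
from torch import nn


class ChannelsFirst(nn.Module):
    """Hypothetical stand-in for an LC2CL-style transform: (n, c) -> (c, n)."""

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        return waveform.transpose(0, 1).contiguous()


class PeakNormalize(nn.Module):
    """Hypothetical transform: scale a (c, n) waveform so its peak is 1."""

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        peak = waveform.abs().max().clamp(min=1e-8)
        return waveform / peak


# Transforms written as modules chain with nn.Sequential; no Compose needed.
pipeline = nn.Sequential(ChannelsFirst(), PeakNormalize())

length_first = torch.randn(16000, 2)   # (n, c)
waveform = pipeline(length_first)      # (c, n)
print(waveform.shape)  # torch.Size([2, 16000])
```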

Channel x Time

The input (and some outputs) of all transforms/functions will be (c, n), e.g. Spectrogram, MFCC, MelSpectrogram, Resample, etc. This is consistent with PyTorch, which puts the channel dimension before the number of samples.
This means removing the channel-dimension flags that allowed some transforms to accept either (n, c) or (c, n).

Channel x Frequency x Time

The output of stft is (c, f, t, 2): for each channel, each column is the FFT of one window, so moving horizontally shows how the columns (FFTs) change over time. This matches the output of librosa, so we no longer need to transpose in our test comparisons with spectrogram, melscale, and mfcc. Previously it was time x frequency.
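The layout can be checked with plain torch.stft. This is only a sketch: it assumes a recent PyTorch, where torch.stft returns a complex tensor and torch.view_as_real recovers the trailing real/imaginary dimension. The values of n_fft and hop_length are arbitrary choices for the example.

```python
import torch

n_fft, hop_length = 400, 200
waveform = torch.randn(2, 16000)  # (c, n)

# Recent PyTorch versions return a complex (c, f, t) tensor here.
complex_spec = torch.stft(
    waveform,
    n_fft=n_fft,
    hop_length=hop_length,
    window=torch.hann_window(n_fft),
    return_complex=True,
)

# View it as real to get the (c, f, t, 2) layout described above.
stft_out = torch.view_as_real(complex_spec)
print(stft_out.shape)  # torch.Size([2, 201, 81, 2])

# Power spectrogram: sum squared real and imaginary parts -> (c, f, t).
specgram = stft_out.pow(2).sum(-1)
```

With the default centered framing, f is n_fft // 2 + 1 = 201 and t is 1 + 16000 // 200 = 81.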

DCT and Filterbank matrices

Going to keep them as they are. These matrices should be shaped so that they can be multiplied without a transpose: for the filterbank, (..., n_freqs) x (n_freqs, n_mels), meaning each column is one filter of the filterbank, and for the DCT, (..., n_mels) x (n_mels, n_mfcc), meaning each column is a cosine function.
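A shape-only sketch of how these matrices line up with (c, f, t) spectrograms. The matrices here are random placeholders, not real mel filters or DCT bases, and a real MFCC would also take a log before the DCT. The assumption is that the spectrogram, not the matrix, is transposed so that frequency is the last dimension during the product.

```python
import torch

channels, n_freqs, n_frames = 2, 201, 81
n_mels, n_mfcc = 40, 13

specgram = torch.rand(channels, n_freqs, n_frames)  # (c, f, t)
fb = torch.rand(n_freqs, n_mels)                    # placeholder filterbank
dct_mat = torch.rand(n_mels, n_mfcc)                # placeholder DCT matrix

# (c, t, n_freqs) x (n_freqs, n_mels) -> (c, t, n_mels) -> (c, n_mels, t)
mel_specgram = torch.matmul(specgram.transpose(1, 2), fb).transpose(1, 2)

# (c, t, n_mels) x (n_mels, n_mfcc) -> (c, t, n_mfcc) -> (c, n_mfcc, t)
mfcc_like = torch.matmul(mel_specgram.transpose(1, 2), dct_mat).transpose(1, 2)

print(mel_specgram.shape, mfcc_like.shape)
```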

Misc

Going to keep scale, padtrim, and downmixmono. While they are simple, they are operations that may be performed often, so keeping them reduces the amount of code the user has to write. They also serve as documentation, since the names of the transforms/functions communicate the intent clearly.
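To give a sense of how small these helpers are under the (c, n) convention, here is a rough sketch. The function names and signatures below are assumptions for illustration only and are not necessarily what the library implements.

```python
import torch
import torch.nn.functional as F


def downmix_mono(waveform: torch.Tensor) -> torch.Tensor:
    """Average over channels: (c, n) -> (1, n)."""
    return waveform.mean(dim=0, keepdim=True)


def pad_trim(waveform: torch.Tensor, max_len: int, fill_value: float = 0.0) -> torch.Tensor:
    """Trim or right-pad the time dimension of a (c, n) tensor to max_len."""
    n = waveform.size(1)
    if n >= max_len:
        return waveform[:, :max_len]
    return F.pad(waveform, (0, max_len - n), value=fill_value)


stereo = torch.randn(2, 1000)
mono = downmix_mono(stereo)      # (1, 1000)
padded = pad_trim(mono, 1600)    # (1, 1600), zeros appended
trimmed = pad_trim(mono, 800)    # (1, 800)
```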

cpuhrsch · 2019-07-18T17:23:50Z

@keunwoochoi @ksanjeevan please let us know what you think. This is going to resolve a lot of discussion :)

ksanjeevan · 2019-07-18T20:31:19Z

@cpuhrsch on the dimension side all great 👍. There still will be much to discuss on some variable/transform names I guess.

keunwoochoi · 2019-07-18T21:05:29Z

Great! All the dimension configs are aligned with what we've discussed in the contrib repo :)

If the variable names are an issue, we reached this conclusion, in case you want to borrow some ideas @cpuhrsch @jamarshon. They're probably slightly more verbose than usual, and that's what we wanted, so that even beginners in either PyTorch or audio processing would understand them just by reading the code.

test/test_transforms.py

cpuhrsch

@jamarshon please look into the mentioned standardization of input flags and add a discussion / changes to this PR.

jamarshon · 2019-07-22T21:50:24Z

In terms of variable name standardization, some concepts were taken from keunwoochoi/torchaudio-contrib#36 (comment), mainly:

waveform: a tensor of audio samples
sample_rate: the rate of audio samples (samples per second)
specgram: a tensor of spectrogram
mel_specgram: a mel spectrogram
hop_length: the number of samples between the starts of consecutive frames
n_freqs: the number of bins in a linear spectrogram (i.e., n_freqs == n_fft // 2 + 1)
min_freq: the lowest frequency of the lowest band in a mel/CQT spectrogram
max_freq: the highest frequency of the highest band in a mel/CQT spectrogram
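As a quick illustration of how these names relate, the sketch below computes the expected spectrogram shape from a waveform's shape. `spectrogram_shape` is a hypothetical helper, and the frame count assumes centered framing (as with the default center=True in torch.stft).

```python
def spectrogram_shape(channels, n_samples, n_fft, hop_length):
    """Return the (c, f, t) shape of a linear spectrogram for a (c, n) waveform."""
    n_freqs = n_fft // 2 + 1                 # bins in a linear spectrogram
    n_frames = 1 + n_samples // hop_length   # assumes centered framing
    return channels, n_freqs, n_frames


sample_rate = 16000              # samples per second
n_samples = 1 * sample_rate      # one second of audio
print(spectrogram_shape(2, n_samples, n_fft=400, hop_length=200))  # (2, 201, 81)
```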

Other variable names
n_fft: the number of Fourier bins (matches torch.stft naming)
win_length: the length of the STFT window (matches torch.stft naming)
n_mfcc, n_mels: just to be consistent with other n_* variables
window_fn: for functions that create windows, e.g. torch.hann_window. torch.stft treats window as a tensor, so using the name window for a function as well would be confusing. Naming it window_fn makes it clear that it produces a window.
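A small sketch of the window_fn versus window distinction. `make_window` is a hypothetical helper, not a library function: window_fn is a callable, and the tensor it returns is what would be passed as torch.stft's window argument.

```python
import torch


def make_window(win_length, window_fn=torch.hann_window):
    """Call window_fn to build the window tensor that torch.stft expects."""
    window = window_fn(win_length)
    return window


hann = make_window(400)                                     # tensor of shape (400,)
hamming = make_window(400, window_fn=torch.hamming_window)  # same shape, other taper
print(hann.shape, hamming.shape)
```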

Currently there are no complex tensors anywhere, so there is no complex_specgram. There is also no batching, so no waveforms, specgrams, etc.

jamarshon · 2019-07-23T17:34:03Z

@vincentqb @keunwoochoi could you take a look and let me know what you think?

keunwoochoi · 2019-07-23T17:56:04Z

Looks good to me. The things mentioned in the original thread and the comment seem aligned to what we had discussed :)
Curious though - what do you mean by no batching?

cpuhrsch · 2019-07-23T18:20:48Z

@keunwoochoi - see this comment for a wider discussion around this topic.

jamarshon · 2019-07-23T18:23:18Z

#131 (comment)
None of the current transforms/functionals accept batched inputs or produce batched outputs. I think the goal is to have no batch support in this version and then add batching in the next version.

vincentqb · 2019-07-23T18:49:01Z

I'm happy with the convention, and the code looks good to me. Thanks for updating the code and the tests.

vincentqb

#131 got merged and requires some cleaning

remove stft with batching until we decide to support batching everywhere
standardize naming to the convention agreed

vincentqb · 2019-07-24T15:58:48Z

If we are making the code leaner, should we only have DBScale/MelScale transforms without having the SpectrogramDB/MelSpectrogram if they are easy to chain?

This reverts commit a7aa440.

vincentqb · 2019-07-24T17:43:15Z

If we are making the code leaner, should we only have DBScale/MelScale transforms without having the SpectrogramDB/MelSpectrogram if they are easy to chain?

If we do this, we'll do it as part of a separate PR.

vincentqb

Looks good to me!

vincentqb · 2019-07-24T17:54:29Z

Based on the work done here, here are things we can consider for a later PR:

Remove SpectrogramToDB, and create a DBScale transformation like so.
~~Remove MelSpectrogram and keep only MelScale? (comment in favor of keeping)~~
Remove PadTrim.
~~Introduce Padding-only transformation, like so or so. [already in pytorch]~~
Rename MuLawExpanding to MuLawDecoding.

cpuhrsch · 2019-07-24T17:56:05Z

test/test_functional.py

-    'reflect',
-])
-@unittest.skipIf(not IMPORT_LIBROSA, 'Librosa is not available')
-def test_stft(waveform, fft_length, hop_length, pad_mode):


We might want to bring this back / add it to core later on for the regular torch.stft

jamarshon · 2019-07-24T18:59:03Z

Based on the work done here, here are things we can consider for a later PR:

Remove SpectrogramToDB, and create a DBScale transformation like so.

Remove MelSpectrogram and keep only MelScale.

Remove PadTrim.

Introduce Padding-only transformation, like so.

Rename MuLawExpanding to MuLawDecoding.

Rename MuLawExpanding to MuLawDecoding. #159

keunwoochoi · 2019-07-24T19:42:31Z

awesome!!!

jamarshon added 6 commits July 18, 2019 06:57

more

2e6bc4a

more

8040752

more

5c0b693

more

23d2935

more

fce6637

more

99f449b

jamarshon changed the title ~~[WIP] [BC] Standardization of Transforms/Functionals~~ [BC] Standardization of Transforms/Functionals Jul 18, 2019

jamarshon requested review from vincentqb, cpuhrsch and zhangguanheng66 July 18, 2019 17:10

Merge branch 'master' into standardization

8bd893b

vincentqb mentioned this pull request Jul 18, 2019

Resample supports only one channel #153

Closed

vincentqb reviewed Jul 18, 2019

test/test_transforms.py

cpuhrsch requested changes Jul 19, 2019

jamarshon added 4 commits July 22, 2019 11:43

small push to save progress

f00c46c

small push to save progress

e3085d3

fix test

e9c805f

Merge branch 'master' into standardization

a60fc69

more

d090ff6

remove trailing zero

fca025a

vincentqb mentioned this pull request Jul 23, 2019

[WIP] Audio preprocessing tutorial. pytorch/tutorials#572

Merged

3 tasks

jamarshon added 2 commits July 24, 2019 08:47

apply feedback: rearrange functions

840707d

apply feedback: rearrange functions

9da5089

vincentqb suggested changes Jul 24, 2019

Merge branch 'master' into standardization

a09bf60

jamarshon added 11 commits July 24, 2019 09:05

merge: delete stft

afe528a

merge

be082f8

remove batch support for istft

a7aa440

docstring

af1c8c8

docstring

44e1f4d

docstring

b383f00

docstring

fea5c06

more

a4f7d0f

more

3997d12

remove unused xfail

dc226c9

Revert "remove batch support for istft"

ab4ecb6

This reverts commit a7aa440.

vincentqb approved these changes Jul 24, 2019

rename batch to channel

99675a4

cpuhrsch reviewed Jul 24, 2019

cpuhrsch merged commit b29a463 into pytorch:master Jul 24, 2019

jamarshon deleted the standardization branch July 24, 2019 17:58

This was referenced Jul 24, 2019

Scale transform does not fail on non-integers #144

Closed

DownmixMono channels_first wrong default value #93

Closed

This was referenced Jul 24, 2019

Remove PadTrim #160

Merged

Adding Manifesto to README #169

Merged

vincentqb mentioned this pull request Aug 22, 2019

Consistency between torchvision/torchaudio #29

Closed
