Merging plan from torchaudio-contrib #110

Open
@keunwoochoi

Hi all,
I think it's a good time to discuss a potential plan for merging torchaudio-contrib into this repository, especially because there are going to be new features and changes from @jamarshon and @cpuhrsch.

Main idea

A lot of things are well summarized in https://github.com/keunwoochoi/torchaudio-contrib. In short, we wanted to re-design torch-based audio processing so that

  • things can be Layers, which are based on corresponding Functionals
  • names for layers and arguments are carefully chosen
  • all work for multi-channel
  • complex numbers are supported when it makes sense (e.g., STFTs)
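To make the first and third points concrete, here is a minimal sketch of the "Layer on top of a Functional" pattern. It borrows the `AmplitudeToDb(ref=1.0, amin=1e-7)` signature proposed later in this issue; the function body, the amplitude (factor of 20) convention, and the `(batch, channel, freq, time)` layout are assumptions for illustration, not the actual torchaudio-contrib implementation.

```python
import torch
import torch.nn as nn


def amplitude_to_db(x: torch.Tensor, ref: float = 1.0, amin: float = 1e-7) -> torch.Tensor:
    """Functional: amplitude -> decibels, element-wise, for any input shape."""
    # Clamp to amin so that log10 never sees zero.
    x = torch.clamp(x, min=amin)
    # Assumption: amplitude (not power) convention, hence the factor of 20.
    return 20.0 * torch.log10(x / ref)


class AmplitudeToDb(nn.Module):
    """Layer: stores the arguments and delegates to the functional."""

    def __init__(self, ref: float = 1.0, amin: float = 1e-7):
        super().__init__()
        self.ref = ref
        self.amin = amin

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return amplitude_to_db(x, ref=self.ref, amin=self.amin)


# Hypothetical multi-channel input laid out as (batch, channel, freq, time).
real_specgrams = torch.ones(2, 2, 5, 7)
db_specgrams = AmplitudeToDb()(real_specgrams)
```

Because the operation is element-wise, the same code works for mono and multi-channel input without any special casing.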

Review - layers

torchaudio-contrib already covers many of the functions that transforms.py covers now, but not all of them, which is why I feel it's time to discuss this here.
Let me list the classes in transforms.py one by one with some notes.

1. Already in torchaudio-contrib. Hoping we'd replace.

  • class Spectrogram: we have it in torchaudio-contrib. On top of this, we also have an STFT layer which outputs complex representations (same as torch.stft since we're wrapping it).
  • class MelScale: we have it and would like to suggest changing the name to something more general. We named it class MelFilterbank, assuming there can be other types of filterbanks, too. It also supports htk and non-htk mel filterbanks.
  • class SpectrogramToDB: we would like to propose a more general approach -- class AmplitudeToDb(ref=1.0, amin=1e-7) and class DbToAmplitude(ref=1.0), because decibel-scaling is about changing its unit, not the core content of the input.
  • class MelSpectrogram: we have it, which returns an nn.Sequential model consisting of a Spectrogram and a mel-scale filterbank.
  • class MuLawEncoding, class MuLawExpanding: we have them, actually a 99% copy of the implementation here.
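For readers unfamiliar with the htk / non-htk distinction mentioned for MelFilterbank, the two Hz-to-mel conventions can be sketched as below. This assumes "non-htk" means the Slaney-style scale (linear below 1 kHz, logarithmic above), which is the usual alternative; `hz_to_mel` is a hypothetical helper name, not necessarily what torchaudio-contrib exposes.

```python
import math


def hz_to_mel(freq_hz: float, htk: bool = False) -> float:
    """Convert a frequency in Hz to mels under one of two conventions."""
    if htk:
        # HTK convention: a single logarithmic formula.
        return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

    # Assumed non-htk convention (Slaney-style): linear below 1 kHz ...
    f_sp = 200.0 / 3.0                # Hz per mel in the linear region
    min_log_hz = 1000.0               # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp   # about 15 mels
    logstep = math.log(6.4) / 27.0    # log-region step size per mel
    if freq_hz < min_log_hz:
        return freq_hz / f_sp
    # ... and logarithmic at or above 1 kHz.
    return min_log_mel + math.log(freq_hz / min_log_hz) / logstep
```

The two scales disagree substantially in absolute value (roughly 1000 mels vs. 15 mels at 1 kHz), so a filterbank layer needs to expose the choice as an explicit argument.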

2. Wouldn't need these

  • class Compose: we wouldn't need it because once things are based on Layers people can simply build an nn.Sequential().
  • class Scale: It does int16 --> float. I think we need to deprecate this because if we really need this, it should have a more intuitive and precise name, and probably should support other conversions as well.
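As a rough illustration of both points, the sketch below chains two layers with `nn.Sequential` instead of `Compose`. `Int16ToFloat` and `Gain` are hypothetical names made up for this example, and dividing by 32768 is an assumption about what an int16-to-float conversion should do, not a description of the current Scale.

```python
import torch
import torch.nn as nn


class Int16ToFloat(nn.Module):
    """Hypothetical, more explicitly named stand-in for Scale."""

    def forward(self, waveforms: torch.Tensor) -> torch.Tensor:
        # Assumption: map the int16 range [-32768, 32767] to [-1.0, 1.0).
        return waveforms.to(torch.float32) / 32768.0


class Gain(nn.Module):
    """Hypothetical second layer, only here to show chaining."""

    def __init__(self, factor: float):
        super().__init__()
        self.factor = factor

    def forward(self, waveforms: torch.Tensor) -> torch.Tensor:
        return waveforms * self.factor


# nn.Sequential plays the role that Compose plays for callable transforms.
pipeline = nn.Sequential(Int16ToFloat(), Gain(0.5))

waveforms = torch.tensor([[-32768, 0, 16384]], dtype=torch.int16)
out = pipeline(waveforms)
```

A side benefit of `nn.Sequential` is that the whole pipeline can be moved to a device with `.to(...)` and embedded in a larger model like any other module.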

3. To-be-added

  • class DownmixMono: I would like to have one. But we are also considering a downmix based on a time-frequency representation (an energy-preserving operation) (@faroit). I'm open to discussion. Personally I'd prefer to have separate classes, DownmixWaveform() and DownmixSpecgram(). Maybe until we have a better one, we should keep it as it is.
  • class MFCC: we currently don't have it. The current torch/audio implementation uses s2db (SpectrogramToDB), but this class seems a little arbitrary to me, so we might want to re-implement it.
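To show why the two downmix variants might deserve separate classes, here is a small sketch of both as functionals. The function names, the tensor layouts, and the reading of "energy-preserving" as "average the power across channels, then take the square root" are all assumptions for the sake of discussion.

```python
import torch


def downmix_waveform(waveforms: torch.Tensor) -> torch.Tensor:
    """(batch, channel, time) -> (batch, 1, time) by averaging channels."""
    return waveforms.mean(dim=1, keepdim=True)


def downmix_specgram(real_specgrams: torch.Tensor) -> torch.Tensor:
    """(batch, channel, freq, time) magnitudes -> (batch, 1, freq, time).

    Assumption: channels are combined in the power domain, so the result
    keeps the mean per-channel energy regardless of inter-channel phase.
    """
    power = real_specgrams.pow(2).mean(dim=1, keepdim=True)
    return power.sqrt()


# Two channels that are exactly out of phase cancel in the waveform domain ...
waveforms = torch.tensor([[[1.0, -1.0], [-1.0, 1.0]]])  # (1, 2, 2)
mono_waveform = downmix_waveform(waveforms)

# ... while magnitudes of equal size survive the spectrogram-domain downmix.
real_specgrams = torch.ones(1, 2, 3, 4)
mono_specgram = downmix_specgram(real_specgrams)
```

The out-of-phase case is where the two approaches differ most: the waveform average is silent, while the spectrogram downmix keeps the energy.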

4. Not sure about these

  • class PadTrim: I don't actually know why we need it exactly. I would love to hear about this!
  • class LC2CL: So far, torchaudio-contrib code hasn't considered channel-first tensors. If it's a thing, we'd i) update our code to make them compatible and ii) have the same or a similar class to this. But do we really need this?
  • class BLC2CBL: same as LC2CL -- I'd like to know its use cases.

Review - argument and variable names

As summarised in keunwoochoi/torchaudio-contrib#46, we'd like to use

  • waveforms for a batch of waveforms
  • real_specgrams for magnitude spectrograms
  • complex_specgrams for complex spectrograms
    (This is relatively less discussed.)

Audio loading

@faroit has been working on replacing SoX with alternatives. But here in this issue, I'd like to focus on the topics above.

So,

  • Any opinion on this?
  • Any answers to the questions I have?
  • If it looks good, what else would you like to have in the one-shot PR that would replace the current transforms.py?
