Skip to content

feat: add STFT->MEL conversion #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Jan 24, 2023
Merged

feat: add STFT->MEL conversion #3

merged 20 commits into from
Jan 24, 2023

Conversation

polvalente
Copy link
Collaborator

No description provided.

@polvalente polvalente self-assigned this Jan 15, 2023
@polvalente polvalente marked this pull request as ready for review January 15, 2023 05:55
lib/nx_signal.ex Outdated
Comment on lines 247 to 255
stride when Elixir.Kernel.is_list(stride) ->
stride

stride
when Elixir.Kernel.and(
Elixir.Kernel.is_integer(stride),
Elixir.Kernel.>=(stride, 1)
) ->
[stride]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you using Elixir.Kernel here? :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably this was a transform before with inline fn

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this now

lib/nx_signal.ex Outdated
dilations = List.duplicate(1, Nx.rank(shape))

{pooled_shape, padding_config} =
Nx.Shape.pool(shape, window_dimensions, strides, padding, dilations)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Private API :(

lib/nx_signal.ex Outdated
(log_spec + 4) / 4
end

defn linspace(min, max, opts \\ []) do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Public or private?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linspace is also used a lot in Bumblebee, it may be worth considering for Nx upstream as well

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for that. It's a really common function, albeit a short one

lib/nx_signal.ex Outdated
]
>
"""
defn mel_filters(opts \\ []) do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should make this function a deftransform and all numeric values that are given as options should be passed as regular arguments to the underlying defnp mel_filters_impl :)

lib/nx_signal.ex Outdated
[0.0, 1.6e3, 3.2e3, 4.8e3, 6.4e3, 8.0e3, 9.6e3, 1.12e4, 1.28e4, 1.44e4]
>
"""
defn fft_frequencies(opts \\ []) do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make fs a regular argument here. :)

lib/nx_signal.ex Outdated
[0.0, 200.0]
>
"""
defn stft(data, window, opts \\ []) do
{frame_size} = Nx.shape(window)

{overlap_size, fs} =
{overlap_size, fs, padding} =
transform({frame_size, opts}, fn {frame_size, opts} ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider implement stft as deftransform and passing fs as an argument to defn, so it is properly converted to a tensor!

@polvalente polvalente requested a review from josevalim January 15, 2023 14:33
@@ -224,6 +301,107 @@ defmodule NxSignal do
output
end

defnp as_windowed_reflect_padding(tensor, opts \\ []) do

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reflect padding might be a good addition upstream to Nx

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main difficulty with that is that currently padding will create values that are repeated (through {left, right, inner} padding config tuples).

I can't see how we'd frame this in terms of that API

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the API that talks to backends, not the public-facing one

* `:nfft` - Number of FFT bins
* `:nmels` - Number of target MEL bins. Defaults to 128
* `:type` - Target output type. Defaults to `{:f, 32}`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about more explicit names, like:

  • fs -> sampling_frequency/sampling_rate
  • nfft -> num_fft_bins
  • nmels -> num_mel_bins

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fs and nfft specifically are usual names for these params, that's why I chose them, but nmels not so much

https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.stft.html
https://pytorch.org/docs/stable/generated/torch.stft.html
https://www.mathworks.com/help/signal/ref/stft.html

WDYT about:
nfft -> fft_length
fs -> sampling_frequency
nmels -> mel_bins

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nfft/fs would need to change for more functions I think

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FFT is itself an acronym, so adding n on top of that may easily confuse someone without much prior experience, so a more explicit name is generally helpful.

The names above sound good to me!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'll change these!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency, I'm also changing overlap_size -> overlap_length given that both it and nfft operate on the same headspace of "samples"

@polvalente polvalente requested review from jonatanklosko and removed request for josevalim January 23, 2023 21:54
@polvalente
Copy link
Collaborator Author

Because this PR has already taken too many extraneous changes, I'll leave the refactor for the reflection padding to use Nx.reflect for a follow-up PR

@polvalente polvalente merged commit d6f0b87 into main Jan 24, 2023
@polvalente polvalente deleted the pv-feat/stft-to-mel branch January 24, 2023 00:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants