-
Notifications
You must be signed in to change notification settings - Fork 7
feat: add STFT->MEL conversion #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lib/nx_signal.ex
Outdated
stride when Elixir.Kernel.is_list(stride) -> | ||
stride | ||
|
||
stride | ||
when Elixir.Kernel.and( | ||
Elixir.Kernel.is_integer(stride), | ||
Elixir.Kernel.>=(stride, 1) | ||
) -> | ||
[stride] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you using Elixir.Kernel
here? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably this was a transform
before with inline fn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this now
lib/nx_signal.ex
Outdated
dilations = List.duplicate(1, Nx.rank(shape)) | ||
|
||
{pooled_shape, padding_config} = | ||
Nx.Shape.pool(shape, window_dimensions, strides, padding, dilations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Private API :(
lib/nx_signal.ex
Outdated
(log_spec + 4) / 4 | ||
end | ||
|
||
defn linspace(min, max, opts \\ []) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Public or private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linspace is also used a lot in Bumblebee, it may be worth considering for Nx upstream as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for that. It's a really common function, albeit a short one
lib/nx_signal.ex
Outdated
] | ||
> | ||
""" | ||
defn mel_filters(opts \\ []) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should make this function a deftransform
and all numeric values that are given as options should be passed as regular arguments to the underlying defnp mel_filters_impl
:)
lib/nx_signal.ex
Outdated
[0.0, 1.6e3, 3.2e3, 4.8e3, 6.4e3, 8.0e3, 9.6e3, 1.12e4, 1.28e4, 1.44e4] | ||
> | ||
""" | ||
defn fft_frequencies(opts \\ []) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make fs
a regular argument here. :)
lib/nx_signal.ex
Outdated
[0.0, 200.0] | ||
> | ||
""" | ||
defn stft(data, window, opts \\ []) do | ||
{frame_size} = Nx.shape(window) | ||
|
||
{overlap_size, fs} = | ||
{overlap_size, fs, padding} = | ||
transform({frame_size, opts}, fn {frame_size, opts} -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider implement stft
as deftransform
and passing fs
as an argument to defn
, so it is properly converted to a tensor!
@@ -224,6 +301,107 @@ defmodule NxSignal do | |||
output | |||
end | |||
|
|||
defnp as_windowed_reflect_padding(tensor, opts \\ []) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reflect padding might be a good addition upstream to Nx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main difficulty with that is that currently padding will create values that are repeated (through {left, right, inner} padding config tuples).
I can't see how we'd frame this in terms of that API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the API that talks to backends, not the public-facing one
* `:nfft` - Number of FFT bins | ||
* `:nmels` - Number of target MEL bins. Defaults to 128 | ||
* `:type` - Target output type. Defaults to `{:f, 32}` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about more explicit names, like:
fs
->sampling_frequency
/sampling_rate
nfft
->num_fft_bins
nmels
->num_mel_bins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fs and nfft specifically are usual names for these params, that's why I chose them, but nmels not so much
https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.stft.html
https://pytorch.org/docs/stable/generated/torch.stft.html
https://www.mathworks.com/help/signal/ref/stft.html
WDYT about:
nfft -> fft_length
fs -> sampling_frequency
nmels -> mel_bins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nfft/fs would need to change for more functions I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FFT is itself an acronym, so adding n on top of that may easily confuse someone without much prior experience, so a more explicit name is generally helpful.
The names above sound good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I'll change these!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for consistency, I'm also changing overlap_size -> overlap_length given that both it and nfft operate on the same headspace of "samples"
0f4c653
to
fcf61d1
Compare
Because this PR has already taken too many extraneous changes, I'll leave the refactor for the reflection padding to use Nx.reflect for a follow-up PR |
No description provided.