Add functionals gain, dither, scale_to_interval #319
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -858,6 +858,162 @@ def compute_deltas(specgram, win_length=5, mode="replicate"): | |
return output | ||
|
||
CamiWilliams marked this conversation as resolved.
|
||
|
||
CamiWilliams marked this conversation as resolved.
|
||
def gain(waveform, gain_db=1.0): | ||
# type: (Tensor, float) -> Tensor | ||
r"""Apply amplification or attenuation to the whole waveform. | ||
|
||
Args: | ||
waveform (torch.Tensor): Tensor of audio of dimension (channel, time). | ||
gain_db (float): Gain adjustment in decibels (dB) (Default: `1.0`). | ||
|
||
Returns: | ||
torch.Tensor: the whole waveform amplified by gain_db. | ||
""" | ||
if (gain_db == 0): | ||
return waveform | ||
|
||
ratio = 10 ** (gain_db / 20) | ||
CamiWilliams marked this conversation as resolved.
|
||
|
||
return waveform * ratio | ||
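The decibel-to-amplitude conversion used by `gain` can be checked in isolation. The following is a minimal sketch, not part of the PR; the helper name `db_to_amplitude_ratio` is hypothetical and only mirrors the `10 ** (gain_db / 20)` expression above.

```python
def db_to_amplitude_ratio(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude ratio.

    Hypothetical helper: it mirrors the expression used in `gain`,
    but it is not a function added by this PR.
    """
    return 10 ** (gain_db / 20)


# 0 dB leaves the amplitude unchanged, +20 dB multiplies it by 10,
# and about +6.02 dB doubles it.
for gain_db in (0.0, 1.0, 6.0206, 20.0, -20.0):
    print(f"{gain_db:+8.4f} dB -> x{db_to_amplitude_ratio(gain_db):.4f}")
```

Note that with the default `gain_db=1.0` the waveform is multiplied by roughly 1.122, so the default call is not a no-op; only `gain_db=0` returns the input unchanged.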
|
||
|
||
def _scale_to_interval(waveform, interval_max=1.0): | ||
# type: (Tensor, float) -> Tensor | ||
r"""Scale the waveform to the interval [-interval_max, interval_max] across all dimensions. | ||
|
||
Args: | ||
waveform (torch.Tensor): Tensor of audio of dimension (channel, time). | ||
interval_max (float): The bounds of the interval, where the float indicates | ||
the upper bound and the negative of the float indicates the lower | ||
bound (Default: `1.0`). | ||
Example: interval_max=1.0 -> [-1.0, 1.0] | ||
|
||
Returns: | ||
torch.Tensor: the whole waveform scaled to interval. | ||
""" | ||
abs_max = torch.max(torch.abs(waveform)) | ||
CamiWilliams marked this conversation as resolved.
|
||
ratio = abs_max / interval_max | ||
waveform = waveform / ratio  # out-of-place, so the caller's tensor is not modified | ||
|
||
return waveform | ||
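For comparison, peak scaling can be written so that it never mutates its input and tolerates silent audio. This is a sketch under assumptions rather than the PR's implementation: the name `scale_to_interval_sketch` and the all-zero guard are additions for illustration.

```python
import torch


def scale_to_interval_sketch(waveform: torch.Tensor, interval_max: float = 1.0) -> torch.Tensor:
    """Scale so the largest absolute sample equals interval_max.

    Hypothetical variant of `_scale_to_interval`: it returns a new tensor
    and guards against an all-zero waveform (both are assumptions, not
    behavior confirmed by the PR).
    """
    abs_max = waveform.abs().max()
    if abs_max == 0:
        # Silent input has no peak to normalize; return an unchanged copy.
        return waveform.clone()
    return waveform * (interval_max / abs_max)


waveform = torch.tensor([[0.5, -2.0, 1.0], [0.25, 0.0, -1.0]])
scaled = scale_to_interval_sketch(waveform, interval_max=1.0)
print(scaled)    # peak magnitude is now 1.0
print(waveform)  # the input tensor is left untouched
```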
|
||
|
||
def _add_noise_shaping(dithered_waveform, waveform): | ||
r"""Noise shaping is calculated by error: | ||
error[n] = dithered[n] - original[n] | ||
noise_shaped_waveform[n] = dithered[n] + error[n-1] | ||
""" | ||
wf_shape = waveform.size() | ||
waveform = waveform.reshape(-1, wf_shape[-1]) | ||
|
||
dithered_shape = dithered_waveform.size() | ||
dithered_waveform = dithered_waveform.reshape(-1, dithered_shape[-1]) | ||
|
||
error = dithered_waveform - waveform | ||
|
||
# add error[n-1] to dithered_waveform[n], so offset the error by 1 index | ||
for index in range(error.size()[0]): | ||
CamiWilliams marked this conversation as resolved.
|
||
err = error[index] | ||
error_offset = torch.cat((torch.zeros(1), err)) | ||
error[index] = error_offset[:waveform.size()[1]] | ||
CamiWilliams marked this conversation as resolved.
|
||
|
||
noise_shaped = dithered_waveform + error | ||
return noise_shaped.reshape(dithered_shape[:-1] + noise_shaped.shape[-1:]) | ||
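The per-row loop above delays the error by one sample. The same recurrence, `noise_shaped[n] = dithered[n] + error[n-1]` with the error before the first sample taken as 0, can be sketched without a Python loop. The function name `add_noise_shaping_sketch` is hypothetical, and this is an illustration of the formula in the docstring, not the code under review.

```python
import torch


def add_noise_shaping_sketch(dithered: torch.Tensor, original: torch.Tensor) -> torch.Tensor:
    """Add the previous sample's error to each dithered sample.

    Works on any (..., time) shape because the shift is applied along
    the last dimension only.
    """
    error = dithered - original
    # Shift the error right by one sample; the first sample gets 0.
    delayed_error = torch.zeros_like(error)
    delayed_error[..., 1:] = error[..., :-1]
    return dithered + delayed_error


original = torch.tensor([[0.0, 1.0, 2.0, 3.0]])
dithered = torch.tensor([[0.5, 1.0, 1.5, 3.0]])
shaped = add_noise_shaping_sketch(dithered, original)
print(shaped)
```

Here the error is `[0.5, 0.0, -0.5, 0.0]`, so the delayed error is `[0.0, 0.5, 0.0, -0.5]` and the result is `[0.5, 1.5, 1.5, 2.5]`.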
|
||
|
||
def _apply_probability_distribution(waveform, density_function="TPDF"): | ||
# type: (Tensor, str) -> Tensor | ||
r"""Apply a probability distribution function on a waveform. | ||
This should just be an internal function, and it in fact does the core of applying dither. Let's rename this to nit: "Apply dither to the waveform using the chosen density function."
I see this comment marked as resolved. What was the resolution of this?
Just pushed the change, I renamed it to _apply_probability_distribution. How does that sound?
per this comment
And this comment... should I still rename to "apply dither"?
|
||
Triangular probability density function (TPDF) dither noise has a | ||
triangular distribution; values in the center of the range have a higher | ||
probability of occurring. | ||
|
||
Rectangular probability density function (RPDF) dither noise has a | ||
uniform distribution; any value in the specified range has the same | ||
probability of occurring. | ||
|
||
Gaussian probability density function (GPDF) has a normal distribution. | ||
The relationship of probabilities of results follows a bell-shaped, | ||
or Gaussian curve, typical of dither generated by analog sources. | ||
Args: | ||
waveform (torch.Tensor): Tensor of audio of dimension (channel, time) | ||
density_function (string): The density function of a | ||
continuous random variable (Default: `TPDF`) | ||
Options: Triangular Probability Density Function - `TPDF` | ||
Rectangular Probability Density Function - `RPDF` | ||
Gaussian Probability Density Function - `GPDF` | ||
Returns: | ||
torch.Tensor: waveform dithered with the chosen density function | ||
""" | ||
shape = waveform.size() | ||
waveform = waveform.reshape(-1, shape[-1]) | ||
|
||
channel_size = waveform.size()[0] - 1 | ||
time_size = waveform.size()[-1] - 1 | ||
|
||
random_channel = int(torch.randint(channel_size, [1, ]).item()) if channel_size > 0 else 0 | ||
random_time = int(torch.randint(time_size, [1, ]).item()) if time_size > 0 else 0 | ||
|
||
number_of_bits = 16 | ||
up_scaling = 2 ** (number_of_bits - 1) - 2 | ||
signal_scaled = waveform * up_scaling | ||
down_scaling = 2 ** (number_of_bits - 1) | ||
|
||
signal_scaled_dis = waveform | ||
if (density_function == "RPDF"): | ||
RPDF = waveform[random_channel][random_time] - 0.5 | ||
|
||
signal_scaled_dis = signal_scaled + RPDF | ||
elif (density_function == "GPDF"): | ||
# TODO Replace by distribution code once | ||
# https://github.com/pytorch/pytorch/issues/29843 is resolved | ||
# gaussian = torch.distributions.normal.Normal(torch.mean(waveform, -1), 1).sample() | ||
|
||
num_rand_variables = 6 | ||
|
||
gaussian = waveform[random_channel][random_time] | ||
for ws in num_rand_variables * [time_size]: | ||
rand_chan = int(torch.randint(channel_size, [1, ]).item()) | ||
gaussian += waveform[rand_chan][int(torch.randint(ws, [1, ]).item())] | ||
|
||
signal_scaled_dis = signal_scaled + gaussian | ||
else: | ||
TPDF = torch.bartlett_window(time_size + 1) | ||
TPDF = TPDF.repeat((channel_size + 1), 1) | ||
signal_scaled_dis = signal_scaled + TPDF | ||
|
||
quantised_signal_scaled = torch.round(signal_scaled_dis) | ||
quantised_signal = quantised_signal_scaled / down_scaling | ||
return quantised_signal.reshape(shape[:-1] + quantised_signal.shape[-1:]) | ||
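The three density functions named in the docstring can also be illustrated with PyTorch's random samplers. This is a sketch of one conventional way to draw such noise, not what the PR does (the PR builds TPDF from `torch.bartlett_window` and derives RPDF/GPDF from samples of the waveform itself). The function name `dither_noise_sketch`, the amplitudes in quantization steps, and the unit-variance Gaussian are all assumptions.

```python
import torch


def dither_noise_sketch(shape, density_function="TPDF", generator=None):
    """Draw dither noise, expressed in quantization steps (hypothetical helper)."""
    if density_function == "RPDF":
        # Uniform on [-0.5, 0.5): every value is equally likely.
        return torch.rand(shape, generator=generator) - 0.5
    if density_function == "GPDF":
        # Standard normal noise; the unit variance is an arbitrary choice here.
        return torch.randn(shape, generator=generator)
    if density_function == "TPDF":
        # The difference of two independent uniforms is triangular on (-1, 1),
        # so values near 0 are the most likely.
        first = torch.rand(shape, generator=generator)
        second = torch.rand(shape, generator=generator)
        return first - second
    raise ValueError(f"unknown density_function: {density_function!r}")


generator = torch.Generator().manual_seed(0)
for name in ("RPDF", "TPDF", "GPDF"):
    noise = dither_noise_sketch((2, 1000), name, generator=generator)
    print(name, float(noise.min()), float(noise.max()), float(noise.mean()))
```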
|
||
|
||
def dither(waveform, density_function="TPDF", noise_shaping=False): | ||
# type: (Tensor, str, bool) -> Tensor | ||
r"""Dither increases the perceived dynamic range of audio stored at a | ||
particular bit-depth by eliminating nonlinear truncation distortion | ||
(i.e. adding minimally perceived noise to mask distortion caused by quantization). | ||
Args: | ||
waveform (torch.Tensor): Tensor of audio of dimension (channel, time) | ||
density_function (string): The density function of a | ||
continuous random variable (Default: `TPDF`) | ||
Options: Triangular Probability Density Function - `TPDF` | ||
Rectangular Probability Density Function - `RPDF` | ||
Gaussian Probability Density Function - `GPDF` | ||
noise_shaping (boolean): a filtering process that shapes the spectral | ||
energy of quantisation error (Default: `False`) | ||
|
||
Returns: | ||
torch.Tensor: waveform dithered | ||
""" | ||
dithered = _apply_probability_distribution(waveform, density_function=density_function) | ||
|
||
if noise_shaping: | ||
return _add_noise_shaping(dithered, waveform) | ||
else: | ||
return dithered | ||
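To make the docstring's claim concrete, the sketch below quantizes a waveform to a fixed bit depth after adding TPDF noise, which is the general idea `dither` wraps. It is an illustration under assumptions, not the PR's code: the name `quantize_with_tpdf_dither`, the symmetric scale of `2 ** (number_of_bits - 1) - 1`, and the clamp are choices made for this example, and the input is assumed to lie in [-1, 1].

```python
import torch


def quantize_with_tpdf_dither(waveform, number_of_bits=16, generator=None):
    """Quantize a waveform in [-1, 1] to number_of_bits with TPDF dither.

    Hypothetical helper for illustration only.
    """
    scale = 2 ** (number_of_bits - 1) - 1
    # Triangular noise on (-1, 1) quantization steps.
    noise = (torch.rand(waveform.shape, generator=generator)
             - torch.rand(waveform.shape, generator=generator))
    # Add the noise before rounding so the rounding error is randomized.
    levels = torch.round(waveform * scale + noise)
    # Keep the result inside the representable integer range.
    levels = levels.clamp(-scale, scale)
    return levels / scale


generator = torch.Generator().manual_seed(0)
waveform = torch.linspace(-1.0, 1.0, 101).unsqueeze(0)  # shape (1, 101)
dithered = quantize_with_tpdf_dither(waveform, number_of_bits=8, generator=generator)
print((dithered - waveform).abs().max())  # stays below 1.5 quantization steps
```

With noise bounded by one step and rounding contributing at most half a step, the per-sample error stays under 1.5 steps; the noise trades a slightly higher noise floor for an error that is no longer tied to the signal.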
|
||
|
||
def _compute_nccf(waveform, sample_rate, frame_time, freq_low): | ||
# type: (Tensor, int, float, int) -> Tensor | ||
r""" | ||
|