
BREAKING: add support for targets as discussed in #3 #123

Merged — 15 commits merged into asteroid-team:master on Apr 1, 2022

Conversation

@hbredin (Collaborator) commented Mar 22, 2022

This is an initial attempt at addressing #3.
The proposed API assumes the following shapes:

  • samples: (batch_size, num_channels, num_samples)
  • targets: (batch_size, num_channels, num_frames, num_classes)
| Input | Target | num_frames | num_classes |
| ----- | ------ | ---------- | ----------- |
| Audio | class(es), fixed length, not correlated with the input length (low priority) | 1 | 1 or more |
| Audio | time series of class(es) | more than 1 | 1 or more |
| Audio | audio (same length as the input) | num_samples | 1 |

and is used like this:

```python
augment = MixSamples()
augmented = augment(samples, targets=targets)
mixed_samples = augmented.samples
mixed_targets = augmented.targets
```
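
For concreteness, here is a rough sketch of inputs matching those shapes (the tensor sizes, the 3-class targets, and the 1 s / 16 kHz duration are illustrative assumptions, not part of the proposed API; `MixSamples` follows the usage example above, although the transform added later in this PR is called `Mix`):

```python
import torch

batch_size, num_channels, num_samples = 4, 1, 16000  # assumed sizes: 1 s of mono audio at 16 kHz
num_frames, num_classes = 100, 3                      # assumed: one target frame every 10 ms, 3 classes

# samples: (batch_size, num_channels, num_samples)
samples = torch.randn(batch_size, num_channels, num_samples)

# targets: (batch_size, num_channels, num_frames, num_classes)
targets = torch.randint(0, 2, (batch_size, num_channels, num_frames, num_classes)).float()

augment = MixSamples()
augmented = augment(samples, targets=targets)

# mixing within the batch should leave the shapes unchanged
assert augmented.samples.shape == samples.shape
assert augmented.targets.shape == targets.shape
```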

@hbredin (Collaborator, Author) commented Mar 24, 2022

I have just added a Mix transform as an example.

@iver56 (Collaborator) commented Mar 28, 2022

A good next step would be to have it return an ObjectDict (see #126) instead of a tensor or a tuple

@hbredin (Collaborator, Author) commented Mar 28, 2022

Will do.

@hbredin (Collaborator, Author) commented Mar 29, 2022

What do you think of automatically inferring target_rate from num_samples = samples.shape[2], sample_rate, and num_frames = targets.shape[2]?

target_rate = sample_rate * num_frames / num_samples

with the num_frames = 1 corner case where target_rate does not even exist (one label for the whole sample).
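
For illustration, a tiny sketch of that inference rule including the corner case (the helper name and the concrete numbers are made up for this example):

```python
def infer_target_rate(sample_rate, num_samples, num_frames):
    # num_frames == 1 means one label for the whole sample: no frame rate exists
    if num_frames == 1:
        return None
    return sample_rate * num_frames / num_samples

print(infer_target_rate(16000, 160000, 1000))  # 100.0 Hz (10 s of 16 kHz audio, one frame every 10 ms)
print(infer_target_rate(16000, 160000, 1))     # None
```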

@iver56 (Collaborator) commented Mar 29, 2022

I would prefer to have it explicitly provided by the user. Here's an example that explains why:

Yamnet has a rate of 2.0833 Hz (one classification output for every step of 480 ms). In this example the rate cannot be accurately inferred, especially if the audio is short, because e.g. 1.1 seconds and 1.3 seconds of audio will give the same number of frames.
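
For concreteness, a quick numeric check of that example (assuming the frame count is floor(duration × 2.0833 Hz), which is an assumption about how YAMNet frames short audio):

```python
import math

true_rate = 1 / 0.48  # ≈ 2.0833 Hz: one classification output per 480 ms step

for duration in (1.1, 1.3):
    num_frames = math.floor(duration * true_rate)  # both durations give 2 frames
    inferred_rate = num_frames / duration          # what shape-based inference would yield
    print(f"{duration} s -> {num_frames} frames -> inferred {inferred_rate:.3f} Hz")

# 1.1 s -> 2 frames -> inferred 1.818 Hz
# 1.3 s -> 2 frames -> inferred 1.538 Hz
# Neither matches the true 2.0833 Hz, so shape-based inference would be wrong here.
```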

@hbredin (Collaborator, Author) commented Mar 29, 2022

> I would prefer to have it explicitly provided by the user. Here's an example that explains why:
>
> Yamnet has a rate of 2.0833 Hz (one classification output for every step of 480 ms). In this example the rate cannot be accurately inferred, especially if the audio is short, because e.g. 1.1 seconds and 1.3 seconds of audio will give the same number of frames.

Noted.

When target_rate is required but missing, can we still do our best to infer it automatically and warn the user about it?

```python
warnings.warn(
    f"target_rate is required by {self.__class__.__name__}. "
    f"It has been automatically inferred from the targets shape to be {inferred_target_rate}. "
    f"If this is incorrect, please use target_rate to pass it directly."
)
```

@iver56 (Collaborator) commented Mar 29, 2022

Yes, I'm okay with that

@hbredin (Collaborator, Author) commented Mar 29, 2022

Updated with the API discussed in Slack.

@hbredin hbredin marked this pull request as ready for review March 29, 2022 12:35
@hbredin hbredin changed the title from "wip: add support for targets as discussed in #3" to "BREAKING: add support for targets as discussed in #3" on Mar 29, 2022
@iver56 (Collaborator) commented Mar 30, 2022

Thanks :) I'll have a look soon-ish. Thanks for your patience.

@hbredin hbredin mentioned this pull request Mar 30, 2022
@iver56 (Collaborator) commented Mar 31, 2022

This is starting to look quite good already!

Here's what I think needs to be done:

Before it gets merged:

  • Mix deserves a unit test or two. Can you make that happen @hbredin?

After it gets merged:

  • Add an output_format parameter that defaults to the old-style API, so we can release at least one version with a backwards-compatible API (and then have the new API style be opt-in)
  • Have a demo script that processes targets as well as the usual inputs. Maybe also document some examples of how to use the targets feature.
  • Update changelog

@iver56 iver56 mentioned this pull request Mar 31, 2022
@hbredin (Collaborator, Author) commented Mar 31, 2022

> Before it gets merged:
>
> * Mix deserves a unit test or two. Can you make that happen @hbredin?

I definitely can (and will) but I am having trouble thinking about what to test.

I can do the equivalent of AddBackgroundNoise's test_varying_snr_within_batch.
Can you think of a test focusing on targets?

@iver56 (Collaborator) commented Apr 1, 2022

> I definitely can (and will) but I am having trouble thinking about what to test.
>
> I can do the equivalent of AddBackgroundNoise's test_varying_snr_within_batch. Can you think of a test focusing on targets?

At the very least I like to have the shape and the dtype of the output tested. And that the output isn't the same as the input. But of course, smart test assertions are useful 👍
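
For example, a minimal test along those lines might look like the sketch below (the Mix import path, the p and sample_rate arguments, and the tensor sizes are assumptions; only the shape, dtype, and output-differs-from-input assertions come from the suggestion above):

```python
import torch
from torch_audiomentations import Mix  # assumed import path for the new transform

def test_mix_preserves_shapes_and_changes_audio():
    batch_size, num_channels, num_samples = 8, 1, 16000
    num_frames, num_classes = 1, 4  # one label per sample

    samples = torch.randn(batch_size, num_channels, num_samples)
    targets = torch.rand(batch_size, num_channels, num_frames, num_classes)

    augment = Mix(p=1.0)  # p=1.0 so every example in the batch gets mixed (assumed parameter)
    output = augment(samples, sample_rate=16000, targets=targets)  # sample_rate keyword is assumed

    # shapes and dtypes are preserved
    assert output.samples.shape == samples.shape
    assert output.samples.dtype == samples.dtype
    assert output.targets.shape == targets.shape

    # the transform actually changed the audio
    assert not torch.equal(output.samples, samples)
```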

@iver56 iver56 merged commit 7bc37e5 into asteroid-team:master Apr 1, 2022
@iver56 (Collaborator) commented Apr 1, 2022

Thanks for the contribution 🚀

@hbredin (Collaborator, Author) commented Apr 1, 2022

Thanks for merging this!

> After it gets merged:
>
> * Add an output_format parameter that defaults to the old-style API, so we can release at least one version with a backwards-compatible API (and then have the new API style be opt-in)

I can take care of it but can you please clarify the API?

> * Have a demo script that processes targets as well as the usual inputs. Maybe also document some examples of how to use the targets feature.

I can easily create a notebook showing how I use this new feature in pyannote.audio.
Would that be OK or do you want this demo to be 100% torch-audiomentations?

> * Update changelog

I suggest you do that :)

@iver56 (Collaborator) commented Apr 1, 2022

> I can take care of it but can you please clarify the API?

Great! Hmm, how about something like this, for example:

```python
transform = PolarityInversion()
transform(my_audio)  # outputs a tensor, as before. Also emit a FutureWarning with info on the
                     # new output_type arg and a recommendation on transitioning to object_dict

transform = PolarityInversion(output_type="tensor")
transform(my_audio)  # outputs a tensor, as before. Maybe also emit a deprecation warning?

transform = PolarityInversion(output_type="object_dict")
transform(my_audio)  # outputs an ObjectDict
```

> I can easily create a notebook showing how I use this new feature in pyannote.audio.
> Would that be OK or do you want this demo to be 100% torch-audiomentations?

Sounds good to me :)
If the notebook is large, consider using git LFS
