Process (various kinds of) target data similarly to input data #3
Nice! This is definitely THE feature that would make me switch to …
We should make examples of what the API should look like in each scenario.
Yeah, I agree that this would be super cool, but what would it look like? @hbredin do you have API ideas in mind?
The last one, "Audio (length same as the input)", is supported already by doing this:
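A minimal sketch of that approach, assuming the transforms expose `freeze_parameters()` / `unfreeze_parameters()` (as audiomentations does) so the same random parameters get reused for the target audio; `Gain` is just a stand-in example:

```python
import torch
from torch_audiomentations import Gain

augment = Gain(min_gain_in_db=-6.0, max_gain_in_db=6.0, p=1.0)

input_audio = torch.randn(1, 1, 16000)   # (batch, channels, time)
target_audio = torch.randn(1, 1, 16000)  # same length as the input

# The first call samples random parameters; freezing reuses them for the target
augmented_input = augment(input_audio, sample_rate=16000)
augment.freeze_parameters()
augmented_target = augment(target_audio, sample_rate=16000)
augment.unfreeze_parameters()
```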
We can refine the API for this scenario later.
I am currently working on an augmentation technique that basically sums two samples selected randomly from within a batch, for training end-to-end speaker diarization models. This falls under the "Time-serie of class(es)" scenario. What I am missing is a way to keep track of targets (in my case, frame-wise speaker activations). The API would look like this:

```python
augment = SumTwoSamples(min_snr_in_db=0.0, max_snr_in_db=5.0)
augmented_samples, augmented_targets = augment(
    samples, sample_rate, targets=targets, target_rate=target_rate
)
```

where `targets` holds the frame-wise speaker activations and `target_rate` is their frame rate (the targets counterpart of `sample_rate`).
I am trying to somehow bend … In particular, it is not clear to me what … What are your thoughts? cc @FrenchKrab
Interesting! I get a feeling that it could blow up in complexity if we try to cover all use cases in one class. Maybe it makes sense to make separate "variants" of each transform? Or should we try to handle the various cases in one class?

I had a quick look at your draft PR, and my first thought was that it introduces breaking changes in all transforms (they return a tuple instead of a tensor), which in turn led me to check how computer vision data augmentation libraries do it. For example, albumentations can process a target (image/mask, bounding boxes or a set of points) in the same way as the input image. In albumentations, each transform returns a dictionary, which leaves some flexibility in what is included in the returned value. Maybe that's a useful pattern to adopt here?
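For concreteness, that pattern applied to audio might look something like this (hypothetical call and key names, purely illustrative):

```python
# Hypothetical dict-style API, mirroring albumentations' pattern
result = augment(samples=samples, sample_rate=16000, targets=targets)
augmented_samples = result["samples"]
augmented_targets = result["targets"]  # only present if targets were passed in
```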
Could you elaborate on what feels unclear? Anyway, I think it would make sense to first design the API, then implement it for a simple transform, like Shift or TimeInversion. Also, if we're going to make backwards-incompatible (breaking) changes, it would be nice to release a version with future warnings first, even though we're still in alpha 🤓
We should probably start by listing the types of targets used in audio.
Actually, I made sure not to break the user-facing API -- only the internal methods changed. It only returns a tuple when the user passes the additional `targets` argument.
It would also mean breaking the API, right? I really like the current simple API.
I was referring to the original transform that made me think about this PR:
As stated a few lines above, PR #123 does not make breaking changes, as in: "code that relies on the documented user-facing API will still work". Custom augmentations would need to be updated, though -- unless we add a … PR #123 does implement …
Agreed. So it comes down to the shape of the time-series data, then. I don't think I have any better suggestion than the shape you suggested in your PR.
Good! I haven't looked at it in detail yet ^^ I might have time for that on Friday. Thanks for your patience :)
So the output type is either a tensor or a tuple, based on the parameters given? I'm not sure this is a good pattern. Do you know of other Python libraries that do it like that? Usually functions give the same output type (or sometimes None) regardless of the inputs.
It seems imgaug uses this pattern (it returns a tuple if a target is provided):
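For example, with a segmentation map as the target:

```python
import numpy as np
import imgaug.augmenters as iaa
from imgaug.augmentables.segmaps import SegmentationMapsOnImage

image = np.zeros((64, 64, 3), dtype=np.uint8)
segmap = SegmentationMapsOnImage(
    np.zeros((64, 64, 1), dtype=np.int32), shape=image.shape
)

seq = iaa.Sequential([iaa.Fliplr(1.0)])

# A tuple is returned because a target (segmentation map) is passed with the image
image_aug, segmap_aug = seq(image=image, segmentation_maps=segmap)
```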
albumentations, on the other hand, always returns a dict:
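```python
import numpy as np
import albumentations as A

image = np.zeros((64, 64, 3), dtype=np.uint8)
mask = np.zeros((64, 64), dtype=np.uint8)

transform = A.Compose([A.HorizontalFlip(p=1.0)])

# The result is always a dict, whether or not a mask was passed
result = transform(image=image, mask=mask)
image_aug, mask_aug = result["image"], result["mask"]
```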
Maybe we should post a vote on those two options in Slack?
Option 1: Always return a dict. Inspired by albumentations (9.9k stars, actively maintained).
Option 2: Return a tensor or a tuple depending on the input. Inspired by imgaug (12.4k stars, abandoned project).
Voted!
I posted it in asteroid and thesoundofai. At the moment all 5 votes are in favor of dict. @mpariente suggested that we add support for attribute access.

If I should conclude early, I think it's best to return an ObjectDict instead of a tuple or a tensor. ObjectDict behaves like a dict, but also supports access by attribute, so the two following lines would give the same result:
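Presumably something along these lines (assuming the output keys follow the `samples` naming used above):

```python
# Both lines access the same value on an ObjectDict
augmented = output["samples"]
augmented = output.samples
```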
Adding a PR for that here:
BREAKING: add support for targets as discussed in #3
The idea is that whatever perturbations are applied to the input are reflected in the target data.
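For instance, a time shift would move both the waveform and its frame-wise labels in lockstep. A sketch of the intended usage, with hypothetical argument and key names, assuming the ObjectDict return discussed above:

```python
# Hypothetical post-PR usage: the same perturbation is applied to samples and targets
output = augment(
    samples=samples,    # (batch, channels, time)
    sample_rate=16000,
    targets=targets,    # e.g. frame-wise speaker activations
    target_rate=target_rate,
)
augmented_samples = output.samples
augmented_targets = output.targets  # perturbed consistently with the samples
```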
Scenarios: