🔉 New SAM-Audio model for audio segmentation – great potential for data augmentation #866
LimitlessGreen started this conversation in Ideas
Hi everyone,
I wanted to share a recent finding that might be interesting and potentially useful for data augmentation in bioacoustic workflows.
Meta’s Segment Anything family has a new member: SAM-Audio — an audio segmentation model that enables prompt-based separation of sound events from mixed audio recordings.
You can try it here:
🔗 https://aidemos.meta.com/segment-anything/editor/segment-audio/?media_id=1343694437028496
For my test, I used "bird chorus" as the separation prompt on a mixed recording.
🎵 Example results
combined_audio.mp4
isolated_sound.mp4
without_isolated_sound.mp4
As you can hear, the bird-only result is surprisingly clean, especially given that this is a prompt-based, model-agnostic separation approach.
🧠 Why this might be relevant for BirdNET
This approach could be useful as a semi-automated way to generate training data, for example:
- isolating the target vocalizations (e.g. a bird chorus) from a field recording, and
- keeping the residual background audio without the target sound.
Both could then be recombined in a controlled way to generate augmented samples with known content and adjustable mixing conditions, as sketched below.
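To make the recombination step concrete, here is a minimal Python sketch of SNR-controlled mixing. It assumes the two stems from the demo have been exported as mono WAV files at the same sample rate; the file names, the mix_at_snr helper, and the 6 dB target are illustrative assumptions, not part of SAM-Audio or BirdNET.

```python
# Minimal sketch: recombine a separated vocalization with its background
# at a chosen signal-to-noise ratio. Assumes mono WAV stems exported from
# the SAM-Audio demo; file names and parameters here are hypothetical.
import numpy as np
import soundfile as sf

def mix_at_snr(signal: np.ndarray, background: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix an isolated vocalization into a background at a target SNR (dB)."""
    # Tile or trim the background so both arrays have the same length.
    if len(background) < len(signal):
        reps = int(np.ceil(len(signal) / len(background)))
        background = np.tile(background, reps)
    background = background[: len(signal)]

    # Scale the background so the mixture reaches the requested SNR.
    signal_power = np.mean(signal ** 2)
    background_power = np.mean(background ** 2) + 1e-12
    target_background_power = signal_power / (10 ** (snr_db / 10))
    background = background * np.sqrt(target_background_power / background_power)

    mix = signal + background
    # Normalize only if the mixture would clip in a fixed-point format.
    peak = np.max(np.abs(mix))
    if peak > 1.0:
        mix = mix / peak
    return mix

# Hypothetical file names matching the example stems above.
vocal, sr = sf.read("isolated_sound.wav")
noise, sr2 = sf.read("without_isolated_sound.wav")
assert sr == sr2, "resample one stem first if the sample rates differ"

augmented = mix_at_snr(vocal, noise, snr_db=6.0)
sf.write("augmented_sample.wav", augmented, sr)
```

Sweeping snr_db over a range (say 0 to 20 dB) would turn a single separated pair into a whole family of augmented samples with known ground truth.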
This is not intended as a replacement for curated field recordings, but potentially as a complementary source of augmented training data.
💡 Motivation
I’m mainly sharing this as an informational note and possible inspiration for future workflows around BirdNET or related tooling.
Best regards