
Mezza Giampiccolo Bernardini Sarti #5

Open
wants to merge 8 commits into main

Conversation

ilic-mezza

Alessandro Ilic Mezza, Riccardo Giampiccolo, Alberto Bernardini, Augusto Sarti

Dear SDX workshop committee,

I request a review for the following SDX submission (Abstract: 250 Words)

As a participant in the SDX challenge, please note that your talk will be automatically accepted after minimal prescreening. Hence, don't forget to provide us with the link to your team on the AIcrowd website.
You are also welcome to submit an unrelated abstract, in case you would like to present a method or idea for discussion with the participants.

Title: StemGMD: A Large-Scale Multi-Kit Audio Dataset for Deep Drums Demixing

Author(s): Alessandro Ilic Mezza, Riccardo Giampiccolo, Alberto Bernardini, and Augusto Sarti

Challenge submission (whether this submission is related to an SDX challenge entry):

  • MDX Leaderboard A
  • MDX Leaderboard B
  • MDX Leaderboard C
  • CDX Leaderboard A
  • CDX Leaderboard B

Workshop participation

  • Virtual
  • On-Site

ORGANISATION COMMITTEE (do not fill out)

  • Editor acknowledgment
  • Reviewer 1
  • Reviewer 2
  • Review 1 decision [accept/reject]
  • Review 2 decision [accept/reject]
  • Editor decision [accept/reject]

@faroit
Contributor

faroit commented Oct 9, 2023

👋 @ilic-mezza thanks for your submission. We are going to start reviewing at the end of this week!

@StefanUhlich-sony
Collaborator

StefanUhlich-sony commented Oct 18, 2023

Hello @ilic-mezza, thanks a lot for your submission, which looks interesting. For the abstract, it would be great if you could add the link to the StemGMD dataset and a link to your reference U-Net. Furthermore, could you please specify the nine stems that you consider and how you group them to five stems for the U-Nets that you trained?

Besides this, I have the following questions (not required for the abstract - I am just curious 😊):

  • Besides using sound fonts, there would also be the possibility to use a drum sample library to create such a dataset (e.g., https://www.toontrack.com/product/ezdrummer-3/) - did you compare the quality of your sound fonts to such a library?
  • Do the sound fonts already include reverberation? Did you add reverb yourself?
  • Did you add any compressor effect? How do you avoid clipping in the mix of the five/nine sources?
  • How did you train the stereo networks? Did you assume a specific drumset layout and then used a panning technique or is this stereo information already contained in your sound fonts?

@ilic-mezza
Author

Hello @StefanUhlich-sony, thanks for reviewing our submission.

In the updated paper.md file, we specified the nine stems in StemGMD (kick drum, snare, high tom, mid-low tom, floor tom, open hi-hat, closed hi-hat, ride cymbal, crash cymbal) and the five stems separated by the U-Nets (kick drum, snare, tom-toms, hi-hat, cymbals).
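
For illustration, the nine-to-five grouping could be expressed as a simple lookup. This is only a sketch: the stem names below are placeholders, not the dataset's actual file names.

```python
import numpy as np

# Hypothetical grouping of the nine StemGMD stems into the five
# stems separated by the U-Nets; key/stem names are placeholders.
STEM_GROUPS = {
    "kick": ["kick_drum"],
    "snare": ["snare"],
    "toms": ["high_tom", "mid_low_tom", "floor_tom"],
    "hihat": ["open_hihat", "closed_hihat"],
    "cymbals": ["ride_cymbal", "crash_cymbal"],
}

def group_stems(stems):
    """Sum equal-length per-instrument signals into five grouped stems."""
    return {
        group: sum(stems[name] for name in names)
        for group, names in STEM_GROUPS.items()
    }
```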

StemGMD is very large (more than 1 TB). We are currently working with Zenodo to accommodate hosting such a large dataset. The link will be made available soon, and we will update the abstract accordingly.

The U-Net code is available here: https://github.com/polimi-ispl/larsnet

Besides using sound fonts, there would also be the possibility to use a drum sample library to create such a dataset (e.g., https://www.toontrack.com/product/ezdrummer-3/) - did you compare the quality of your sound fonts to such a library?

Thank you for pointing out the difference between "sound fonts" and "drum sample libraries." It appears that we used the term "sound font" inappropriately. In fact, StemGMD was created using drum sample libraries from Drum Kit Designer shipped with Logic Pro X, whose quality is (arguably) on par with that of EZDrummer. The abstract was updated to clarify this aspect.

Do the sound fonts already include reverberation? Did you add reverb yourself?

Yes, all tracks are sent to a bus where room reverberation is applied using sampled IRs. The IR varies across different drum kits.
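
Convolving each track with a sampled IR can be sketched as follows. This is a minimal illustration of convolution reverb, not the actual Logic Pro X bus processing; a real pipeline would use FFT-based (partitioned) convolution for long IRs.

```python
import numpy as np

def apply_room_reverb(dry, ir):
    """Convolve a dry track with a sampled room impulse response.

    Output length is len(dry) + len(ir) - 1 (full convolution),
    so the reverb tail extends past the end of the dry signal.
    """
    return np.convolve(dry, ir)
```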

Did you add any compressor effect? How do you avoid clipping in the mix of the five/nine sources?

Compression was applied as a data augmentation strategy in training the neural networks but was not used in creating StemGMD. When exporting StemGMD's audio clips, the level of each track was kept at -3 dB to avoid clipping the master channel.
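
Keeping headroom to avoid clipping the master can be sketched as a peak-attenuation step. This is hypothetical and not the authors' actual export chain, which fixes each track level at -3 dB.

```python
import numpy as np

def mix_with_headroom(tracks, headroom_db=3.0):
    """Sum tracks, then attenuate so the mix peak stays headroom_db below 0 dBFS.

    Illustrative gain staging only; quiet mixes are left untouched.
    """
    mix = np.sum(tracks, axis=0)
    peak = np.max(np.abs(mix))
    target = 10.0 ** (-headroom_db / 20.0)  # e.g. -3 dB below full scale
    if peak > target:
        mix *= target / peak
    return mix
```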

That said, we cannot rule out that compression (or other effects) was applied to the drum sounds by the library's creators.

How did you train the stereo networks? Did you assume a specific drumset layout and then used a panning technique or is this stereo information already contained in your sound fonts?

The stereo information is already contained in the drum samples. However, we apply various data augmentation methods at training time, including doubling and L/R channel swap to mitigate the limitations of the fixed drumset layout that came with Drum Kit Designer.
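
A random L/R channel swap, one of the augmentations mentioned, might look like the sketch below. The actual training pipeline (and the doubling augmentation) is not shown in this thread.

```python
import numpy as np

def random_channel_swap(x, rng):
    """Randomly swap the L/R channels of a (2, n_samples) stereo signal.

    `rng` is any object with a `random()` method returning a float
    in [0, 1); with numpy's default_rng, about half the examples
    seen during training would be swapped.
    """
    if rng.random() < 0.5:
        return x[::-1, :].copy()  # reverse the channel axis: L <-> R
    return x
```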

Collaborator

@StefanUhlich-sony StefanUhlich-sony left a comment


@ilic-mezza Thanks a lot for your answer - everything looks fine.

Contributor

@faroit faroit left a comment


Hi @ilic-mezza, for the final version of the abstract, can you please add the dataset URL as a reference or footnote?

@ilic-mezza
Author

Hi @faroit

Sure, we will add the link to the final version of the abstract. For now, we have uploaded the dataset to Zenodo and been assigned a DOI (10.5281/zenodo.7860223). However, the repository is still private as it is being prepared, and it does not have a URL yet.

Our plan is to make the dataset public by the end of the week once we're sure everything is in order. Then, we will proceed to update the abstract accordingly—would that be ok?

paper.md Outdated
bank of parallel U-Nets that separates five stems (kick drum, snare, tom-toms, hi-hat, cymbals) from a stereo drum mixture through spectro-temporal soft masking.
Such a model is meant to serve as a baseline for future research and might complement existing music demixing models.

[^1] DOI: 10.5281/zenodo.7860223
Contributor

@faroit faroit Oct 27, 2023


It seems that the footnote doesn't render nicely here.

Maybe you could create a real reference and cite it here. Also, it would be nice if the DOI were a valid hyperlink directly linking to Zenodo.

Author


OK, I will submit a new version with a real reference and a working hyperlink to Zenodo. We are re-uploading the files to Zenodo as I write, as we noticed a problem with the zip files we had already uploaded. Our hope is to be done by the end of the day; as soon as the dataset is online, I'll commit the new abstract.

@faroit
Contributor

faroit commented Oct 27, 2023

Hi @ilic-mezza, as the workshop is coming up very soon, we are finalizing the program. Stay tuned for the definitive timetable for your presentations. In the meantime, we want to clarify the recording and broadcasting rights for the presentations, so please acknowledge the following:

I hereby authorize the right and permission to copyright and/or publish, reproduce or otherwise use my name, voice, and audio-visual recordings. I acknowledge and understand these materials about or of me may be used for both commercial and/or non-commercial purposes.

I understand that my image may be edited, copied, exhibited, published and/or distributed. There is no time limit on the validity of this release nor are there any geographic limitations on where these materials may be distributed.

I authorize that the video will be made available under the CC BY-SA 4.0 license on the conference website.

and comment on this by replying with acknowledge.

@ilic-mezza
Author

acknowledge

@ilic-mezza
Author

ilic-mezza commented Oct 27, 2023

Hi @faroit,

The dataset is finally up! We updated the abstract, adding a proper reference with a Zenodo URL and DOI (10.5281/zenodo.7860223). Let us know if the BibTeX entry type renders well. In particular, we might have to change "@dataset" to "@misc".
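
For reference, a "@misc" fallback for the entry could look like the hypothetical sketch below. Field values are taken from this thread (authors, title, Zenodo DOI); "@dataset" is a biblatex entry type that classic BibTeX styles may not recognize, whereas "@misc" is universally supported.

```bibtex
% Hypothetical fallback entry; the citation key is illustrative.
@misc{mezza2023stemgmd,
  author = {Mezza, Alessandro Ilic and Giampiccolo, Riccardo and
            Bernardini, Alberto and Sarti, Augusto},
  title  = {{StemGMD}: A Large-Scale Multi-Kit Audio Dataset for
            Deep Drums Demixing},
  year   = {2023},
  doi    = {10.5281/zenodo.7860223},
  url    = {https://doi.org/10.5281/zenodo.7860223}
}
```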

Best,
Alessandro

@faroit
Contributor

faroit commented Oct 27, 2023

@ilic-mezza just checked, looks good! Thanks

@faroit
Contributor

faroit commented Oct 31, 2023

Hi @ilic-mezza,

the workshop is approaching soon; here is some information regarding your presentation:

  • The program is up and you can find your presentation slot here: https://sdx-workshop.github.io/program
  • If your presentation is virtual, please show up in the Zoom webinar at least 10 minutes before your slot and use a distinct username so that we can identify you during the presentation
  • If your presentation is on site in Milano, please send us your pdf/pptx/web slides in advance and let us know if you need audio output. You can also present from your own device, but then test the setup before the workshop starts or during a coffee break. We don't know much about the projection setup right now, so bring adapters for standard HDMI/VGA in that case.
  • You have 15 minutes in total for your presentation, including questions, so we suggest preparing slides for about 12-13 minutes.
  • Please let us know if there is any problem with your presentation by tagging us here in the respective PR

See you soon!

@ilic-mezza
Author

Hi,

Thank you for accepting our submission. Looking forward to the workshop!

As you recently published the program, I noticed that the references in our abstract don't render as I'd expected. I will commit a new version of the bib file in case you'd like to update the PDF. However, it's nothing major, so no worries if it turns out to be too much of a hassle. Also, I will take the opportunity to fix a small typo in the paper.md file.

Take care!

@faroit
Contributor

faroit commented Oct 31, 2023

@ilic-mezza Sure, will update the pdf tomorrow

@ilic-mezza
Author

Hi @faroit, do you think you'll manage to update our abstract on the program page? I've checked the latest version, and the PDF renders as intended. Thank you very much, see you at the workshop!

@ilic-mezza
Author

Hi @faroit, @StefanUhlich-sony,

We finally released both part 1 and part 2 of our dataset. On Zenodo, we reference our SDX Workshop abstract (Mezza.pdf). Therefore, it would be very nice if the program on the website showed the latest version of the abstract.

Would you mind updating it with the latest version of the PDF, i.e., the one compiled after the last two commits?

Thank you very much!
