Mezza Giampiccolo Bernardini Sarti #5
base: main
Conversation
👋 @ilic-mezza thanks for your submission. We are going to start reviewing at the end of this week!
Hello @ilic-mezza, thanks a lot for your submission, which looks interesting. For the abstract, it would be great if you could add a link to the StemGMD dataset and a link to your reference U-Net. Furthermore, could you please specify the nine stems that you consider and how you group them into five stems for the U-Nets that you trained? Besides this, I have the following questions (not required for the abstract; I am just curious 😊):
Hello @TE-StefanUhlich, thanks for reviewing our submission. In the updated paper.md file, we specified the nine stems in StemGMD (kick drum, snare, high tom, mid-low tom, floor tom, open hi-hat, closed hi-hat, ride cymbal, crash cymbal) and the five stems separated by the U-Nets (kick drum, snare, tom-toms, hi-hat, cymbals). StemGMD happens to be very large (more than 1 TB), and we are currently working with Zenodo to accommodate the hosting of such a large dataset. The link will be made available soon, and we will update the abstract accordingly. The U-Net code is available here: https://github.com/polimi-ispl/larsnet

> Besides using sound fonts, there would also be the possibility to use a drum sample library to create such a dataset (e.g., https://www.toontrack.com/product/ezdrummer-3/) - did you compare the quality of your sound fonts to such a library?

Thank you for pointing out the difference between "sound fonts" and "drum sample libraries." It appears that we used the term "sound font" inappropriately. In fact, StemGMD was created using drum sample libraries from Drum Kit Designer shipped with Logic Pro X, whose quality is (arguably) on par with that of EZDrummer. The abstract was updated to clarify this aspect.

> Do the sound fonts already include reverberation? Did you add reverb yourself?

Yes, all tracks are sent to a bus where room reverberation is applied using sampled IRs. The IR varies across different drum kits.

> Did you add any compressor effect? How do you avoid clipping in the mix of the five/nine sources?

Compression was applied as a data augmentation strategy when training the neural networks, but it was not used in creating StemGMD. When exporting StemGMD's audio clips, the level of each track was kept at -3 dB to avoid clipping the master channel. That said, we cannot exclude that compression (as well as other effects) was applied to the drum sounds by the library's creator.

> How did you train the stereo networks? Did you assume a specific drumset layout and then use a panning technique, or is this stereo information already contained in your sound fonts?

The stereo information is already contained in the drum samples. However, we apply various data augmentation methods at training time, including doubling and L/R channel swapping, to mitigate the limitations of the fixed drumset layout that comes with Drum Kit Designer.
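A few details from the reply above can be sketched in Python. This is an illustrative sketch only: the names below are assumptions, not taken from the authors' code (https://github.com/polimi-ispl/larsnet); they merely restate the nine-to-five stem grouping, the -3 dB export level, and the L/R channel-swap augmentation described in the thread.

```python
import numpy as np

# Nine StemGMD stems mapped onto the five stems separated by the U-Nets
# (grouping as stated in the reply above; dict keys/values are ours).
STEM_GROUPS = {
    "kick drum": "kick drum",
    "snare": "snare",
    "high tom": "tom-toms",
    "mid-low tom": "tom-toms",
    "floor tom": "tom-toms",
    "open hi-hat": "hi-hat",
    "closed hi-hat": "hi-hat",
    "ride cymbal": "cymbals",
    "crash cymbal": "cymbals",
}

# Exporting each track at -3 dB corresponds to a linear gain of ~0.708.
EXPORT_GAIN = 10.0 ** (-3.0 / 20.0)

def channel_swap(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly swap the L/R channels of a (2, num_samples) stereo array.

    Hypothetical data-augmentation helper mirroring the "L/R channel
    swap" mentioned above; applied with probability 0.5.
    """
    return x[::-1].copy() if rng.random() < 0.5 else x
```

The mapping makes the grouping unambiguous (all toms collapse into "tom-toms", both hi-hat articulations into "hi-hat", and ride/crash into "cymbals"), which is all the abstract needs to state.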
@ilic-mezza Thanks a lot for your answer - everything looks fine.
Hi @ilic-mezza, for the final version of the abstract, can you please add the dataset URL as a reference or footnote?
Hi @faroit, sure, we will add the link to the final version of the abstract. For now, we have uploaded the dataset to Zenodo and have been assigned a DOI (10.5281/zenodo.7860223). However, the repository is still private while it is being prepared, and it does not have a URL yet. Our plan is to make the dataset public by the end of the week, once we're sure everything is in order. Then we will update the abstract accordingly. Would that be OK?
paper.md (Outdated)

> bank of parallel U-Nets that separates five stems (kick drum, snare, tom-toms, hi-hat, cymbals) from a stereo drum mixture through spectro-temporal soft masking.
> Such model is meant to serve as a baseline for future research and might complement existing music demixing models.
>
> [^1] DOI: 10.5281/zenodo.7860223
It seems that the footnote doesn't render nicely here.
Maybe you can create a real reference and cite it here. Also, it would be nice if the DOI were a valid hyperlink linking directly to Zenodo.
Ok, I will submit a new version with a real reference and a working hyperlink to Zenodo. We are re-uploading the files to Zenodo as I write this, as we noticed a problem with the zip files we had already uploaded. Our hope is to be done by the end of the day; as soon as the dataset is online, I'll commit the new abstract.
Hi @ilic-mezza, as the workshop is coming up very soon, we are finalizing the program. Stay tuned for the definitive timetable for your presentations. In the meantime, we want to clarify the recording and broadcasting rights of the presentations, so please acknowledge the following and comment on this by replying:
Hi @faroit, the dataset is finally up! We updated the abstract, adding a proper reference with a Zenodo URL and DOI (10.5281/zenodo.7860223). Let us know if the BibTeX entry type renders well. In particular, we might have to change "@dataset" to "@misc". Best,
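Since standard BibTeX styles do not know the `@dataset` type, falling back to `@misc` is a safe choice. A sketch of what such an entry could look like, using only details stated in this thread (the entry key and year here are assumptions):

```bibtex
@misc{mezza_stemgmd,
  author = {Mezza, Alessandro Ilic and Giampiccolo, Riccardo and
            Bernardini, Alberto and Sarti, Augusto},
  title  = {{StemGMD}: A Large-Scale Multi-Kit Audio Dataset for
            Deep Drums Demixing},
  doi    = {10.5281/zenodo.7860223},
  url    = {https://doi.org/10.5281/zenodo.7860223},
}
```

Putting the DOI in `url` form (`https://doi.org/...`) also gives the clickable hyperlink to Zenodo requested earlier in the thread.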
@ilic-mezza just checked, looks good! Thanks
Hi @ilic-mezza, the workshop is approaching soon; here is some info regarding your presentation:
See you soon!
Hi, thank you for accepting our submission. Looking forward to the workshop! As you recently published the program, I noticed that the references in our abstract don't render as I'd expected. I will commit a new version. Take care!
@ilic-mezza Sure, will update the pdf tomorrow |
Hi @faroit, do you think you'll manage to update our abstract on the program page? I've checked the latest version and the PDF renders as intended. Thank you very much, see you at the workshop!
Hi @faroit, @TE-StefanUhlich, we finally released both part 1 and part 2 of our dataset. On Zenodo, we reference our SDX Workshop abstract (Mezza.pdf). Therefore, it would be very nice if the program on the website showed the latest version of the abstract. Would you mind updating it with the latest version of the PDF, i.e., the one compiled after the last two commits? Thank you very much!
Alessandro Ilic Mezza, Riccardo Giampiccolo, Alberto Bernardini, Augusto Sarti
Dear SDX workshop committee,
I request a review for the following SDX submission (Abstract: 250 Words)
Title: StemGMD: A Large-Scale Multi-Kit Audio Dataset for Deep Drums Demixing
Author(s): Alessandro Ilic Mezza, Riccardo Giampiccolo, Alberto Bernardini, and Augusto Sarti
Challenge submission (whether this submission is related to an SDX challenge entry):
Workshop participation
ORGANISATION COMMITTEE (do not fill out)