Code for the analysis of SHS-YT, a dataset of videos crawled from YouTube based on seed songs in SHS100K-Test.
We recommend using our conda environment. Install and activate by:
conda env create -f env.yml;
conda activate shs-yty
- `data` contains our annotated dataset `SHS-YT` and the benchmark sets (`SHS-YT` combined with the corresponding songs for cliques from `SHS100K-Test`). Other datasets are Da-Tacos and SHS100K.
  - the subdir `data/annotations` contains expert and worker comments
  - the subdir `data/preds` contains the square similarity matrix per model
- `features` contains a sample of the audio features
- `figs` contains the GUI of the MTurk experiment
- `documentation` contains descriptions for our classes
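
As an illustration of how a square similarity matrix can be scored, the following minimal sketch computes mean average precision (MAP), a metric commonly used for cover song identification. It is not taken from this repository: the function name, the list-of-lists input and the `labels` argument (one clique id per row) are assumptions, and the on-disk format of the files in `data/preds` is not described here.

```python
def mean_average_precision(sim, labels):
    """Mean average precision over all queries.

    sim    -- square list of lists; sim[i][j] is the similarity of item j to
              query i (higher means more similar). Hypothetical input format.
    labels -- clique id per item; items sharing an id are versions of one work.
    """
    n = len(labels)
    ap_values = []
    for q in range(n):
        # Rank every other item by descending similarity to the query.
        candidates = [j for j in range(n) if j != q]
        candidates.sort(key=lambda j: sim[q][j], reverse=True)

        hits = 0
        precision_sum = 0.0
        for rank, j in enumerate(candidates, start=1):
            if labels[j] == labels[q]:
                hits += 1
                precision_sum += hits / rank  # precision at this relevant rank

        # Queries without any other version in the set are skipped.
        if hits:
            ap_values.append(precision_sum / hits)

    return sum(ap_values) / len(ap_values) if ap_values else 0.0


# Toy example with three items; items 0 and 2 belong to the same clique.
toy_sim = [
    [1.0, 0.9, 0.5],
    [0.9, 1.0, 0.1],
    [0.5, 0.1, 1.0],
]
toy_labels = ["a", "b", "a"]
toy_map = mean_average_precision(toy_sim, toy_labels)
```

Skipping queries that have no other version in the set is one possible convention; an evaluation that counts them as zero would give lower scores.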
To download and extract relevant features for the CSI task, you can use this repository: https://github.com/progsi/YTFeatureExtractor
For example, to download the large benchmark dataset SHS-SEED+YT saved to BENCHMARK_CSV_PATH, run
python extract_list.py --listfile BENCHMARK_CSV_PATH -i YOUR_DATA_DIR
This directory contains different notebooks for analysis of data.
- `benchmark.ipynb`: benchmarking the datasets (Table 5 in the paper)
- `statistics.ipynb`: basic stats, KDEs, etc.
- `curation_analysis.ipynb`: more in-depth analysis of ambiguity annotations
- `pairs_analysis.ipynb`: contains analyses from Table 6, Table 7 and Figure 5 from the paper and some additional analyses
| Uncertainty | Applies for | Description |
|---|---|---|
| Song: Difficult Cover | Version | Strong changes in melody, harmony, timbre and rhythm, which are expected in cover song identification. During annotation, stronger changes in these characteristics make classification difficult for a human annotator, especially if the annotator does not know the song. |
| Song: Drum-Only | Version & Non-Version | Only the drum track. Typically either isolated by automatic sound source separation, covered by a drummer or programmed in a drum engine. |
| Song: Instrumental | Version & Non-Version | A version without the vocal track. Typically a karaoke version or a backing track. Might be generated by automatic sound source separation. |
| Song: Mashup/Remix | Version & Non-Version | A song which contains samples from the query song. The samples might be whole sections (typically the chorus) or just very short melodic lines. |
| Song: Medley | Version & Non-Version | A song which contains (typically sections of) multiple songs. One of the songs is (a section of) the query song. |
| Song: Same Artist | Non-Version | A different song but it is from the same artist. |
| Song: Same Genre | Non-Version | A different song but it is from the same genre. |
| Song: Similar | Non-Version | A different song but it is musically similar in terms of melody, harmony, timbre, rhythm etc. |
| Song: Single Instrument | Version & Non-Version | A song which includes only the stem of a single harmonic instrument. The instruments that apparently occur most often are the piano and the guitar. Typically, either someone covers the query song by playing it themselves or the stem performance is programmed (e.g. as a piano roll representation). |
| Song: Slowed/Spedup | Version & Non-Version | The query song but sped-up or slowed down. |
| Song: Vocal-Only | Version & Non-Version | Only the vocal stem of the query song. Either isolated automatically by sound source separation or an a cappella cover. |
| Video: In-Background | Version & Non-Version | The query song appears in the background with foreground noise such as crowd noise, speech or mixed noise (e.g. in a movie or show scene). |
| Video: Low Fidelity | Version & Non-Version | The query song is presented with low fidelity. |
| Video: Multiple Songs | Version & Non-Version | Multiple songs besides the query song are contained in the video. Typical examples are concert performances or tributes. |
| Video: Similar Metadata | Non-Version | A rather obvious non-cover-song of the query with rather similar metadata (especially song title and artist name), which might confuse the annotator. |
| Video: With Non-Music | Version & Non-Version | The query song is contained in the video but is interrupted by, preceded by and/or followed by non-music noise. |
| Placeholder: No Music | No Music | Placeholder class for videos which do not contain any music. |
| Placeholder: Non-Ambiguous | Version & Non-Version | Placeholder class for songs which were not perceived ambiguous. |
| Placeholder: Unavailable | All | Placeholder class for unavailable videos on YouTube at the time of curation. |
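
The "Applies for" column can be encoded as a lookup, for example to sanity-check annotation rows. The sketch below only restates the table above; the names `APPLIES_FOR` and `is_consistent` are hypothetical, and the relation strings `"Version"`, `"Non-Version"` and `"No Music"` (with "All" read as all three) are assumptions about how labels might be stored, not the format used in `data/annotations`.

```python
# Relation labels assumed for illustration only.
ALL_RELATIONS = frozenset({"Version", "Non-Version", "No Music"})
BOTH = frozenset({"Version", "Non-Version"})

# Uncertainty class -> relations it applies to, copied from the table above.
APPLIES_FOR = {
    "Song: Difficult Cover": frozenset({"Version"}),
    "Song: Drum-Only": BOTH,
    "Song: Instrumental": BOTH,
    "Song: Mashup/Remix": BOTH,
    "Song: Medley": BOTH,
    "Song: Same Artist": frozenset({"Non-Version"}),
    "Song: Same Genre": frozenset({"Non-Version"}),
    "Song: Similar": frozenset({"Non-Version"}),
    "Song: Single Instrument": BOTH,
    "Song: Slowed/Spedup": BOTH,
    "Song: Vocal-Only": BOTH,
    "Video: In-Background": BOTH,
    "Video: Low Fidelity": BOTH,
    "Video: Multiple Songs": BOTH,
    "Video: Similar Metadata": frozenset({"Non-Version"}),
    "Video: With Non-Music": BOTH,
    "Placeholder: No Music": frozenset({"No Music"}),
    "Placeholder: Non-Ambiguous": BOTH,
    "Placeholder: Unavailable": ALL_RELATIONS,
}


def is_consistent(uncertainty, relation):
    """Return True if the uncertainty class may be combined with the relation."""
    if uncertainty not in APPLIES_FOR:
        raise KeyError(f"unknown uncertainty class: {uncertainty!r}")
    return relation in APPLIES_FOR[uncertainty]
```

For instance, `is_consistent("Song: Same Artist", "Version")` is false under this encoding, because that class is listed for non-versions only.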