gabolsgabs edited this page Aug 8, 2018 · 13 revisions

Welcome to the DALI dataset: a large Dataset of synchronised Audio, LyrIcs and vocal notes. A detailed explanation of how DALI was created is given in:

G. Meseguer-Brocal, A. Cohen-Hadria and G. Peeters. DALI: a large Dataset of synchronized Audio, LyrIcs and notes, automatically created using teacher-student machine learning paradigm. In ISMIR, Paris, France, 2018.


(C1) Corpus ID: corpus:MIR:DALI:Vocal:2018:version1.0


(A) Raw Corpus
(A1) Definition:
(A2) Type of media diffusion:


(B) Annotations
(B1) Origin:
(B21) Concepts definition:
(B22) Annotation rules:
(B31) Annotators:
(B32) Validation/reliability:
(B4) Annotation tools:


(C) Documents and Storing
(C1) Audio identifier and storage:
