High-quality data is a perennial problem in singing voice synthesis, and building a database from scratch is laborious. We hope AI can make the process much easier, so that every music lover can make their own synthesised songs.
Fortunately, there are existing tools that satisfy our needs. We combine them, plus a tool of our own, to ease the process. Thanks to all the contributors of this great research. We list their work below, so you can go and check it out:
[1] Sangeun Kum et al. “Semi-supervised learning using teacher-student models for vocal melody extraction”. In: Proc. International Society of Music Information Retrieval Conference (ISMIR). 2020.
[2] Xianming Li. XMNLP: A Lightweight Chinese Natural Language Processing Toolkit. https://github.com/SeanLee97/xmnlp. 2018.
[3] Kilian Schulze-Forster et al. “Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation”. In: IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021), pp. 2382–2395. DOI: 10.1109/TASLP.2021.3091817.
[4] Intelligence Engineering Lab @ Korea University. mdx-net-submission: Music demixer. https://github.com/kuielab/mdx-net-submission. 2021.
Notice: The project currently supports English songs only, with Chinese support in early development. The key piece is the Phoneme Level Lyrics Alignment module, which we expect to deal with this summer.
We list the input and output of the toolkit here, so you can get a general idea of whether the project suits your needs.
Input: songs and their .lrc format lyrics (see the parsing sketch after the output list).
Output:
- songs divided into slices according to lyric sentences
- phoneme and word lists, with the times at which they appear in each slice
- a MIDI file generated by the semi-supervised melody extraction network [1]
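For reference, a standard .lrc lyric file pairs a [mm:ss.xx] timestamp with each lyric sentence. The minimal sketch below is not part of the toolkit (the lyric line is made up); it only shows how such a timestamp can be parsed:

```python
import re

# One line of a typical .lrc file: a timestamp tag followed by the lyric text.
line = "[00:15.32]an example lyric sentence"

match = re.match(r"\[(\d+):(\d+(?:\.\d+)?)\](.*)", line)
if match:
    minutes, seconds, text = match.groups()
    start = int(minutes) * 60 + float(seconds)  # offset from the song start, in seconds
    print(f"{start:.2f}s: {text.strip()}")
```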
We plan to add:
- musicXML generator
- Chinese support
- more precise MIDI files
- a synthesised example built from the database
Suppose you have a song called foo. The processed database folder will look like this (only the tools you will actually use are listed; a short Python sketch that walks this layout follows the tree):
- origin
  - foo.wav
  - foo.lrc
- processed_data
  - vocal
    - slice
      - foo00150019
        - foo00150019.wav
        - foo00150019.txt
      - ...
  - pitch
    - pitch_foo00150019
    - ...
  - midi
    - foo00150019.mid
    - ...
  - align
    - foo00150019
      - phoneme_onsets
        - foo00150019.txt
      - word_onsets
        - foo00150019.txt
- utils
  - english-align
  - phoneme_from_word
    - make_phoe.py
  - melodyExtraction
    - gen_freq.py
  - vocal-extraction
  - config.py
  - song_cutter.py
  - demix_vocal.py
  - gen_midi.py
  - make_Midi.py
  - make_lab.py (not finished)
  - make_musicxml.py (not finished)
  - delete_useless.py (not tested)
  - missing.txt (generated after alignment)
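As a rough illustration of how this layout can be consumed, here is a minimal sketch. It is not part of the toolkit, and the paths are assumptions taken from the tree above; it pairs each vocal slice with its MIDI file and word-level onset file:

```python
from pathlib import Path

# Root of the processed database shown above (adjust the path as needed).
root = Path("processed_data")

# Each slice folder under vocal/slice holds the sliced .wav and its lyric .txt;
# the matching MIDI and alignment files live in sibling folders.
for wav in sorted((root / "vocal" / "slice").glob("*/*.wav")):
    name = wav.stem  # e.g. "foo00150019"
    midi = root / "midi" / f"{name}.mid"
    word_onsets = root / "align" / name / "word_onsets" / f"{name}.txt"
    print(name, "midi:", midi.exists(), "word onsets:", word_onsets.exists())
```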
Download the full project and its submodules. We use submodules to ease our development, so you must pass --recurse-submodules to clone the full project:
$ git clone --recurse-submodules git@github.com:leavelet/singing-database-maker.git
When dealing with multiple AI projects, setting up the environments properly as the first step will make your life much easier. We had a hard time with this ourselves, and found that you can run the project on your own PC if everything is set up correctly.
- Install a Python virtual environment manager. We recommend conda, and conda is used in the examples below.
- Create the environments:

  $ cd requirements

  # misc
  $ conda create -n singing-dealer
  $ conda activate singing-dealer
  $ pip install -r make_midi.txt

  # vocal extraction
  $ conda env create -f vocal-extraction/environment.yml  # if you use an ARM Mac, use environment-m1.yml
  $ conda activate vocal-extraction
  $ pip install -r vocal-extraction/requirements.txt

  # alignment & melody extraction
  $ conda env create -f vocal-extraction/maker_ai_cpu.yml  # if you have a GPU, use maker_ai_gpu.yml
  $ conda activate maker_ai
  $ pip install -r melody_extraction.txt
- Download the models:
  - Download the demucs models:

    $ conda activate vocal-extraction
    $ python download_demucs.py

  - Download the onnx models from the release page and put them under the utils/vocal-extraction/onnx folder.
- Download the make_dic tool and put it under the project root directory.
We use config.py to control the whole project. All the scripts are programmed to follow its settings (a sketch of these settings is shown after this list).

- Set the project root. The project root is the parent folder of utils. We use .. to mark the root since we are in the utils folder, but an absolute path is recommended, to avoid mistakes like forgetting to change back to utils after an operation.
- Set your thread_num. We use parallelism to accelerate processing; set thread_num to a value that makes full use of your processor. The default is 10.
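Here is a minimal sketch of what these settings can look like, assuming plain module-level variables in config.py; thread_num is named above, while project_root is an assumed name, so check the shipped config.py for the exact variables it uses:

```python
# config.py -- a minimal sketch of the settings described above.
# thread_num is named in the text; project_root is an assumed name,
# so check the shipped config.py for the exact variables it expects.
import os

# The project root is the parent folder of utils/. ".." works when the scripts
# are run from inside utils/, but an absolute path is safer.
project_root = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))

# Number of parallel workers used by the processing scripts (default 10).
thread_num = 10
```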
The whole project is under the MIT License; the projects we build on are under their own licenses.
We do not guarantee the quality of the dataset, and before using any data you must have the appropriate copyright permissions.