This is the repository for the ISMIR paper "dPLP: A Differentiable Version of Predominant Local Pulse Estimation". Below, we provide the scripts for training the models, running inference, peak picking, and evaluation.
- Datasets: as mentioned in the paper, this repo also includes 100 pop tracks from GTZAN[1] for the experiments, organized as follows:
- dataset/gtzan:
- downbeats: the beat/downbeat annotations
- audio: place the GTZAN audio files in this folder, named as listed in ./dataset/gtzan/audio/audio_files.txt
- train-info: the train/test/valid splits
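Before training, it can help to confirm that the audio folder matches the list file. The following is a minimal sketch and not part of the repository: `missing_audio_files` is a hypothetical helper, and it assumes that audio_files.txt contains one file name per line (the actual format of that file is not documented here).

```python
from pathlib import Path


def missing_audio_files(audio_dir):
    """Return the names listed in audio_files.txt that are absent from audio_dir.

    Assumption: audio_files.txt holds one file name per line; blank lines are ignored.
    """
    audio_dir = Path(audio_dir)
    list_file = audio_dir / "audio_files.txt"
    expected = [
        line.strip()
        for line in list_file.read_text().splitlines()
        if line.strip()
    ]
    return [name for name in expected if not (audio_dir / name).is_file()]


# Only run the check when the list file is actually present.
audio_dir = Path("dataset/gtzan/audio")
if (audio_dir / "audio_files.txt").is_file():
    missing = missing_audio_files(audio_dir)
    print(f"{len(missing)} audio file(s) missing")
    for name in missing:
        print("  ", name)
```

Run it from the project root so that the relative path resolves correctly.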
- Scripts: all .py scripts can be run directly from the project folder. The items below describe their order and purpose.
- Run the following scripts in order to train your own SFX, M1, M2, and M3 models. These scripts create a folder at ./experiments/ and save the trained models there. One version of the pretrained models is also provided at ./pretrained.
- train-sfx.py: train the spectral flux model using the train/test/valid split in dataset/gtzan/train-info.
- train-M1.py: train the M1 model using the same train/test/valid split mentioned above.
- train-M2.py: train the M2 model using the same train/test/valid split mentioned above. Note that M2 requires a pretrained SFX model (its directory should be specified at line 328).
- train-M3.py: train the M3 model using the same train/test/valid split mentioned above.
- The following scripts create a folder at dataset/gtzan/novelty and save the corresponding generated novelty functions there.
- inference-dSFX.py: derive beat novelty functions using the initial/trained dSFX models.
- inference-dPLP.py: derive beat novelty functions using the trained M1, M2, and M3 models.
- genPLP4comparison.py: This script reads the novelty functions generated by the trained SFX model (using inference-dSFX.py) and generates argmax PLP and softmax PLP curves. The generated PLP curves are saved in a newly created folder at ./datasets/gtzan/plp-curves.
- The following scripts read novelty functions and apply peak picking to derive beat estimates.
- peak-picking-plpcompare.py: This script reads all PLP curves saved at ./datasets/gtzan/plp-curves, applies peak picking, and saves the beat estimates at ./datasets/gtzan/beat_estimations_plpcompare.
- peak-picking.py: This script reads all novelty functions saved in ./dataset/gtzan/novelty and applies peak picking to derive corresponding beat estimates. Estimates will be saved to ./dataset/gtzan/beat_estimations.
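For readers unfamiliar with the peak-picking step, the sketch below shows one simple way to turn a novelty or PLP curve into beat times: keep local maxima above a threshold and enforce a minimum spacing between them. This is an illustration only and is not taken from peak-picking.py; the function name and the parameters `frame_rate`, `threshold`, and `min_distance_sec` are hypothetical.

```python
def pick_peaks(novelty, frame_rate, threshold=0.5, min_distance_sec=0.25):
    """Return beat times in seconds from a frame-wise novelty curve.

    A frame counts as a peak if it is a local maximum, reaches `threshold`,
    and lies at least `min_distance_sec` after the previously accepted peak.
    The scan is greedy from left to right: within a crowded region, the
    first qualifying peak is kept, not necessarily the highest one.
    """
    min_gap = max(1, int(round(min_distance_sec * frame_rate)))
    beats = []
    last = -min_gap  # allows a peak at the very start of the curve
    for i in range(1, len(novelty) - 1):
        is_local_max = novelty[i] > novelty[i - 1] and novelty[i] >= novelty[i + 1]
        if is_local_max and novelty[i] >= threshold and i - last >= min_gap:
            beats.append(i / frame_rate)
            last = i
    return beats


# Toy curve sampled at an assumed 10 frames per second.
curve = [0.0, 1.0, 0.0, 0.2, 0.0, 0.9, 0.0, 0.0]
print(pick_peaks(curve, frame_rate=10, threshold=0.5, min_distance_sec=0.2))
```

In the toy curve, the small bump at frame 3 falls below the threshold and is skipped, while the peaks at frames 1 and 5 are kept.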
- evaluate.py: This script reads the beat estimates and saves all results at ./evaluation. You may modify the directories to evaluate the beat estimates of argmax/softmax PLP.
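The overall run order described in this list can be summarized as a small driver. This is a hedged sketch rather than an official runner: the order is inferred from the descriptions in this README, the scripts are assumed to take no command-line arguments, and `build_commands` and `run_all` are hypothetical helpers.

```python
import subprocess
import sys

# Script order as listed in this README: training, inference, PLP curve
# generation, peak picking, evaluation.
PIPELINE = [
    "train-sfx.py",
    "train-M1.py",
    "train-M2.py",  # needs the pretrained SFX directory set inside the script
    "train-M3.py",
    "inference-dSFX.py",
    "inference-dPLP.py",
    "genPLP4comparison.py",
    "peak-picking-plpcompare.py",
    "peak-picking.py",
    "evaluate.py",
]


def build_commands(scripts, python=sys.executable):
    """Return one argument list per script, e.g. [python, "train-sfx.py"]."""
    return [[python, script] for script in scripts]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Dry run: print the commands without executing anything.
for cmd in build_commands(PIPELINE):
    print(" ".join(cmd))
```

Calling `run_all(build_commands(PIPELINE))` from the project folder would execute the full pipeline. The training steps can be dropped from `PIPELINE` if the models in ./pretrained are used instead, provided the inference scripts are pointed at them.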
This project is licensed under the MIT License - see the LICENSE file for details.
[1] G. Tzanetakis and P. Cook, “Musical genre classification of audio signals,” IEEE Trans. Speech and Audio Processing, vol. 10, no. 5, pp. 293–302, 2002.