This is a PyTorch code repository accompanying the following paper. If you use this code in your research, please cite:
@inproceedings{StrahlM26_MidLevelFusionF0_ICASSP,
author = {Sebastian Strahl and Meinard M{\"u}ller},
title = {Robust And Lightweight {F0} Estimation Through Mid-Level Fusion of {DSP}-Informed Features},
booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
address = {Barcelona, Spain},
year = {2026},
pages = {16087--16091},
doi = {10.1109/ICASSP55912.2026.11463185},
}

For details and references, please check out this paper.
# clone project
git clone https://github.com/groupmm/f0-mlf
cd f0-mlf
# create conda environment and install dependencies
conda env create -f environment.yaml
# activate conda environment
conda activate f0-mlf

- Copy .env.example to .env and set DATA_DIR to your data base path.
- Training is done on MIR-1K. Download the dataset from here and place it under $DATA_DIR/MIR-1K/ with the following structure:

  $DATA_DIR
  └── MIR-1K
      ├── PitchLabel
      └── Wavfile

- Convert the F0 annotations from MIDI pitch to Hz (and add a time column). This creates a new PitchLabel_csv folder alongside PitchLabel:

  python scripts/prepare_mir1k.py

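
For orientation, the conversion from MIDI pitch to Hz with an added time column can be sketched as follows. This is a minimal illustration and not the code of scripts/prepare_mir1k.py: the function names, the 20 ms hop, the 20 ms time of the first frame, and the convention that non-positive pitch values mark unvoiced frames are all assumptions made for this example and should be checked against the dataset documentation.

```python
import csv
import io


def midi_to_hz(midi_pitch, a4_hz=440.0):
    """Convert a (possibly fractional) MIDI pitch to Hz.

    Assumption: non-positive values mark unvoiced frames and map to 0.0 Hz.
    """
    if midi_pitch <= 0:
        return 0.0
    # Standard equal-tempered mapping: MIDI 69 is A4 at a4_hz.
    return a4_hz * 2.0 ** ((midi_pitch - 69.0) / 12.0)


def pitch_labels_to_rows(midi_pitches, hop_s=0.02, start_s=0.02):
    """Attach a time stamp (seconds) to each frame and convert its pitch to Hz.

    hop_s and start_s are hypothetical defaults, not values read from the dataset.
    """
    return [
        (round(start_s + i * hop_s, 6), midi_to_hz(p))
        for i, p in enumerate(midi_pitches)
    ]


def rows_to_csv(rows):
    """Serialize (time, f0) rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["time", "f0_hz"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(midi_to_hz(69))  # 440.0
    print(rows_to_csv(pitch_labels_to_rows([0, 57, 69])))
```

Keeping unvoiced frames at 0 Hz rather than dropping them preserves one row per frame, so the time column stays on a uniform grid.
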
Train model with default configuration:

python src/train.py

Train model with chosen experiment configuration from configs/experiment/:

python src/train.py experiment=experiment_name.yaml

You can override any parameter from command line like this:

python src/train.py trainer.max_epochs=20 data.batch_size=64

To evaluate a trained model (e.g. on MIR-1K test subset, 0 dB SNR, last checkpoint):

python src/eval.py model=mlf data=mir1k model.snr_range=0 ckpt_path=logs/train/runs/YYYY-MM-DD_HH-MM-SS/checkpoints/last.ckpt

Automated code style checks via pre-commit:

pre-commit install
pre-commit run --all-files

This code is published under an MIT license.

This codebase builds upon the lightning-hydra-template and mir_eval, parts of which are re-distributed here. We thank the authors and maintainers of both projects for their work.
This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Grant No. 500643750 (MU 2686/15-1). The authors are with the International Audio Laboratories Erlangen, a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institute for Integrated Circuits IIS.