PyTorch implementation of "CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR".
- All models are available in the pretrained/enhancement/ folder.
- The enhanced results from the 4 offline_CleanMel_S/L_mask/map models for the CHIME example noisy_CHIME-real_F05_442C020S_STR_REAL are given in the src/inference_example/pretrained_example_output folder.
- To reproduce the results, make sure to use our vocos models here!
CleanMel enhances logMel spectrograms to improve speech quality and ASR performance. Its outputs are compatible with:
- 🎙️ Vocoders for enhanced waveforms
- 🤖 ASR systems for transcription
conda create -n CleanMel python=3.10.14
conda activate CleanMel
pip install -r requirements.txt
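After installation, a quick sanity check of the environment can save a failed run later. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the import name torch is an assumption based on this being a PyTorch implementation (check requirements.txt for the actual package list).

```python
import importlib.util
import sys

# Major.minor version pinned by `conda create -n CleanMel python=3.10.14`.
EXPECTED_PYTHON = (3, 10)


def python_matches(actual=None, expected=EXPECTED_PYTHON):
    """Return True if the major.minor Python version matches the pinned one."""
    actual = sys.version_info[:2] if actual is None else actual[:2]
    return tuple(actual) == tuple(expected)


def missing_packages(names):
    """Return the top-level packages from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python 3.10 active:", python_matches())
    # "torch" is an assumed import name; adjust the list to requirements.txt.
    print("Missing packages:", missing_packages(["torch"]))
```

Run it inside the activated CleanMel environment; an empty "Missing packages" list means the listed imports can be resolved.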
# Offline example (offline_CleanMel_S_mask)
cd shell
bash inference.sh 0, offline S mask
# Online example (online_CleanMel_S_map)
bash inference.sh 0, online S map
Custom Input: Modify speech_folder in inference.sh
Output: Results are saved to output_folder (defaults to ./my_output)
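The inference commands above follow a fixed pattern: a GPU list, then the mode, model size, and output type, which together match a model name such as offline_CleanMel_S_mask. The sketch below builds that command line from its parts. The function names are hypothetical, the trailing comma for a single GPU simply mirrors the examples above, and the accepted values are only those that appear in this README; not every combination is guaranteed to have a pretrained checkpoint.

```python
import shlex

# Values that appear in this README.
MODES = ("offline", "online")
SIZES = ("S", "L")
OUTPUTS = ("mask", "map")


def model_name(mode, size, output):
    """Compose a model name such as 'offline_CleanMel_S_mask'."""
    if mode not in MODES or size not in SIZES or output not in OUTPUTS:
        raise ValueError(f"unsupported combination: {mode} {size} {output}")
    return f"{mode}_CleanMel_{size}_{output}"


def gpu_arg(gpu_ids):
    """Format GPU ids like the README examples: '0,' or '0,1,2,3'."""
    ids = [str(int(g)) for g in gpu_ids]
    if not ids:
        raise ValueError("at least one GPU id is required")
    return ids[0] + "," if len(ids) == 1 else ",".join(ids)


def inference_command(gpu_ids, mode, size, output):
    """Return the argv list for inference.sh (to be run from shell/)."""
    model_name(mode, size, output)  # validates the arguments
    return ["bash", "inference.sh", gpu_arg(gpu_ids), mode, size, output]


if __name__ == "__main__":
    cmd = inference_command([0], "online", "S", "map")
    print(model_name("online", "S", "map"))
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, cwd="shell") if you prefer to launch inference from Python rather than from the terminal.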
# Offline training example (offline_CleanMel_S_mask)
cd shell
bash train.sh 0,1,2,3 offline S mask
Configure datasets in ./config/dataset/train.yaml
By default, training runs on 4 GPUs with a batch size of 32.
pretrained/
├── enhancement/
│   ├── offline_CleanMel_S_map.ckpt
│   ├── offline_CleanMel_S_mask.ckpt
│   ├── online_CleanMel_S_map.ckpt
│   └── ...
└── vocos/
    ├── vocos_offline.pt
    └── vocos_online.pt
Enhancement: offline_CleanMel_S/L_mask/map.ckpt are available.
Vocos: vocos_offline.pt and vocos_online.pt are here.
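Before running inference, it can help to confirm that the checkpoints are where the layout above expects them. The sketch below expands the shorthand offline_CleanMel_S/L_mask/map into its four file names and reports which files are absent. It is illustrative only: the helper names are hypothetical, and the list covers only the files named in this README, so entries hidden behind the "..." in the tree are not checked.

```python
from itertools import product
from pathlib import Path


def expand_offline_names():
    """Expand 'offline_CleanMel_S/L_mask/map' into four checkpoint names."""
    return [
        f"offline_CleanMel_{size}_{output}.ckpt"
        for size, output in product(("S", "L"), ("mask", "map"))
    ]


def expected_files():
    """Relative paths (under pretrained/) that this README names explicitly."""
    enhancement = [f"enhancement/{name}" for name in expand_offline_names()]
    enhancement.append("enhancement/online_CleanMel_S_map.ckpt")
    vocos = ["vocos/vocos_offline.pt", "vocos/vocos_online.pt"]
    return enhancement + vocos


def missing_files(root, relative_paths):
    """Return the relative paths that do not exist as files under `root`."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    for path in missing_files("pretrained", expected_files()):
        print("missing:", path)
```

Run it from the repository root; no output means every listed file was found.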
💡 ASR implementation details are in the asr_infer branch.
@misc{shao2025cleanmel,
title={CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR},
author={Nian Shao and Rui Zhou and Pengyu Wang and Xian Li and Ying Fang and Yujie Yang and Xiaofei Li},
year={2025},
eprint={2502.20040},
archivePrefix={arXiv},
primaryClass={eess.AS},
url={https://arxiv.org/abs/2502.20040}
}