CleanMel

PyTorch implementation of "CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR".

Notice 📢

All enhancement models are available in the pretrained/enhancement/ folder.
The enhanced outputs of the four offline_CleanMel_S/L_mask/map models for the CHiME example noisy_CHIME-real_F05_442C020S_STR_REAL are provided in the src/inference_example/pretrained_example_output folder.
To reproduce these results, make sure to use our Vocos models in pretrained/vocos/.
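Each enhanced Mel spectrogram has to be converted back to a waveform with a matching vocoder, so a small helper can pick the Vocos checkpoint from the model name. The sketch below assumes, without confirmation from this README, that offline_* models pair with vocos_offline.pt and online_* models with vocos_online.pt, and that it is run from the repository root. The function name vocos_for is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an enhancement model name to the Vocos checkpoint
# ASSUMED to match it (offline -> vocos_offline.pt, online -> vocos_online.pt).
vocos_for() {
  local model="$1"
  case "${model}" in
    offline_*) echo "pretrained/vocos/vocos_offline.pt" ;;
    online_*)  echo "pretrained/vocos/vocos_online.pt" ;;
    *)
      echo "unknown model name: ${model}" >&2
      return 1
      ;;
  esac
}

vocos_for offline_CleanMel_S_mask   # prints pretrained/vocos/vocos_offline.pt
```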

Overview 🚀

CleanMel enhances log-Mel spectrograms to improve both speech quality and ASR performance. The enhanced outputs are compatible with:

🎙️ Vocoders for enhanced waveforms
🤖 ASR systems for transcription

Quick Start ⚡

Environment Setup

conda create -n CleanMel python=3.10.14
conda activate CleanMel
pip install -r requirements.txt

Inference

# Offline example (offline_CleanMel_S_mask)
cd shell
bash inference.sh 0, offline S mask

# Online example (online_CleanMel_S_map)
bash inference.sh 0, online S map
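To regenerate outputs for all four offline variants mentioned in the notice, the inference command can be looped. This sketch assumes, by extrapolation from the two examples above, that inference.sh takes the arguments `<gpu list> <offline|online> <S|L> <mask|map>` and also accepts L as the size. DRY_RUN and run_all_offline are hypothetical names; with the default DRY_RUN=1 the loop only prints the commands.

```bash
#!/usr/bin/env bash
# Hypothetical loop over the four offline CleanMel variants (run from shell/).
# ASSUMES inference.sh takes: <gpu list> <offline|online> <S|L> <mask|map>.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
run_all_offline() {
  local gpus="${1:-0,}"
  local size kind
  for size in S L; do
    for kind in mask map; do
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "bash inference.sh ${gpus} offline ${size} ${kind}"
      else
        bash inference.sh "${gpus}" offline "${size}" "${kind}"
      fi
    done
  done
}

run_all_offline 0,
```

Setting DRY_RUN=0 before the call would execute the four runs one after another on GPU 0.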

Custom input: set speech_folder in inference.sh to the folder containing your own audio.

Output: results are saved to output_folder (default: ./my_output).
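Instead of editing inference.sh by hand, both variables can be rewritten with sed. This sketch assumes that inference.sh assigns them at the start of their own lines, in the form speech_folder=... and output_folder=...; set_script_var is a hypothetical helper and the paths are placeholders. A backup copy (inference.sh.bak) is kept.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace a `name=value` assignment in a shell script.
# ASSUMES the variable is assigned at the start of a line, e.g. speech_folder=...
# and that the new value contains no '|' or '&' (special in this sed command).
set_script_var() {
  local script="$1" name="$2" value="$3"
  # '|' is used as the sed delimiter so paths containing '/' need no escaping;
  # -i.bak edits in place and keeps a backup (works with GNU and BSD sed).
  sed -i.bak "s|^${name}=.*|${name}=\"${value}\"|" "${script}"
}

# Example (placeholder paths):
# set_script_var inference.sh speech_folder "/data/my_noisy_wavs"
# set_script_var inference.sh output_folder "./my_output"
```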

Training

# Offline training example (offline_CleanMel_S_mask)
cd shell
bash train.sh 0,1,2,3 offline S mask

Configure training datasets in ./configs/dataset/train.yaml.

By default, training uses 4 GPUs with a batch size of 32.
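The first argument of train.sh is a comma-separated GPU list, so 0,1,2,3 corresponds to the default 4 GPUs. The helpers below count the devices in such a list and compute a per-GPU share of the batch. That second step rests on an assumption not confirmed here: that 32 is the global batch size split evenly across GPUs (if it is per GPU instead, the effective batch would be 4 × 32 = 128). Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Count the GPUs in a comma-separated device list such as "0,1,2,3" or "0,".
# Empty fields (e.g. from a leading comma) are ignored.
count_gpus() {
  local list="$1" n=0 id
  local IFS=','
  for id in ${list}; do
    [ -n "${id}" ] && n=$((n + 1))
  done
  echo "${n}"
}

# Per-GPU batch size, ASSUMING the configured batch size is global and
# divides evenly across devices (e.g. 32 / 4 = 8).
per_gpu_batch() {
  local global_bs="$1" gpus="$2" n
  n="$(count_gpus "${gpus}")"
  if [ "${n}" -eq 0 ]; then
    echo "no GPUs in list: ${gpus}" >&2
    return 1
  fi
  echo $((global_bs / n))
}

count_gpus 0,1,2,3        # prints 4
per_gpu_batch 32 0,1,2,3  # prints 8
```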

Pretrained Models 🧠

pretrained/
├── enhancement/
│   ├── offline_CleanMel_S_map.ckpt
│   ├── offline_CleanMel_S_mask.ckpt
│   ├── online_CleanMel_S_map.ckpt
│   └── ...
└── vocos/
    ├── vocos_offline.pt
    └── vocos_online.pt

Enhancement: the offline_CleanMel_S/L_mask/map.ckpt checkpoints are available in pretrained/enhancement/.

Vocos: vocos_offline.pt and vocos_online.pt are in pretrained/vocos/.
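Before running inference, a quick existence check can confirm that the checkpoints are in place. The sketch expands the S/L and mask/map shorthand with bash brace expansion and also checks the two Vocos files. It assumes it is run from the repository root (or given the pretrained/ path as an argument); check_pretrained is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the pretrained files listed above.
# offline_CleanMel_{S,L}_{mask,map}.ckpt expands to the four offline checkpoints.
check_pretrained() {
  local root="${1:-pretrained}" missing=0 f
  for f in "${root}"/enhancement/offline_CleanMel_{S,L}_{mask,map}.ckpt \
           "${root}"/vocos/vocos_{offline,online}.pt; do
    if [ ! -f "${f}" ]; then
      echo "missing: ${f}" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed only if every expected file exists.
  [ "${missing}" -eq 0 ]
}
```

Usage: `check_pretrained && echo "all pretrained files found"`.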

Performance 📊

Speech Enhancement

ASR Accuracy

💡 ASR implementation details are in the asr_infer branch.

Citation 📝

@misc{shao2025cleanmel,
    title={CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR}, 
    author={Nian Shao and Rui Zhou and Pengyu Wang and Xian Li and Ying Fang and Yujie Yang and Xiaofei Li},
    year={2025},
    eprint={2502.20040},
    archivePrefix={arXiv},
    primaryClass={eess.AS},
    url={https://arxiv.org/abs/2502.20040}
}

Acknowledgement 🙏

Built using the NBSS template.
Vocoder implementation from Vocos.

Audio-WestlakeU/CleanMel