PyTorch implementation of "CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR".

Audio-WestlakeU/CleanMel

CleanMel

Paper Demos GitHub Issues Contact

PyTorch implementation of "CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR".

Notice 📢

  • All models are available in the pretrained/enhancement/ folder.
  • Enhanced outputs from the four offline models (offline_CleanMel_S/L_mask/map) for the CHiME example noisy_CHIME-real_F05_442C020S_STR_REAL are provided in the src/inference_example/pretrained_example_output folder.
  • To reproduce the results, make sure to use our Vocos models (see Pretrained Models).

Overview 🚀

CleanMel enhances log-Mel spectrograms to improve both speech quality and ASR performance. Its outputs are compatible with:

  • 🎙️ Vocoders for enhanced waveforms
  • 🤖 ASR systems for transcription
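
For readers unfamiliar with the feature CleanMel operates on, the following is a minimal NumPy sketch of how a log-Mel spectrogram can be computed from a waveform. It is not taken from the CleanMel code base: the function names, the sample rate, FFT size, hop size, number of Mel bands, the HTK-style Mel formula, and the natural-log compression with a floor are all illustrative assumptions, and the repository's actual front end may differ.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the Mel scale from 0 Hz to sr/2."""
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wav, sr=16000, n_fft=512, hop=128, n_mels=80, eps=1e-5):
    """Return an array of shape (n_frames, n_mels). All defaults are hypothetical."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T      # (n_frames, n_mels)
    return np.log(np.maximum(mel, eps))                    # floor avoids log(0)

# Usage: one second of a 440 Hz tone at 16 kHz.
t = np.arange(16000) / 16000.0
feats = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(feats.shape)  # (122, 80)
```

An enhanced spectrogram of this kind can then be passed to a vocoder to synthesize a waveform, or fed directly to an ASR system that accepts log-Mel input.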

Quick Start ⚡

Environment Setup

conda create -n CleanMel python=3.10.14
conda activate CleanMel
pip install -r requirements.txt

Inference

# Offline example (offline_CleanMel_S_mask)
cd shell
bash inference.sh 0, offline S mask

# Online example (online_CleanMel_S_map)
bash inference.sh 0, online S map

Custom input: set speech_folder in inference.sh to the folder containing your own recordings.

Output: results are saved to output_folder (default: ./my_output).
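
The two commands above suggest that the arguments after the GPU list select the model: "offline S mask" pairs with offline_CleanMel_S_mask and "online S map" with online_CleanMel_S_map. The sketch below makes that naming pattern explicit. The helper names model_name and inference_command are hypothetical, and the allowed values are inferred only from the model names that appear in this README; inference.sh itself may accept other values, and not every combination necessarily has a released checkpoint.

```python
# Values inferred from model names in this README (an assumption, not the script's spec).
MODES = ("offline", "online")
SIZES = ("S", "L")
OUTPUTS = ("mask", "map")

def model_name(mode: str, size: str, output: str) -> str:
    """Build a model name such as 'offline_CleanMel_S_mask' from the three arguments."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}, got {size!r}")
    if output not in OUTPUTS:
        raise ValueError(f"output must be one of {OUTPUTS}, got {output!r}")
    return f"{mode}_CleanMel_{size}_{output}"

def inference_command(gpus: str, mode: str, size: str, output: str) -> list:
    """Return the argument list for the documented call; run it from the shell folder."""
    model_name(mode, size, output)  # validate the arguments before building the command
    return ["bash", "inference.sh", gpus, mode, size, output]

print(model_name("offline", "S", "mask"))           # offline_CleanMel_S_mask
print(" ".join(inference_command("0,", "online", "S", "map")))
```

The returned list can be handed to subprocess.run with cwd set to the shell folder if you want to drive several models from one script.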

Training

# Offline training example (offline_CleanMel_S_mask)
cd shell
bash train.sh 0,1,2,3 offline S mask

Configure the datasets in ./config/dataset/train.yaml.

By default, training uses 4 GPUs with a batch size of 32.

Pretrained Models 🧠

pretrained/
├── enhancement/
│   ├── offline_CleanMel_S_map.ckpt
│   ├── offline_CleanMel_S_mask.ckpt
│   ├── online_CleanMel_S_map.ckpt
│   └── ...
└── vocos/
    ├── vocos_offline.pt
    └── vocos_online.pt

Enhancement: offline_CleanMel_S/L_mask/map.ckpt are available.

Vocos: vocos_offline.pt and vocos_online.pt go under pretrained/vocos/, as shown in the tree above.
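
Before running inference, it can help to confirm that the files named in the tree above are in place. The following is a small sketch using only the standard library; the function name missing_pretrained is hypothetical, and the list covers only the files explicitly named in this README (the entries elided by "..." are not guessed).

```python
from pathlib import Path

# Only the files explicitly listed in the README tree; elided entries are left out.
EXPECTED = [
    "enhancement/offline_CleanMel_S_map.ckpt",
    "enhancement/offline_CleanMel_S_mask.ckpt",
    "enhancement/online_CleanMel_S_map.ckpt",
    "vocos/vocos_offline.pt",
    "vocos/vocos_online.pt",
]

def missing_pretrained(root="pretrained"):
    """Return the expected files (relative to root) that are not present on disk."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]

if __name__ == "__main__":
    missing = missing_pretrained()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  pretrained/" + rel)
    else:
        print("All listed pretrained files found.")
```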

Performance 📊

Speech Enhancement

[Figures: speech enhancement results]

ASR Accuracy

[Figure: ASR accuracy results]

💡 ASR implementation details are in the asr_infer branch.

Citation 📝

@misc{shao2025cleanmel,
    title={CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR}, 
    author={Nian Shao and Rui Zhou and Pengyu Wang and Xian Li and Ying Fang and Yujie Yang and Xiaofei Li},
    year={2025},
    eprint={2502.20040},
    archivePrefix={arXiv},
    primaryClass={eess.AS},
    url={https://arxiv.org/abs/2502.20040}
}

Acknowledgement 🙏

  • Built using NBSS template
  • Vocoder implementation from Vocos
