This software project accompanies the research paper dMel: Speech Tokenization Made Simple by He Bai, Tatiana Likhomanenko, Ruixiang Zhang, Zijin Gu, Zakaria Aldeneh, and Navdeep Jaitly, which covers speech tokenization for speech generation and speech recognition.
The repository contains dmel, a PyTorch-based package that discretizes the log mel-filterbanks of a given audio signal. The resulting speech representations are used to train a decoder model that serves as a generative model of speech.
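To make the idea of discretizing log mel-filterbanks concrete, here is a minimal sketch of uniform binning applied to a synthetic log-mel tensor. It does not use or reproduce the dmel package API: the function names `discretize_log_mel` and `reconstruct_log_mel`, the choice of 16 bins, and taking the bin range from the tensor's own minimum and maximum are all assumptions made for illustration.

```python
import torch


def discretize_log_mel(log_mel, lo, hi, num_bins=16):
    """Map each log-mel value to an integer bin index in [0, num_bins - 1].

    Hypothetical helper (not part of dmel): splits [lo, hi] into
    num_bins equal-width bins and returns the bin index of every value.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = (log_mel - lo) / (hi - lo)           # rescale to roughly [0, 1]
    codes = torch.floor(scaled * num_bins).long()
    return codes.clamp(0, num_bins - 1)           # values at or above hi go to the last bin


def reconstruct_log_mel(codes, lo, hi, num_bins=16):
    """Map bin indices back to the center value of each bin."""
    width = (hi - lo) / num_bins
    return lo + (codes.float() + 0.5) * width


# Synthetic stand-in for a log mel-filterbank: (frames, mel channels).
torch.manual_seed(0)
log_mel = torch.randn(100, 80) * 3.0 - 4.0

# Assumption: the bin range is taken from this tensor's own min and max.
lo = log_mel.min().item()
hi = log_mel.max().item()
num_bins = 16

codes = discretize_log_mel(log_mel, lo, hi, num_bins)
recon = reconstruct_log_mel(codes, lo, hi, num_bins)
max_error = (recon - log_mel).abs().max().item()
bin_width = (hi - lo) / num_bins
```

With equal-width bins and reconstruction at bin centers, the per-value error is bounded by half a bin width, so a larger `num_bins` trades a bigger vocabulary per channel for a finer reconstruction.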
The package can be installed in two ways:
- from PyPI

    pip install dmel

- from source

    pip install .

We provide a code snippet that runs feature extraction for both dMel and Mel and plots their representations. To run the example:

    pip install torchaudio matplotlib dmel
    python run_example.py

The example will generate example_mel.png and example_dmel.png.
This repository is released under the terms of the LICENSE file.
@article{bai2024dmel,
  title={dMel: Speech Tokenization Made Simple},
  author={Bai, He and Likhomanenko, Tatiana and Zhang, Ruixiang and Gu, Zijin and Aldeneh, Zakaria and Jaitly, Navdeep},
  journal={arXiv preprint arXiv:2407.15835},
  year={2024}
}