apple/dmel

dMel: Discretized Log Mel-Filterbanks

This software project accompanies the research paper dMel: Speech Tokenization Made Simple by He Bai, Tatiana Likhomanenko, Ruixiang Zhang, Zijin Gu, Zakaria Aldeneh, and Navdeep Jaitly, which covers speech tokenization for speech generation and speech recognition.

This repository contains dmel, a PyTorch-based package that discretizes the log mel-filterbanks of a given audio signal. The resulting discrete speech representations are intended for training a decoder model that acts as a generative model of speech.
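To make the idea of discretization concrete, here is a minimal sketch in plain Python. It is not the dmel package's API: the function names, the choice of 16 evenly spaced bins, and the value range are all assumptions made for illustration. It maps each log mel-filterbank value to the index of the nearest bin and then maps indices back to bin values.

```python
# Illustrative sketch only: names, bin count, and range are assumptions,
# not the dmel package's actual interface or defaults.

def make_codebook(lo, hi, num_bins):
    """Return num_bins evenly spaced values from lo to hi (inclusive)."""
    step = (hi - lo) / (num_bins - 1)
    return [lo + i * step for i in range(num_bins)]


def discretize(frames, codebook):
    """Map each value to the index of its nearest codebook entry.

    frames is a list of frames; each frame is a list of per-channel
    log mel-filterbank values. Out-of-range values are clamped to the
    first or last bin.
    """
    lo = codebook[0]
    step = codebook[1] - codebook[0]
    last = len(codebook) - 1
    return [
        [min(max(round((v - lo) / step), 0), last) for v in frame]
        for frame in frames
    ]


def reconstruct(tokens, codebook):
    """Map bin indices back to their codebook values."""
    return [[codebook[t] for t in frame] for frame in tokens]


# Toy input: 2 frames x 3 channels of made-up log-mel values.
codebook = make_codebook(-7.5, 7.5, 16)
frames = [[-9.0, -7.4, 0.2], [3.1, 7.5, 20.0]]
tokens = discretize(frames, codebook)
approx = reconstruct(tokens, codebook)
```

With evenly spaced bins, any value inside the range is reconstructed to within half a bin width, so the bin count controls the trade-off between token vocabulary size and reconstruction error.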

Installation

  • from PyPI
pip install dmel
  • from source
pip install .

Example of usage

We provide a code snippet that runs feature extraction for both dMel and Mel and plots their representations. To run the example:

pip install torchaudio matplotlib dmel
python run_example.py

The example generates two images: example_mel.png and example_dmel.png.

License

This repository is released under the terms in the LICENSE file.

Citation

@article{bai2024dmel,
  title={dMel: Speech Tokenization Made Simple},
  author={Bai, He and Likhomanenko, Tatiana and Zhang, Ruixiang and Gu, Zijin and Aldeneh, Zakaria and Jaitly, Navdeep},
  journal={arXiv preprint arXiv:2407.15835},
  year={2024}
}
