CLI and library to compute the mel-cepstral distance of two WAV files based on the paper "Mel-Cepstral Distance Measure for Objective Speech Quality Assessment" by Robert F. Kubichek.
pip install mel-cepstral-distance --user
mcd-cli
# Download two example audio files
wget https://github.com/jasminsternkopf/mel_cepstral_distance/raw/main/examples/similar_audios/original.wav
wget https://github.com/jasminsternkopf/mel_cepstral_distance/raw/main/examples/similar_audios/inferred.wav
# Calculate metrics
mcd-cli from-wav original.wav inferred.wav
Output:
Mel-Cepstral Distance: 19.013673608495836
Penalty: 0.11946050096339111
# Frames: 519
This prints the mel-cepstral distance and the penalty between the two audio files given as arguments, along with the number of frames used in the computation.
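When the CLI is called from a script, the three printed values can be read back with a small parser. The following is a minimal sketch that assumes the output keeps exactly the three labelled lines shown above; the helper name `parse_mcd_output` is hypothetical and not part of the package.

```python
import re
from typing import Tuple


def parse_mcd_output(text: str) -> Tuple[float, float, int]:
    """Extract (distance, penalty, frames) from the text printed by mcd-cli.

    Assumes the three labels appear exactly as in the sample output.
    """

    def field(label: str) -> str:
        # Match "<label>: <value>" on a line of its own.
        pattern = rf"^{re.escape(label)}:\s*(\S+)\s*$"
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match is None:
            raise ValueError(f"field not found: {label}")
        return match.group(1)

    distance = float(field("Mel-Cepstral Distance"))
    penalty = float(field("Penalty"))
    frames = int(field("# Frames"))
    return distance, penalty, frames
```

The text to parse could come, for example, from `subprocess.run([...], capture_output=True, text=True).stdout`.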
from mel_cepstral_distance import get_metrics_wavs, get_metrics_mels, get_metrics_mels_pairwise
get_metrics_wavs
get_metrics_mels
Both methods return the mel-cepstral distance, the penalty and the final frame number. Examples and information on the parameters can be found in the corresponding documentation.
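A call from Python might look like the sketch below. It assumes that `get_metrics_wavs` accepts the two WAV paths as its first two positional arguments and that all other parameters can be left at their defaults; check the function's documentation for the exact signature. The wrapper `compare_wavs` is a hypothetical helper, not part of the package.

```python
from pathlib import Path
from typing import Tuple


def compare_wavs(reference: Path, candidate: Path) -> Tuple[float, float, int]:
    """Return (mel-cepstral distance, penalty, final frame number)."""
    # Fail early with a clear error if a file is missing.
    for path in (reference, candidate):
        if not path.is_file():
            raise FileNotFoundError(path)

    # Imported here so this module still loads where the package is absent.
    from mel_cepstral_distance import get_metrics_wavs

    # Assumption: two WAV paths passed positionally, defaults otherwise.
    mcd, penalty, frames = get_metrics_wavs(reference, candidate)
    return mcd, penalty, frames
```

With the example files from the CLI section, this would be called as `compare_wavs(Path("original.wav"), Path("inferred.wav"))`.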
# update
sudo apt update
# install Python 3.8-3.11 so that the tests can be run
sudo apt install python3-pip \
python3.8 python3.8-dev python3.8-distutils python3.8-venv \
python3.9 python3.9-dev python3.9-distutils python3.9-venv \
python3.10 python3.10-dev python3.10-distutils python3.10-venv \
python3.11 python3.11-dev python3.11-distutils python3.11-venv
# install pipenv for creation of virtual environments
python3.8 -m pip install pipenv --user
# check out repo
git clone https://github.com/jasminsternkopf/mel_cepstral_distance.git
cd mel_cepstral_distance
# create virtual environment
python3.8 -m pipenv install --dev
# first, install the tool (see "Development setup")
# then, navigate into the directory of the repo
cd mel_cepstral_distance
# activate environment
python3.8 -m pipenv shell
# run tests
tox
MIT License
- Kubichek, R. “Mel-Cepstral Distance Measure for Objective Speech Quality Assessment.” In Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing, 1:125–28. Victoria, BC, Canada: IEEE, 1993. https://doi.org/10.1109/PACRIM.1993.407206.
- Muda, Lindasalwa, Mumtaj Begam, and I. Elamvazuthi. “Voice Recognition Algorithms Using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques.” Journal of Computing vol. 2, no. 3 (March 2010): 6.
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410
If you want to cite this repo, you can use the BibTeX entry generated by GitHub (see About => Cite this repository).
Sternkopf, J., & Taubert, S. (2024). mel-cepstral-distance (Version 0.0.3) [Computer software]. https://doi.org/10.5281/zenodo.10567255
We based some of the parameters on the two references mentioned above and chose the remaining ones ourselves, guided by the parameter descriptions of the underlying libraries:
- hop-length -> 256: Kubichek & Muda et al.
- window -> hamming: Muda et al.
- n-mels -> 20: Kubichek
  - Battenberg et al. (2019) computed the first 13 MFCCs
- n-mfcc -> 16: by us
- n-fft -> 1024: by us
- center -> False: by us
- htk -> False: by us
- norm -> None: by us
- dtw -> True: by us
  - calculate the MCD-DTW, which is used as metric in works like:
The dependency numba is currently not available for Python 3.12.
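To illustrate what the dtw setting and the reported penalty mean, here is a simplified, pure-Python sketch of a DTW-aligned distance. It is not the package's implementation. It assumes that the distance is the mean Euclidean distance between aligned coefficient frames, that the alignment is an exact dynamic time warping (the package may use a different or approximate variant), and that the penalty is 2 - (n + m) / path length for inputs of n and m frames. The sample CLI output is consistent with this reading: a penalty of about 0.11946 at 519 frames corresponds to 976 input frames in total. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mcd_dtw(seq_a: List[Frame], seq_b: List[Frame]) -> Tuple[float, float, int]:
    """Return (mean aligned distance, penalty, aligned frame count)."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences need at least one frame")

    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Walk back from the end to the start to count the warping path length.
    i, j, path_len = n, m, 1
    while (i, j) != (1, 1):
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # advance in both sequences
            (cost[i - 1][j], i - 1, j),          # repeat a frame of seq_b
            (cost[i][j - 1], i, j - 1),          # repeat a frame of seq_a
        ]
        _, i, j = min(candidates)
        path_len += 1

    mean_distance = cost[n][m] / path_len
    # 0 when no frame had to be repeated; grows with the amount of stretching.
    penalty = 2 - (n + m) / path_len
    return mean_distance, penalty, path_len
```

For two sequences that differ only by one repeated frame, for example `[[0.0], [1.0], [2.0]]` and `[[0.0], [1.0], [1.0], [2.0]]`, the aligned distance is zero while the penalty is 0.25, which shows how the penalty captures differences in timing that the distance alone hides.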