This project develops a recurrent neural network that functions as part of an end-to-end automatic speech recognition (ASR) pipeline. It converts raw audio from the LibriSpeech ASR corpus into spectrogram or MFCC feature representations and uses them to automatically generate transcribed text.
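To illustrate the feature-extraction step, here is a minimal spectrogram sketch using only NumPy. The window and hop sizes are illustrative assumptions, not the project's actual settings, and the project's own feature code may differ:

```python
import numpy as np

def spectrogram(signal, window=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform.

    A minimal sketch: frames the signal, applies a Hann window,
    and takes the magnitude of the real FFT of each frame.
    """
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        frame = signal[start:start + window] * np.hanning(window)
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)  # shape: (num_frames, window // 2 + 1)

# Demonstrate on a synthetic 440 Hz tone sampled at 16 kHz
t = np.arange(16000) / 16000.0
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
```

With a 256-sample window at 16 kHz, each frequency bin spans 62.5 Hz, so the 440 Hz tone's energy concentrates near bin 7.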
Download the LibriSpeech `dev-clean` and `test-clean` sets and extract them into:

- data
  - LibriSpeech
Clone the project repository:

```shell
git clone https://github.com/sdonatti/nd892-project-dnn-speech-recognizer
```
Install the required Python packages:

```shell
cd nd892-project-dnn-speech-recognizer
conda env create -f environment.yaml
conda activate nd892-project-dnn-speech-recognizer
```
Prepare the datasets (convert FLAC audio to WAV and generate corpus descriptor files):

```shell
python flac_to_wav.py data/LibriSpeech/dev-clean
python flac_to_wav.py data/LibriSpeech/test-clean
python create_desc_json.py data/LibriSpeech/dev-clean train_corpus.json
python create_desc_json.py data/LibriSpeech/test-clean valid_corpus.json
```
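Corpus descriptor files of this kind typically hold one JSON object per line. A hedged sketch of parsing such a file, assuming each record carries a `key` (WAV path), `duration` (seconds), and `text` (transcript); adjust the keys if your descriptor files differ:

```python
import json

# Sample record in an assumed JSON-lines layout; the path and
# transcript below are illustrative, not taken from the corpus.
sample = [
    '{"key": "data/LibriSpeech/dev-clean/84/121123/84-121123-0000.wav",'
    ' "duration": 2.61, "text": "go do you hear"}',
]

def parse_corpus(lines):
    """Parse JSON-lines corpus records into (wav_path, duration, transcript)."""
    return [(r["key"], r["duration"], r["text"])
            for r in map(json.loads, lines)]

utterances = parse_corpus(sample)
```

In practice you would pass the lines of `train_corpus.json` or `valid_corpus.json` to `parse_corpus` instead of the in-memory sample.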
Launch the project Jupyter Notebook:

```shell
jupyter notebook vui_notebook.ipynb
```
This project is licensed under the MIT License.