Udacity Natural Language Processing Nanodegree

Speech Recognition with Neural Networks

ASR Pipeline

This project develops a recurrent neural network that functions as part of an end-to-end (A)utomatic (S)peech (R)ecognition pipeline. The pipeline converts raw audio from the LibriSpeech ASR corpus into spectrogram or MFCC feature representations, and the network uses those features to generate transcribed text automatically.
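To make the feature step concrete, here is a minimal, dependency-free sketch of a log-magnitude spectrogram. It is an illustration only and is not taken from the project code: the function name `spectrogram`, the `window` and `step` sizes, and the synthetic 1 kHz test tone are all assumptions, and a real pipeline would normally use an FFT from NumPy or SciPy rather than the naive DFT shown here.

```python
import cmath
import math


def spectrogram(samples, window=64, step=32):
    """Return log-power spectrogram frames for a mono signal (hypothetical helper).

    Each frame is Hann-windowed and transformed with a naive DFT;
    only the non-redundant bins 0..window//2 are kept.
    """
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window - 1)) for n in range(window)]
    frames = []
    for start in range(0, len(samples) - window + 1, step):
        frame = [samples[start + n] * hann[n] for n in range(window)]
        bins = []
        for k in range(window // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / window)
                for n in range(window)
            )
            # Small constant avoids log(0) on silent bins.
            bins.append(math.log(abs(acc) ** 2 + 1e-14))
        frames.append(bins)
    return frames


# Synthetic 1 kHz tone sampled at 8 kHz (assumed values for the demo).
# Bin spacing is 8000 / 64 = 125 Hz, so the energy should peak at bin 8.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(256)]
feats = spectrogram(tone)
peak_bin = max(range(len(feats[0])), key=lambda k: feats[0][k])
print(len(feats), len(feats[0]), peak_bin)
```

Each row of `feats` is one time step and each column one frequency bin, which is the time-by-frequency layout a recurrent network consumes one frame at a time. MFCC features would add a mel filterbank and a discrete cosine transform on top of this.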

Requirements

  1. Download and install Git
  2. Download and install Anaconda
  3. Download and install FFmpeg

Data Folders

Set-up

Clone the project repository

git clone https://github.com/sdonatti/nd892-project-dnn-speech-recognizer

Install required Python packages

cd nd892-project-dnn-speech-recognizer
conda env create -f environment.yaml
conda activate nd892-project-dnn-speech-recognizer

Define the datasets

python flac_to_wav.py data/LibriSpeech/dev-clean
python flac_to_wav.py data/LibriSpeech/test-clean
python create_desc_json.py data/LibriSpeech/dev-clean train_corpus.json
python create_desc_json.py data/LibriSpeech/test-clean valid_corpus.json
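The corpus files produced above pair each audio clip with its transcript. The sketch below shows one way such a descriptor could be built, assuming a layout of one JSON object per line with `key`, `duration`, and `text` fields. That layout, the function name `describe_corpus`, and the demo utterance ID are assumptions for illustration; the actual `create_desc_json.py` may differ. The only detail relied on from LibriSpeech is that each chapter folder holds a `*.trans.txt` file whose lines start with an utterance ID followed by the transcript.

```python
import json
import os
import tempfile
import wave


def describe_corpus(data_dir, output_path):
    """Write one JSON object per utterance: WAV path, duration in seconds, transcript."""
    count = 0
    with open(output_path, "w") as out:
        for root, _dirs, files in os.walk(data_dir):
            for name in sorted(files):
                if not name.endswith(".trans.txt"):
                    continue
                with open(os.path.join(root, name)) as trans:
                    for line in trans:
                        utt_id, _, text = line.strip().partition(" ")
                        wav_path = os.path.join(root, utt_id + ".wav")
                        if not os.path.isfile(wav_path):
                            continue  # skip utterances not yet converted to WAV
                        with wave.open(wav_path, "rb") as audio:
                            duration = audio.getnframes() / audio.getframerate()
                        record = {"key": wav_path, "duration": duration, "text": text.lower()}
                        out.write(json.dumps(record) + "\n")
                        count += 1
    return count


# Demo on a throwaway folder: one second of silence plus a made-up transcript.
with tempfile.TemporaryDirectory() as tmp:
    with wave.open(os.path.join(tmp, "100-200-0000.wav"), "wb") as audio:
        audio.setnchannels(1)
        audio.setsampwidth(2)
        audio.setframerate(16000)
        audio.writeframes(b"\x00\x00" * 16000)
    with open(os.path.join(tmp, "100-200.trans.txt"), "w") as trans:
        trans.write("100-200-0000 GO DO YOU HEAR\n")
    corpus_path = os.path.join(tmp, "demo_corpus.json")
    written = describe_corpus(tmp, corpus_path)
    with open(corpus_path) as corpus:
        records = [json.loads(line) for line in corpus]

print(written, records[0]["duration"], records[0]["text"])
```

Storing the duration alongside each clip lets a training loop filter or sort utterances by length without reopening the audio files.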

Launch the project Jupyter Notebook

jupyter notebook vui_notebook.ipynb

License

This project is licensed under the MIT License.
