Speech is the most natural communication mode for human beings. Speech recognition is the task of converting speech into a sequence of words by a computer program. Speech recognition applications let people use speech as an additional input mode and interact with software easily and effectively. Speech recognition interfaces in a native language also enable illiterate and semi-literate people to use the technology without knowing how to operate a computer keyboard or stylus. For more than three decades, a great amount of research has been carried out on various aspects of speech recognition and its applications, and today many products successfully use automatic speech recognition for communication between humans and machines. However, the performance of speech recognition applications deteriorates in the presence of reverberation and even low levels of ambient noise. Robustness to noise, reverberation, and transducer characteristics remains an unsolved problem, which keeps research in speech recognition very active.
Speech recognition technology allows hands-free control of smartphones, speakers, and even vehicles in a wide variety of languages. Companies have moved towards the goal of enabling machines to understand and respond to more and more of our verbalized commands. Many mature speech recognition systems are available, such as Google Assistant, Amazon Alexa, and Apple’s Siri. However, all of those voice assistants support only a limited set of languages.
The World Food Program wants to deploy an intelligent form that collects nutritional information about food bought and sold at markets in two African countries: Ethiopia and Kenya. The design of this intelligent form requires selected people to install an app on their mobile phones; whenever they buy food, they use their voice to activate the app and register the list of items they just bought in their own language. The intelligent system in the app is expected to transcribe the speech to text live and organize the information in an easy-to-process way in a database.
Our responsibility was to build a deep learning model capable of transcribing speech to text in the Amharic language. The model we produce should be accurate and robust against background noise.
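To make that deliverable concrete, below is a minimal sketch of the kind of acoustic model such a system is typically built around: log-Mel features in, per-frame character probabilities out, decoded with CTC. The feature dimension, layer sizes, and alphabet size are illustrative placeholders, not the project's actual configuration.

```python
# Minimal sketch of a CTC-based acoustic model (placeholder sizes, not the
# project's actual architecture).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_MEL_BINS = 80   # assumed feature dimension (log-Mel filterbanks)
NUM_CHARS = 60      # assumed Amharic character set size, including the CTC blank

def build_acoustic_model():
    # (time, features) -> per-frame character probabilities
    inputs = layers.Input(shape=(None, NUM_MEL_BINS), name="features")
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(inputs)
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
    outputs = layers.Dense(NUM_CHARS, activation="softmax", name="char_probs")(x)
    return models.Model(inputs, outputs)

model = build_acoustic_model()
model.summary()

# Dummy forward pass on a ~3-second utterance (about 300 frames at a 10 ms hop).
dummy_features = np.random.randn(1, 300, NUM_MEL_BINS).astype("float32")
frame_probs = model(dummy_features)            # shape: (1, 300, NUM_CHARS)

# Greedy CTC decoding of the frame-level probabilities.
decoded, _ = tf.keras.backend.ctc_decode(
    frame_probs, input_length=np.array([300]), greedy=True)
print(decoded[0].numpy())                      # indices into the character set
```

In a full system the decoded indices would be mapped back to Amharic characters through the training vocabulary, and the model would be trained with CTC loss on the collected recordings.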
conda create --name mlenv python==3.7.5
conda activate mlenv
git clone https://github.com/week4-SpeechRecognition/Speech-to-Text.git
cd Speech-to-Text
sudo python3 setup.py install
docker pull abelblue/api:1.0
git checkout -b backend
docker run abelblue/api:1.0
- images/: the folder where all snapshots for the project are stored.
- data/: the folder where the versioned dataset pointer files (*.dvc) are stored.
- .dvc/: the folder where DVC is configured for data version control.
- .github/: the folder where GitHub Actions and the CML workflow are integrated.
- .vscode/: the folder where local path fixes are stored.
- models/: the folder where model pickle files are stored (a loading sketch follows this list).
- notebooks/: includes all notebooks for deep learning and metadata.
- *.py: scripts for modularization, logging, and packaging.
- requirements.txt: a text file listing the project's dependencies.
- README.md: Markdown text with a brief explanation of the project and the repository structure.
- Dockerfile: lets users create an automated build that executes several command-line instructions in a container.
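As a usage note for the models/ folder, a saved checkpoint can be reloaded for inference roughly as follows; `model.pkl` is a hypothetical file name standing in for whichever pickle the training scripts actually produce.

```python
import pickle

# Hypothetical checkpoint name; substitute the actual file found in models/.
MODEL_PATH = "models/model.pkl"

with open(MODEL_PATH, "rb") as f:
    model = pickle.load(f)

# The loaded object exposes whatever interface was saved by the training
# scripts (for example a Keras model wrapper), ready for transcription.
print(type(model))
```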
Made with contributors-img.