African language Speech Recognition - Speech-to-Text


Table of Contents

  • Introduction
  • Installation guide
  • Architecture
  • Project Structure
  • Contributors

Introduction

Speech is the most natural mode of communication for human beings. The task of speech recognition is to convert speech into a sequence of words by a computer program. Speech recognition applications enable people to use speech as an additional input mode and to interact with applications easily and effectively. Speech recognition interfaces in a native language allow illiterate and semi-literate people to use the technology without knowing how to operate a computer keyboard or stylus. For more than three decades, a great deal of research has been carried out on various aspects of speech recognition and its applications, and today many products successfully use automatic speech recognition for communication between humans and machines. However, the performance of speech recognition applications deteriorates in the presence of reverberation and even low levels of ambient noise. Robustness to noise, reverberation, and transducer characteristics remains an unsolved problem, which keeps research in this area very active.

Speech recognition technology allows for hands-free control of smartphones, speakers, and even vehicles in a wide variety of languages. Companies have moved towards the goal of enabling machines to understand and respond to more and more of our verbalized commands. Many mature speech recognition systems are available, such as Google Assistant, Amazon Alexa, and Apple's Siri. However, all of those voice assistants support only a limited set of languages.

The World Food Program wants to deploy an intelligent form that collects nutritional information on food bought and sold at markets in two African countries: Ethiopia and Kenya. The design of this intelligent form requires selected people to install an app on their mobile phones; whenever they buy food, they use their voice to activate the app and register the list of items they just bought in their own language. The intelligent systems in the app are expected to live-transcribe the speech to text and organize the information in an easy-to-process way in a database.

Our responsibility was to build a deep learning model capable of transcribing speech to text in the Amharic language. The model we produce must be accurate and robust against background noise.
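One standard way to make a speech model robust against background noise is to augment the training audio by mixing in noise at a controlled signal-to-noise ratio. The sketch below illustrates the idea; it is not the project's actual training code, and the function name `add_noise` is our own:

```python
import numpy as np

def add_noise(clean, noise, snr_db):
    """Mix background noise into a clean waveform at a target SNR (dB).

    clean, noise: 1-D float arrays of audio samples.
    snr_db: desired signal-to-noise ratio in decibels.
    """
    # Tile/truncate the noise so it matches the clean signal's length.
    noise = np.resize(noise, clean.shape)
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so 10*log10(clean_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

During training, each clean utterance can be mixed with randomly chosen noise clips at SNRs sampled from a range (e.g. 0–20 dB) so the model sees many noisy variants of the same speech.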

Installation guide

Conda Environment

conda create --name mlenv python==3.7.5
conda activate mlenv

Installation of dependencies

git clone https://github.com/week4-SpeechRecognition/Speech-to-Text.git
cd Speech-to-Text
sudo python3 setup.py install

Run the backend API with Docker

docker pull abelblue/api:1.0
git checkout -b backend
docker run abelblue/api:1.0
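Once the container is running, the backend API can be called from client code. The sketch below shows how a transcription request might be posted; the endpoint path `/predict`, the port, and the content type are assumptions for illustration only — check the repository for the actual API contract:

```python
import urllib.request

def build_transcription_request(audio_bytes, host="localhost", port=5000):
    """Build an HTTP POST request sending raw audio to the backend API.

    The endpoint path and port here are illustrative assumptions,
    not values confirmed by the repository.
    """
    url = f"http://{host}:{port}/predict"
    return urllib.request.Request(
        url,
        data=audio_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )

# Usage (with the container running):
#     with open("sample.wav", "rb") as f:
#         resp = urllib.request.urlopen(build_transcription_request(f.read()))
```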

Architecture

(Diagram: speech-to-text deep learning architecture)
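Deep learning speech-to-text architectures of this kind commonly end in a CTC (Connectionist Temporal Classification) output layer, where per-frame predictions are collapsed into a character sequence. A minimal greedy CTC decoder illustrates that final step; this is a sketch of the general technique, not the project's actual decoding code:

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse per-frame argmax label indices into an output sequence.

    CTC greedy decoding: merge consecutive repeated labels,
    then drop the blank symbol.
    """
    out = []
    prev = None
    for idx in frame_ids:
        if idx != prev:        # merge consecutive repeats
            if idx != blank:   # drop CTC blanks
                out.append(idx)
        prev = idx
    return out

# e.g. frame labels [0, 1, 1, 0, 2, 2, 2, 0, 1] decode to [1, 2, 1]
```

In practice the decoded indices are then mapped through the model's character vocabulary (Amharic characters, in this project) to produce the transcript.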

Project Structure

  • images/: the folder where all snapshots for the project are stored.
  • *.dvc: the dataset version files tracked by DVC.
  • .dvc/: the folder where DVC is configured for data version control.
  • .github/: the folder where GitHub Actions and the CML workflow are integrated.
  • .vscode/: the folder where local path fixes are stored.
  • models/: the folder where model pickle files are stored.
  • notebooks/: all notebooks for deep learning and metadata.
  • *.py: scripts for modularization, logging, and packaging.

Root folder

  • requirements.txt: a text file listing the project's dependencies.
  • README.md: Markdown text with a brief explanation of the project and the repository structure.
  • Dockerfile: a set of command-line instructions for building an automated image so the project can run in a container.

Contributors

Made with contributors-img.
