h-r-v/A2E-Model

ABSTRACT

A2E is a CNN-based system that converts static hand gestures of the American Sign Language (ASL) to English text. A2E aims to reduce the communication gap between an average person and a differently-abled person.

It uses several libraries, including TensorFlow (Keras), scikit-learn, and OpenCV, for different purposes. The project draws on concepts such as CNNs, data augmentation, and transfer learning.

A2E does the conversion in real time.

Concepts learned and used during this project:

  • Convolutional Neural Network
  • Transfer Learning
  • Image Augmentation

DATA COLLECTION AND CLEANING

We initially tried to collect data from 50 different sources, and after a few weeks of hard work we managed to do so. Unfortunately, the collected data turned out to be very noisy and essentially unusable, so we had to come up with a different solution.

Hence, the "webcam data.ipynb" notebook. We used this notebook to take and save pictures using OpenCV. We took around 50 pictures for each alphabets of shape (200,200,3). No further cleaning was required as the pictures taken were in pretty good condition.

DATA AUGMENTATION AND PREPROCESSING

The amount of data we had was not enough to counteract overfitting and achieve decent generalization, so we used data augmentation via Keras' ImageDataGenerator class. The training data was augmented in various ways, while the validation and test data only went through normalization (division by 255).
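
A sketch of how that split can be set up with ImageDataGenerator (the specific augmentation parameters and directory names are assumptions, not the tuned values from the project):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Training data: normalization plus random augmentation
train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=False,   # flipping would change the meaning of many ASL signs
)

# Validation/test data: normalization only
val_gen = ImageDataGenerator(rescale=1.0 / 255)

train_flow = train_gen.flow_from_directory(
    "data/train", target_size=(200, 200), batch_size=32, class_mode="categorical")
val_flow = val_gen.flow_from_directory(
    "data/val", target_size=(200, 200), batch_size=32, class_mode="categorical")
```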

SHALLOW CNN MODEL

We tried various models before finalizing one. The chosen model gave about 86-90% accuracy before hyperparameter tuning, which we performed with the "Custom GridSearch w Augmentation.ipynb" notebook; after tuning, accuracy rose to about 92-96%. The problem with these models was that they did not perform well in the "testing live.ipynb" notebook, despite their good accuracies on the train, validation, and test sets.
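
For reference, a shallow CNN along these lines might look like the following (the layer sizes and training settings are illustrative, not the configuration found by the grid search):

```python
from tensorflow.keras import layers, models

def build_shallow_cnn(num_classes=26):
    # Small CNN over the 200x200 RGB webcam images
    model = models.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=(200, 200, 3)),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_shallow_cnn()
# model.fit(train_flow, validation_data=val_flow, epochs=20)
```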

TRANSFER LEARNING

After realizing that the shallow CNN might not make the cut, we decided to give transfer learning a shot. We tried various architectures with weights pre-trained on the ImageNet dataset and concluded that VGG16 with a single dense layer on top works best. After that came another round of hyperparameter tuning. This model reached an accuracy of around 89%, which disappointed us at first, but after trying it in the "testing live.ipynb" notebook it was declared the clear winner.
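
A sketch of the VGG16-plus-single-dense-layer setup described above (the head size and training settings are assumptions):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# VGG16 convolutional base pre-trained on ImageNet, with its weights frozen
base = VGG16(weights="imagenet", include_top=False, input_shape=(200, 200, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),    # the single layer added on top (size assumed)
    layers.Dense(26, activation="softmax"),  # one class per ASL alphabet
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_flow, validation_data=val_flow, epochs=10)
```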

CONCLUSION AND FINAL OUTCOME

Using the "testing live.ipynb" notebook , we were able to convert the American Sign Language Alphabets to English Text in realtime by using the OpenCV library. The model performs perfectly well with decent lights and plain background. Even though the model achieved an overall accuracy of 89% , it still converts all the American Sign Language Alphabets to English text perfectly.

You can download the APK here.
