- Result
- Overview
- Abstract
- Installation
- Run
- Training
- Pretrained models
- Technologies Used
- License
- Credits
This is a vision-based system that uses a deep 3D CNN architecture and transfer learning to recognize ISL hand gestures in real time and in recorded video. It recognizes 10 ISL hand gestures for the numeric digits (0-9); all of them are static except the gesture for 6, which is dynamic. It can be extended to a larger number of gesture classes without requiring a huge amount of data. It achieves around 85% accuracy on a video stream.
Real-time recognition of ISL hand gestures with a vision-based system is a challenging task for two reasons: nothing in a video stream indicates when a dynamic gesture starts and ends, and, unlike for ASL, no ISL dataset is publicly available to work on. In this work, I handle these challenges by using transfer learning and by running a deep 3D CNN architecture with a sliding window approach. The sliding window approach suffers from the problem of multiple activations for a single gesture, which I remove with post-processing. Finding the region of interest (RoI) is also difficult; I solve this with a face detection algorithm. The proposed architecture consists of two models: (1) a detector, which is a lightweight CNN architecture that detects gestures, and (2) a classifier, which is a deep CNN that classifies the detected gestures. To measure misclassifications, multiple detections, and missing detections at the same time, I use the Levenshtein distance, from which I compute the Levenshtein accuracy on a video stream. I created my own dataset of 10 ISL hand gestures for the numeric digits (0-9), with just 70 samples per gesture class. I fine-tuned a ResNeXt-101 model on this dataset and use it as the classifier. It achieves a classification accuracy of 95.79% on the training set and 94.39% on the validation set, and around 85% accuracy on a video stream.
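As a rough illustration of the Levenshtein-based metric mentioned above, here is a minimal sketch. It assumes that Levenshtein accuracy is defined as 1 minus the edit distance divided by the number of ground-truth gestures; this definition, the function names, and the example sequences are assumptions made for illustration and are not taken from this project's code.

```python
def levenshtein_distance(predicted, target):
    """Edit distance between two sequences of gesture labels."""
    # previous[j] is the distance between the already processed prefix of
    # `predicted` and the first j labels of `target`.
    previous = list(range(len(target) + 1))
    for i, p in enumerate(predicted, start=1):
        current = [i]
        for j, t in enumerate(target, start=1):
            substitution = previous[j - 1] + (p != t)  # misclassification
            deletion = previous[j] + 1                 # extra (multiple) detection
            insertion = current[j - 1] + 1             # missing detection
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def levenshtein_accuracy(predicted, target):
    """Assumed definition: 1 - distance / number of ground-truth gestures."""
    if not target:
        raise ValueError("target sequence must not be empty")
    return 1.0 - levenshtein_distance(predicted, target) / len(target)


# Hypothetical sequences of digit gestures in a video stream.
target = [1, 2, 3, 4, 5, 6, 7, 8, 9]
predicted = [1, 2, 7, 4, 5, 6, 6, 7, 9]

distance = levenshtein_distance(predicted, target)
accuracy = levenshtein_accuracy(predicted, target)
print(f"distance={distance}, accuracy={accuracy:.2%}")
```

A single number computed this way penalizes a misclassified gesture, a gesture detected twice, and a gesture that was missed, which is why it suits evaluation on a continuous stream rather than on pre-segmented clips.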
Install the libraries listed in requirements.txt.
To run the app, clone the repository, install the required libraries, download the models, and then run this command:
python app.py
Note: I have tested it only on Windows, not on other platforms such as Linux or macOS.
I used a Google Colab GPU to train and fine-tune the classifier.
Use training.ipynb to train or fine-tune the classifier on a Google Colab GPU or on your own GPU.
Download the pretrained ResNeXt_101 classifier model from here. It is trained on Jester, a large dynamic hand gesture dataset.
Download the pretrained ResNetl_10 detector model from here. It is trained on the EgoGesture hand gesture dataset.
Download the fine-tuned ResNeXt_101 classifier model from here. It is fine-tuned on our ISL hand gesture dataset.
Note: To run the app, you need only the detector and the classifier. After downloading them, place them in the same directory as all the other files.
Licensed under the MIT License.
I thank Okan Köpüklü, Ahmet Gündüz et al. for providing the codebase on which I built this project.
I also thank my friends Kunal Singh Bhandari, Mohd. Bilal and Digant Bhanwariya, who helped me with the web app design and data creation.
I also want to thank Google for providing the free Colab GPU service, which made it possible for me to train the model.