Lipify - A Lip Reading Application


Project Dependencies:
  • Python>=3.7.1
  • tensorflow>=2.1.0
  • opencv-python>=4.2.0
  • dlib
  • moviepy>=1.0.1
  • numpy>=1.18.1
  • Pillow
  • matplotlib
  • tqdm
  • pydot
  • seaborn
  • scikit-learn
  • imutils>=0.5.3

Note: All dependencies are listed in 'setup.py'.
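Before running anything, it can help to confirm that the packages above are importable. The following is a minimal sketch, not a script from the repository. `missing_dependencies` is a hypothetical helper that maps each PyPI name to the module it installs (for example, opencv-python provides `cv2`, Pillow provides `PIL`, and scikit-learn provides `sklearn`) and checks each one with `importlib.util.find_spec`. It does not check the minimum versions listed above.

```python
import importlib.util

# PyPI name -> top-level module name for the dependencies listed above.
# Hypothetical helper data; version pins are NOT checked here.
REQUIRED_MODULES = {
    "tensorflow": "tensorflow",
    "opencv-python": "cv2",
    "dlib": "dlib",
    "moviepy": "moviepy",
    "numpy": "numpy",
    "Pillow": "PIL",
    "matplotlib": "matplotlib",
    "tqdm": "tqdm",
    "pydot": "pydot",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "imutils": "imutils",
}


def missing_dependencies(required=REQUIRED_MODULES):
    """Return the PyPI names whose module cannot be found by this interpreter.

    find_spec locates a module without importing it, so heavy packages
    such as tensorflow are not loaded by this check.
    """
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```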


Project's Dataset Structure:
  • GP DataSet/
    | --> align/
    | --> video/
  • Videos-After-Extraction/
    | --> S1/
    | --> ....
    | --> S20/
  • New-DataSet-Videos/
    | --> S1/
    | --> ....
    | --> S20/
  • S1/
    | --> Adverb/
    | --> Alphabet/
    | --> Colors/
    | --> Commands/
    | --> Numbers/
    | --> Prepositions/
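The layout above can be created or checked with `pathlib`. This is a hypothetical helper, not part of the project. It assumes the speaker folders run contiguously from S1 to S20 and, as in the tree, places the category folders only under S1.

```python
from pathlib import Path

# Assumed layout, mirroring the tree above. Speaker folders are assumed
# to run contiguously from S1 to S20.
SPEAKERS = [f"S{i}" for i in range(1, 21)]
CATEGORIES = ["Adverb", "Alphabet", "Colors", "Commands", "Numbers", "Prepositions"]


def expected_dirs(root):
    """List every directory the dataset layout is expected to contain."""
    root = Path(root)
    dirs = [root / "GP DataSet" / "align", root / "GP DataSet" / "video"]
    dirs += [root / "Videos-After-Extraction" / s for s in SPEAKERS]
    dirs += [root / "New-DataSet-Videos" / s for s in SPEAKERS]
    dirs += [root / "S1" / c for c in CATEGORIES]
    return dirs


def create_layout(root):
    """Create any missing directories (existing ones are left untouched)."""
    for d in expected_dirs(root):
        d.mkdir(parents=True, exist_ok=True)


def missing_dirs(root):
    """Return the expected directories that do not exist yet."""
    return [d for d in expected_dirs(root) if not d.is_dir()]


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for d in missing_dirs(root):
        print("missing:", d)
```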


Dataset Info:

We use the GRID Corpus dataset, which is publicly available at this link.
You can download the dataset using our script, GridCorpus-Downloader.sh,
which was adapted from the code provided here.

To download, run the following command in your terminal:
bash GridCorpus-Downloader.sh FirstSpeaker SecondSpeaker
where FirstSpeaker and SecondSpeaker are integers specifying the range of speakers to download.

  • NOTE: Speaker 21 is missing from the GRID Corpus dataset due to technical issues.
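If you prefer to drive the download from Python, the sketch below builds the same command. It assumes the two arguments are the first and last speaker numbers of an inclusive range. `speaker_ids` and `download` are hypothetical helpers, not repository code. `speaker_ids` skips speaker 21 so you know which speaker folders to expect afterwards.

```python
import subprocess

# Speaker 21 does not exist in the GRID Corpus (see the note above).
MISSING_SPEAKERS = {21}


def speaker_ids(first, last):
    """Speaker numbers in the inclusive range first..last, minus missing ones."""
    if not 1 <= first <= last:
        raise ValueError("expected 1 <= first <= last")
    return [s for s in range(first, last + 1) if s not in MISSING_SPEAKERS]


def download(first, last, script="GridCorpus-Downloader.sh", run=False):
    """Build, and optionally execute, the download command.

    Assumption: the script takes the first and last speaker numbers.
    With run=False the command is only returned, not executed.
    """
    speaker_ids(first, last)  # validate the range before calling the script
    cmd = ["bash", script, str(first), str(last)]
    if run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    print(" ".join(download(1, 20)))
    print("Expected speakers:", speaker_ids(1, 20))
```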

Dataset Segmentation Steps:
  1. Run DatasetSegmentation.py
  2. Run Pre-Processing/frameManipulator.py

* After running the above scripts, every resulting video is 1 second long at 30 FPS.
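Producing exactly 30 frames per clip (1 second at 30 FPS) from word segments of varying length means dropping or repeating frames. The sketch below shows one common way to do that, uniform index sampling. It is an illustration under that assumption, not necessarily what Pre-Processing/frameManipulator.py does. `resample_indices` and `resample` are hypothetical names and work on any list of frames (for example, images read with OpenCV).

```python
TARGET_FPS = 30
TARGET_SECONDS = 1
TARGET_FRAMES = TARGET_FPS * TARGET_SECONDS  # 30 frames per clip


def resample_indices(num_frames, target=TARGET_FRAMES):
    """Pick `target` frame indices spread evenly over a clip of num_frames.

    Clips shorter than the target repeat frames; longer clips drop frames.
    Since i < target, (i * num_frames) // target is always < num_frames.
    """
    if num_frames <= 0:
        raise ValueError("clip has no frames")
    return [(i * num_frames) // target for i in range(target)]


def resample(frames, target=TARGET_FRAMES):
    """Return a new list of exactly `target` frames taken from `frames`."""
    return [frames[i] for i in resample_indices(len(frames), target)]


if __name__ == "__main__":
    print(resample_indices(15))  # each of the 15 frames used twice
    print(resample_indices(60))  # every second frame kept
```

Writing the resampled frames back out at 30 FPS then yields 1-second clips.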

CNN Models Training Steps:
  • The model code can be found in the "NN-Models" directory.

  • First, change the common path value to the directory containing your training and test data.

  • Run each network to start training.

  • Early stopping was used to halt each model's training at its best validation accuracy.

  • The resulting accuracies after training can be found in: Project Accuracies

or in the following illustration: Train & Test Accuracies of each category
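For reference, early stopping on validation accuracy in tf.keras is normally configured by passing `tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=..., restore_best_weights=True)` to `model.fit(..., callbacks=[...])`. The `val_accuracy` name exists only when the model is compiled with `metrics=["accuracy"]`. The framework-free sketch below mirrors that callback's rule so the behaviour can be checked by hand: training stops once accuracy has failed to improve by more than `min_delta` for `patience` consecutive epochs. The patience value is an assumption, not the setting used in NN-Models.

```python
def early_stopping(val_accuracies, patience=3, min_delta=0.0):
    """Simulate early stopping on per-epoch validation accuracies.

    Returns (best_epoch, stop_epoch), both 0-based. If training never
    stalls for `patience` epochs, stop_epoch is the last epoch.
    """
    best, best_epoch, wait = float("-inf"), 0, 0
    for epoch, acc in enumerate(val_accuracies):
        if acc > best + min_delta:
            # Improvement: remember it and reset the patience counter.
            best, best_epoch, wait = acc, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_accuracies) - 1


if __name__ == "__main__":
    history = [0.50, 0.60, 0.70, 0.65, 0.68, 0.69, 0.71]
    best_epoch, stop_epoch = early_stopping(history, patience=3)
    print(f"best epoch {best_epoch}, stopped after epoch {stop_epoch}")
```

With `restore_best_weights=True`, Keras keeps the weights from `best_epoch` rather than from `stop_epoch`.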


CNN Architecture:

All of our networks share the same architecture; the only difference
between them is the output layer, as shown in:

General CNN Architecture
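The shared-body, per-category-head design can be expressed as a single factory that takes the number of output classes. The layer list below is purely hypothetical and is not the architecture in NN-Models. It only shows that every category network reuses the same body and differs in its final softmax layer.

```python
# HYPOTHETICAL shared body: (layer kind, size) pairs for illustration only.
SHARED_LAYERS = [
    ("conv", 32),
    ("pool", None),
    ("conv", 64),
    ("pool", None),
    ("flatten", None),
    ("dense", 256),
]


def build_spec(num_classes):
    """Layer list for one category network: shared body plus its own output layer."""
    if num_classes < 2:
        raise ValueError("a classifier needs at least two classes")
    # List concatenation returns a new list, so SHARED_LAYERS is never mutated.
    return SHARED_LAYERS + [("dense_softmax", num_classes)]


if __name__ == "__main__":
    for n in (4, 10):
        print(n, build_spec(n)[-1])
```

In Keras, the same idea is a builder function that adds the shared layers and ends with `Dense(num_classes, activation="softmax")`, called once per category.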


TODOs:
  • Dataset preprocessing module
  • Initial convolutional neural networks' architecture
  • Facial detection algorithm
  • Optimization of the networks' architectures
  • Unit testing of project files
  • Proper documentation for the whole project

License:

MIT License