An algorithm that facilitates communication between a speech-impaired person and someone who does not understand sign language, using convolutional neural networks to classify hand signs.
Training set: 1080 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (180 pictures per number).
Test set: 120 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (20 pictures per number).
Each number has example images, and the corresponding integer labels are converted to one-hot vectors.
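The label conversion above can be sketched as follows; `one_hot` is a hypothetical helper name, and the indexing-into-identity-matrix trick is one common NumPy idiom for this:

```python
import numpy as np

def one_hot(labels, num_classes=6):
    """Convert integer labels (0-5) to one-hot row vectors.

    Indexing the identity matrix with the label array selects,
    for each label, the row with a 1 in that label's position.
    """
    return np.eye(num_classes)[labels]

labels = np.array([1, 4, 0])
print(one_hot(labels))
```

Each row sums to 1, with the 1 in the column given by the original label.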
Architecture:
- Input is an image of size 64 x 64 x 3 (RGB), normalized by dividing pixel values by 255
- Model:
- The output layer produces a probability for each of the six classes
- ReLU activation function; cross-entropy cost; Adam optimizer
- Mini-batch gradient descent with a mini-batch size of 64
The model is CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED
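The CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED pipeline above could be written in Keras as below. This is a minimal sketch: the filter counts, kernel sizes, and pooling windows are assumptions (the source does not specify them), and softmax on the final dense layer is inferred from the six-class cross-entropy setup:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),            # 64 x 64 RGB image
    # Assumed hyperparameters: 8 filters of 4x4, then 16 filters of 2x2
    layers.Conv2D(8, (4, 4), padding="same", activation="relu"),
    layers.MaxPooling2D((8, 8), strides=8, padding="same"),
    layers.Conv2D(16, (2, 2), padding="same", activation="relu"),
    layers.MaxPooling2D((4, 4), strides=4, padding="same"),
    layers.Flatten(),
    layers.Dense(6, activation="softmax"),      # probabilities over 6 classes
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="categorical_crossentropy",   # expects one-hot labels
    metrics=["accuracy"],
)
```

Training would then call `model.fit(X_train, Y_train, batch_size=64, ...)` to match the mini-batch size of 64.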
Outcome:
- Training cost graph (figure omitted)
- Train accuracy: 0.92963
- Test accuracy: 0.791667
- TODO: the gap between train and test accuracy suggests overfitting; add L2 or dropout regularization
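One way to address the TODO above is to regularize the fully connected head. This is a hedged sketch, not the project's actual fix; the L2 strength (1e-3) and dropout rate (0.5) are illustrative guesses:

```python
from tensorflow.keras import layers, regularizers

# Dropout before the classifier randomly zeroes activations during training,
# discouraging co-adaptation of features.
dropout = layers.Dropout(0.5)  # rate 0.5 is a common starting point, not tuned

# L2 weight decay on the final dense layer penalizes large weights.
dense = layers.Dense(
    6,
    activation="softmax",
    kernel_regularizer=regularizers.l2(1e-3),  # penalty strength is an assumption
)
```

These layers would replace the plain `Flatten -> Dense` tail of the model; both regularizers are only active at training time (dropout) or via the loss term (L2), so inference behavior is otherwise unchanged.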