Hand Gesture Recognition

A computer vision and machine learning project for detecting and classifying hand gestures captured from a laptop camera. The project combines face detection, skin-color-based hand localization, dataset generation, and neural network classification to recognize hand-made letter signs.

This repository contains a university computer vision lab project originally developed in Google Colab and later organized for GitHub presentation.

Overview

The project explores a full pipeline for real-time hand gesture recognition:

detect the face in the camera frame,
use the detected face region to estimate a skin-color distribution,
suppress the face region and search for the hand,
capture and preprocess hand images,
build datasets for selected letters,
train MLP models to classify the gestures,
run inference on live camera input.

The selected gesture classes in this project are the letters M, N, and W.

Main components

Face detection using a Haar cascade on grayscale images
Region of interest tracking for identifying relevant areas in the frame
CamShift-based color tracking to model skin-color distribution
Hand extraction and cropping from the video feed
Dataset generation with different class-balance and variability settings
MLP classification for recognizing hand gesture letters
Live prediction on camera input

Repository structure

Automatic-Signal-Detector/
├── README.md
├── .gitignore
├── notebooks/
│   └── CompVision_Ilaria.ipynb
├── models/
│   ├── model1.json
│   ├── model1_weights.h5
│   ├── model2.json
│   ├── model2_weights.h5
│   ├── model3.json
│   └── model3_weights.h5
└── results/
    ├── dataset1.txt
    ├── dataset2.txt
    └── dataset3.txt

Key files

notebooks/CompVision_Ilaria.ipynb — main notebook containing the full project workflow
results/dataset1.txt, results/dataset2.txt, results/dataset3.txt — dataset and experiment output logs
models/ — saved model architectures and trained weights

Method

1. Face detection

The first stage detects the face using a Haar cascade on a grayscale image. Haar cascades operate on pixel intensities rather than color, so converting each frame to grayscale loses nothing the detector needs and reduces the amount of data to process.

2. Face-based color modeling

After detecting the face, the project uses the face region as a reference area to estimate a skin-color distribution. That distribution is then used to score every pixel in the frame by how closely its color matches the reference, which highlights other skin-colored regions such as the hands.
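
The idea can be sketched with a hue histogram and back-projection. The version below uses plain NumPy as a stand-in for OpenCV's `calcHist` and `calcBackProject`, and it assumes the frame has already been converted to a hue channel in OpenCV's 0 to 179 range. The function names and the bin count are hypothetical.

```python
import numpy as np


def hue_histogram(hue, box, bins=16):
    """Normalized hue histogram of the pixels inside box = (x, y, w, h)."""
    x, y, w, h = box
    roi = hue[y:y + h, x:x + w]
    hist, _ = np.histogram(roi, bins=bins, range=(0, 180))
    # Scale so that the most frequent hue bin maps to 1.0.
    return hist / max(hist.max(), 1)


def back_project(hue, hist):
    """Replace every pixel by the histogram value of its hue bin."""
    bins = len(hist)
    idx = np.minimum(hue.astype(np.int64) * bins // 180, bins - 1)
    return hist[idx]


# Synthetic hue image: background hue 90, a "face" and a "hand" near hue 10.
hue = np.full((100, 100), 90, dtype=np.uint8)
hue[10:30, 10:30] = 10   # face region
hue[60:75, 60:75] = 8    # hand region with a similar hue

skin_hist = hue_histogram(hue, (10, 10, 20, 20))
prob = back_project(hue, skin_hist)
```

In this toy frame the hand pixels score as high as the face pixels while the background scores zero, which is the property the later stages rely on.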

3. Hand localization

The face region is excluded from the probability map so that the algorithm focuses on locating the hands instead of repeatedly identifying the face.
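
A minimal sketch of this step, assuming the probability map is a 2-D NumPy array and the face is given as an `(x, y, w, h)` box. The project tracks the hand with CamShift; the thresholded bounding box used here is a simpler stand-in chosen to keep the example self-contained, and the margin and threshold values are hypothetical.

```python
import numpy as np


def suppress_face(prob, face_box, margin=0.2):
    """Return a copy of prob with the slightly enlarged face box zeroed."""
    x, y, w, h = face_box
    dx, dy = int(w * margin), int(h * margin)
    out = prob.copy()
    out[max(y - dy, 0):y + h + dy, max(x - dx, 0):x + w + dx] = 0
    return out


def locate_blob(prob, threshold=0.5):
    """Bounding box (x, y, w, h) of pixels above threshold, or None."""
    ys, xs = np.nonzero(prob > threshold)
    if ys.size == 0:
        return None
    x0, y0 = xs.min(), ys.min()
    return int(x0), int(y0), int(xs.max() - x0 + 1), int(ys.max() - y0 + 1)


# Toy probability map with a face blob and a hand blob.
prob_map = np.zeros((100, 100))
prob_map[10:30, 10:30] = 1.0   # face
prob_map[60:75, 60:75] = 1.0   # hand

hand_box = locate_blob(suppress_face(prob_map, (10, 10, 20, 20)))
```

Without the suppression step the bounding box would span both blobs, so the crop would include the face as well as the hand.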

4. Data collection

The system captures hand images at user-defined intervals and stores them in multiple sizes, including 16×16 and 224×224, for later processing and training.

5. Dataset creation

Three datasets were created to compare how class balance and variability affect model performance:

Dataset 1: balanced classes with high variability
Dataset 2: unbalanced classes (50 / 100 / 150 samples) with high variability
Dataset 3: balanced classes where one class has low variability
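
One way to assemble such subsets from a larger pool of captured images is sketched below. The file names are placeholders, the assignment of 50 / 100 / 150 to specific letters is hypothetical, and the 30% test fraction is inferred from the 210 / 90 splits reported in the results.

```python
import random


def subsample(paths_by_class, counts, seed=0):
    """Draw counts[label] items per class without replacement."""
    rng = random.Random(seed)
    return {label: rng.sample(paths_by_class[label], counts[label])
            for label in counts}


def train_test_split(items, test_fraction=0.3, seed=0):
    """Shuffle items and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder pool: 150 captured images per letter.
pool = {c: [f"{c}_{i:03d}.png" for i in range(150)] for c in "MNW"}

balanced = subsample(pool, {"M": 100, "N": 100, "W": 100})
unbalanced = subsample(pool, {"M": 50, "N": 100, "W": 150})

# Flatten to (path, label) pairs before splitting.
pairs = [(p, c) for c, paths in balanced.items() for p in paths]
train, test = train_test_split(pairs)
```

A plain shuffle does not guarantee identical class proportions in both parts; a stratified split would be the stricter choice for the unbalanced set.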

6. Model training

Three MLP models were trained and evaluated on each of the three datasets to compare their behavior under different data conditions.
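
To make the terms concrete, the sketch below shows an MLP forward pass in NumPy together with the two metrics reported in the results. It is not the Keras code from the notebook and does no training. The layer sizes (256 inputs for a flattened 16×16 image, 64 hidden units, 3 outputs), the ReLU and softmax activations, and the use of categorical cross-entropy as the loss are all assumptions.

```python
import numpy as np


def init_mlp(sizes, seed=0):
    """Random weights for a fully connected network with the given sizes."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """ReLU hidden layers followed by a softmax output layer."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0)
    w, b = params[-1]
    logits = x @ w + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def loss_and_accuracy(probs, labels):
    """Mean categorical cross-entropy and fraction of correct predictions."""
    n = len(labels)
    loss = -np.log(probs[np.arange(n), labels] + 1e-12).mean()
    acc = (probs.argmax(axis=1) == labels).mean()
    return float(loss), float(acc)


# Untrained network evaluated on random stand-in data (90 samples).
params = init_mlp([256, 64, 3])
rng = np.random.default_rng(1)
x_val = rng.random((90, 256))
y_val = rng.integers(0, 3, size=90)
val_loss, val_acc = loss_and_accuracy(forward(params, x_val), y_val)
```

Loss and accuracy can move independently: a model that is confidently wrong on a few samples can show a high loss alongside a reasonable accuracy.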

Results

Model 1

Dataset	Train/Test Split	Validation Loss	Validation Accuracy
Dataset 1	210 / 90	1.4553	0.6556
Dataset 2	244 / 106	0.9063	0.8302
Dataset 3	210 / 90	0.6691	0.8444

Observation: Model 1 reaches its highest validation accuracy on Dataset 3 (0.8444) and Dataset 2 (0.8302), and its lowest on Dataset 1 (0.6556).

Model 2

Dataset	Train/Test Split	Validation Loss	Validation Accuracy
Dataset 1	210 / 90	1.7095	0.7667
Dataset 2	244 / 106	0.9224	0.8396
Dataset 3	210 / 90	1.2521	0.7556

Observation: Model 2 reaches its highest validation accuracy on Dataset 2 (0.8396), possibly because the unbalanced class distribution lets the majority class dominate the score.
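
A quick way to put accuracy on an unbalanced set in context is the majority-class baseline: the accuracy obtained by always predicting the most frequent class. The sketch below uses the 50 / 100 / 150 class counts given for Dataset 2 and assumes the held-out split keeps roughly the same class mix, which the source does not confirm.

```python
def majority_baseline(class_counts):
    """Accuracy of always predicting the most frequent class."""
    return max(class_counts) / sum(class_counts)


balanced_baseline = majority_baseline([100, 100, 100])
unbalanced_baseline = majority_baseline([50, 100, 150])
```

Under that assumption the trivial baseline rises from about 0.33 on a balanced three-class set to 0.50 on the unbalanced one, so the same accuracy figure represents a smaller gain over chance on Dataset 2.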

Model 3

Dataset	Train/Test Split	Validation Loss	Validation Accuracy
Dataset 1	210 / 90	1.2044	0.7889
Dataset 2	244 / 106	1.2693	0.7642
Dataset 3	210 / 90	1.8945	0.7000

Observation: Model 3 reaches its highest validation accuracy on Dataset 1 (0.7889) and Dataset 2 (0.7642), and its lowest on Dataset 3 (0.7000).

Test phase

For the live test phase, the project uses Model 1 for prediction. The system:

detects the hand in the camera frame,
generates a grayscale probability image,
reshapes the processed image for model input,
loads the trained model,
predicts the performed letter,
overlays the prediction on the video stream.
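
The reshaping and decoding steps in the list above can be sketched as follows. Model loading and the video overlay are left out. The class order in `LETTERS`, the flattening to a single row, and the scaling to the 0 to 1 range are assumptions about the model's expected input, not details confirmed by the source.

```python
import numpy as np

LETTERS = ("M", "N", "W")  # class order is an assumption


def prepare_input(prob_img):
    """Scale an 8-bit grayscale image to [0, 1] and flatten it to (1, H*W)."""
    x = np.asarray(prob_img, dtype=np.float32) / 255.0
    return x.reshape(1, -1)


def decode_prediction(class_probs, letters=LETTERS):
    """Return (letter, confidence) for one row of class probabilities."""
    row = np.asarray(class_probs).reshape(-1)
    i = int(row.argmax())
    return letters[i], float(row[i])


# A 16x16 stand-in for the grayscale probability image of the hand.
img = np.full((16, 16), 255, dtype=np.uint8)
x = prepare_input(img)

# Example output row, as a trained model's predict call might return it.
letter, confidence = decode_prediction([[0.1, 0.7, 0.2]])
```

The returned letter and confidence are what would then be drawn onto the frame, for example with OpenCV's `putText`.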

Technologies used

Python
OpenCV
NumPy
Matplotlib
TensorFlow / Keras
Google Colab

Notes on reproducibility

This project was originally developed in Google Colab and includes Colab-specific components such as:

camera capture through browser-side JavaScript,
Google Drive mounting,
Colab utility imports.
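
When adapting the notebook to run elsewhere, one common pattern is to detect the Colab environment and skip the Colab-only steps otherwise. The helper below is a hypothetical sketch of that check and is not part of the repository.

```python
import importlib.util


def running_in_colab():
    """True when the google.colab package can be found."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except ModuleNotFoundError:
        # Raised when the parent package "google" is not installed at all.
        return False


in_colab = running_in_colab()
```

Outside Colab, the browser-side camera capture would need a local replacement such as OpenCV's `VideoCapture`, and the Drive paths would need to point at local folders.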

Because of this, the notebook is best understood as a documented academic project and prototype rather than a packaged, fully reproducible local application.

The full image dataset is stored externally on Google Drive rather than in this repository.

Limitations

The implementation is tightly coupled to the Google Colab environment.
Only three gesture classes are considered: M, N, and W.
The dataset is relatively small and tailored to the project experiment.
The repository is focused on demonstrating the pipeline and results rather than production deployment.
