HandTalk is a real-time American Sign Language (ASL) detection system integrated into video calls. The project facilitates communication for ASL users by recognizing hand gestures and converting them into text during live video calls. The system uses a pre-trained MobileNet model, fine-tuned on over 30,000 images with 10 layers unfrozen, to improve ASL recognition accuracy.
- Real-time ASL Detection in Video Calls: Users can communicate using sign language, and our model translates it into text messages in real-time.
- Self-Testing & ASL Learning: Users can practice and learn ASL through an interactive section on our website.
- Frontend: React
- Backend: Node.js, WebRTC for video communication, MediaPipe for hand detection
- Machine Learning: Python, OpenCV for model prediction
- Signaling Protocol: WebSockets
- Model Training: Custom model trained using MobileNet architecture
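
As a rough illustration of the signaling layer, the messages exchanged over the WebSocket can be modeled as small JSON envelopes. The field names and message types below are assumptions for illustration, not the exact schema used in `index.js`:

```python
import json

# Hypothetical signaling message shapes; the actual payload fields used by
# index.js are not documented here, so these names are illustrative only.
def make_signal(kind: str, room: str, payload: dict) -> str:
    """Serialize a signaling message (e.g. an SDP offer) for the WebSocket."""
    return json.dumps({"type": kind, "room": room, "payload": payload})

def parse_signal(raw: str) -> dict:
    """Deserialize and sanity-check an incoming signaling message."""
    msg = json.loads(raw)
    assert msg["type"] in {"offer", "answer", "ice-candidate", "prediction"}
    return msg

offer = make_signal("offer", "room-42", {"sdp": "v=0 ..."})
print(parse_signal(offer)["type"])  # offer
```

In this scheme, the same channel that carries WebRTC offer/answer/ICE messages can also carry `prediction` messages with the recognized ASL text.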
We have implemented several APIs in our Python backend:
- `/video_feed`: Provides video frames for ASL prediction.
- `/process`: Uses MediaPipe to detect hands and extract hand frames for analysis.
- `/prediction`: Receives hand frames, predicts the ASL label, and transmits it via WebSockets to the recipient.
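
The core of the `/prediction` step, mapping the model's softmax output to an ASL label, can be sketched as follows. The A-Z label set and the `predict_label` helper are illustrative assumptions, not the repository's actual code:

```python
import string

# Hypothetical label set: one class per ASL letter A-Z. The repository's
# actual class ordering may differ.
LABELS = list(string.ascii_uppercase)

def predict_label(probabilities: list[float]) -> tuple[str, float]:
    """Pick the most probable ASL label from a model's softmax output."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return LABELS[best], probabilities[best]

# A fake softmax-like vector with most of the mass on index 7 ("H").
probs = [0.01] * 26
probs[7] = 0.75
label, confidence = predict_label(probs)
print(label, confidence)  # H 0.75
```

The label and its confidence would then be sent over the WebSocket to the call recipient.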
```
HandTalk/
│── client/               # React components
│── server/
│   ├── best_model.keras  # Trained model for ASL prediction
│   ├── index.js          # Node.js backend for WebRTC communication
│   ├── server.py         # Python backend for video processing & prediction
│   ├── ser.py            # Python backend for self-testing
│── ModelTrain.ipynb      # Model training notebook
```
Follow these steps to set up and run the project:

1. Clone the repository:

```bash
git clone https://github.com/Yashgabani845/GestureGenius.git
cd GestureGenius
```

2. Start the React frontend:

```bash
cd client
npm install   # Install all dependencies
npm start     # Start the React frontend
```

3. Start the Node.js server:

```bash
cd server
npm install   # Install backend dependencies
node index.js # Start the Node.js server for WebRTC communication
```

4. Ensure you have Python installed, then install the required dependencies:

```bash
pip install flask opencv-python numpy tensorflow mediapipe flask-cors
```

5. Start the Python backends:

```bash
python server.py
python ser.py
```

Now, access the project at http://localhost:3000.
Our model was trained using the MobileNet architecture, which keeps computational cost low by replacing standard convolutions with depthwise separable convolutions (a depthwise convolution followed by a 1×1 pointwise convolution). Fine-tuning took approximately 60 minutes and achieved high accuracy in recognizing ASL gestures.
- Lightweight and efficient
- Suitable for real-time inference
- Optimized for mobile and web applications
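
The fine-tuning setup described above can be sketched in Keras roughly as follows. This is a minimal sketch, assuming the last 10 layers are the ones unfrozen and a 26-class (A-Z) output head; `weights=None` keeps the sketch runnable offline, whereas the actual training would start from pretrained ImageNet weights (`weights="imagenet"`):

```python
import tensorflow as tf

# Build the MobileNet backbone. weights=None avoids a weight download here;
# real fine-tuning would load pretrained weights instead.
base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False, weights=None)

# Freeze all layers, then unfreeze the last 10 for fine-tuning.
for layer in base.layers:
    layer.trainable = False
for layer in base.layers[-10:]:
    layer.trainable = True

# Attach a small classification head (26 classes is an assumption: one per
# ASL letter; the actual dataset may define a different label set).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(26, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Only the unfrozen layers and the new head are updated during training, which is why fine-tuning completes quickly compared to training from scratch.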
Check out our demo video on YouTube: HandTalk Demo


