murodovdev/digit-vision


DigitVision 🧠✍️

A full-stack AI web application that recognizes handwritten digits (0–9) in real time.

Built from scratch — including model training, REST API, and interactive frontend.

Demo · How it works · Quick Start · Experiments


What it does

Draw any digit on the canvas. The AI predicts it instantly — showing the predicted digit, confidence score, and a full probability distribution across all 10 classes.
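
As a rough illustration of how those three outputs relate, the sketch below derives the predicted digit and confidence score from a list of 10 probabilities. The function name `summarize_prediction` and the dictionary keys are hypothetical and may not match the project's actual response schema.

```python
from typing import Dict, List


def summarize_prediction(probabilities: List[float]) -> Dict[str, object]:
    """Turn 10 softmax probabilities into a UI-friendly summary.

    The field names below are assumptions, not the project's real schema.
    """
    if len(probabilities) != 10:
        raise ValueError("expected one probability per digit 0-9")
    # The predicted digit is the index of the largest probability.
    digit = max(range(10), key=lambda i: probabilities[i])
    return {
        "digit": digit,
        "confidence": probabilities[digit],  # probability of the winning class
        "probabilities": list(probabilities),  # full distribution for the bar chart
    }


example = [0.01, 0.02, 0.01, 0.85, 0.01, 0.03, 0.02, 0.02, 0.02, 0.01]
result = summarize_prediction(example)
print(result["digit"], result["confidence"])  # 3 0.85
```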

No external AI APIs. The model is trained and served entirely locally.


Demo

Draw → Predict → See confidence scores across all 10 digits

[Image: DigitVision demo]

To run it locally, follow the Quick Start below, open frontend/index.html, draw a digit, and click Predict. The prediction appears along with a probability bar chart for all 10 digits.


How it works

User draws on canvas
      ↓
canvas.toDataURL() → base64 PNG
      ↓
POST /api/predict  (Flask)
      ↓
Pillow: resize to 28×28, normalize to [0,1]
      ↓
CNN model: 10 softmax probabilities
      ↓
JSON response → UI update
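
The first server-side step in the flow above, turning the `canvas.toDataURL()` string back into PNG bytes, might look like the sketch below. It uses only the standard library. The function name `data_url_to_png_bytes` is hypothetical, and the project's own preprocess.py may handle this differently.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def data_url_to_png_bytes(data_url: str) -> bytes:
    """Decode a canvas.toDataURL() string into raw PNG bytes."""
    # A data URL looks like "data:image/png;base64,<payload>".
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:image/png") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded PNG data URL")
    png_bytes = base64.b64decode(payload)
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG image")
    return png_bytes


# Stand-in payload: just the PNG signature, enough to exercise the decoder.
fake_url = "data:image/png;base64," + base64.b64encode(PNG_SIGNATURE).decode("ascii")
png = data_url_to_png_bytes(fake_url)
print(len(png))  # 8
```

The decoded bytes would then go to the Pillow step, which resizes the image to 28×28 and scales pixel values to [0, 1].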

Model Architecture

Input (28×28×1)
  → Conv2D(32, 3×3, relu) + MaxPool
  → Conv2D(64, 3×3, relu) + MaxPool
  → Flatten
  → Dense(128, relu) + Dropout(0.5)
  → Dense(10, softmax)

Trained on the MNIST training set (60,000 images). Final accuracy on the 10,000-image test set: 99.27% (73 misclassified).
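
To see how the layer list translates into tensor shapes and parameter counts, the arithmetic can be worked through as below. This is a sketch that assumes stride-1 convolutions with 'valid' padding and 2×2 max pooling (the Keras defaults). The README does not state these settings, so the real model may differ.

```python
def conv_out(size: int, kernel: int = 3) -> int:
    """Spatial size after a stride-1 convolution with 'valid' padding (assumed)."""
    return size - kernel + 1


def pool_out(size: int, pool: int = 2) -> int:
    """Spatial size after non-overlapping 2x2 max pooling (assumed)."""
    return size // pool


side = 28
side = pool_out(conv_out(side))   # Conv2D(32): 26x26, MaxPool: 13x13
side = pool_out(conv_out(side))   # Conv2D(64): 11x11, MaxPool: 5x5
flat = side * side * 64           # Flatten: 5 * 5 * 64 = 1600

params = {
    "conv1": 3 * 3 * 1 * 32 + 32,     # kernel weights + biases = 320
    "conv2": 3 * 3 * 32 * 64 + 64,    # 18,496
    "dense1": flat * 128 + 128,       # 204,928
    "dense2": 128 * 10 + 10,          # 1,290
}
total = sum(params.values())
print(flat, total)  # 1600 225034
```

Under these assumptions, most of the roughly 225k parameters sit in the first dense layer, which is also the layer that Dropout(0.5) regularizes.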


Project Structure

DigitVision/
├── frontend/
│   ├── index.html          # Main page
│   ├── css/style.css       # Dark theme UI
│   └── js/
│       ├── app.js          # Entry point — event wiring
│       ├── canvas.js       # HTML5 Canvas drawing logic
│       ├── predict.js      # fetch() → Flask API
│       └── ui.js           # DOM updates, probability bars
├── backend/
│   ├── app.py              # Flask server + /api/predict route
│   ├── predict.py          # Inference pipeline
│   ├── preprocess.py       # base64 PNG → 28×28 float32 tensor
│   ├── model_loader.py     # Singleton model cache
│   └── requirements.txt
├── model/
│   ├── cnn.py              # CNN architecture definition
│   ├── train.py            # Interactive training script
│   ├── evaluate.py         # Confusion matrix + sample predictions
│   ├── visualize.py        # Training curve plots
│   └── checkpoints/        # Saved model weights (.keras)
└── docs/
    └── experiment_log.md   # Hyperparameter experiment results

Quick Start

# 1. Clone
git clone https://github.com/murodovdev/digit-vision.git
cd digit-vision

# 2. Create virtual environment
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # macOS/Linux

# 3. Install dependencies
pip install -r backend/requirements.txt

# 4. Train the model (interactive — you set the hyperparameters)
cd model
python train.py

# 5. Start the backend
cd ../backend
python app.py

# 6. Open frontend/index.html in your browser

Experiments

This project also includes controlled experiments that change one training setting at a time to show how each hyperparameter affects the result. All results are logged in docs/experiment_log.md.

| Run | Learning Rate | Batch Size | Dropout | Accuracy | Key Finding |
|---|---|---|---|---|---|
| Baseline | 0.001 | 64 | 0.5 | 99.27% | Best balance of speed and accuracy. Reproducible with SEED=42 |
| High LR | 0.05 | 64 | 0.5 | 10.28% | Loss stuck at ln(10): pure random guessing |
| No dropout | 0.001 | 64 | 0.0 | 98.94% | Overfitting: train loss → 0.008, val plateaued |
| Tiny batch | 0.001 | 8 | 0.5 | 99.28% | Same accuracy, 3.7× slower (621s vs 169s) |
| + Augmentation | 0.001 | 64 | 0.4 | 98.97% | Augmentation hurt: MNIST is already large enough |
| + BatchNorm | 0.0005 | 64 | 0.4 | 99.18% | Architecture change broke old hyperparameters |
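
The High LR finding can be checked by hand. A model that assigns equal probability to all 10 digits has a cross-entropy loss of exactly ln(10) and is right about 10% of the time. The short calculation below works this out with the standard library only.

```python
import math

NUM_CLASSES = 10

# A model that outputs the same probability for every digit.
uniform = [1.0 / NUM_CLASSES] * NUM_CLASSES

# Cross-entropy for one example is -log(probability of the true class).
# With a uniform output the true label does not matter; index 7 is arbitrary.
loss = -math.log(uniform[7])

print(round(loss, 4))          # 2.3026
print(round(math.log(10), 4))  # 2.3026
```

A loss that stays near 2.30 alongside roughly 10% accuracy therefore suggests the network's outputs carry no information about the input.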

Tech Stack

| Layer | Technology |
|---|---|
| Frontend | HTML5 Canvas, Vanilla JS (no frameworks) |
| Backend | Python, Flask, flask-cors |
| AI Model | TensorFlow 2.16, Keras |
| Dataset | MNIST (70,000 handwritten digit images) |
| Image processing | Pillow (resize, normalize) |
| Visualization | Matplotlib |

Key Concepts Implemented

  • Convolutional Neural Network (CNN) with 2 conv blocks
  • Dropout regularization to prevent overfitting
  • Early stopping with weight restoration to best epoch
  • Preprocessing pipeline — canvas PNG → normalized 28×28 tensor
  • REST API — browser communicates with model over HTTP/JSON
  • Singleton model loader — model loaded once, cached for all requests
  • Hyperparameter experiments — documented with hypothesis and results
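
The singleton model loader from the list above can be illustrated with `functools.lru_cache`. This is a minimal sketch, not the project's model_loader.py: `_load_from_disk` is a hypothetical stand-in for the real (slow) model-loading call, and the checkpoint file name is invented for the example.

```python
from functools import lru_cache

load_calls = 0  # counts how often the expensive load actually runs


def _load_from_disk(path: str) -> dict:
    """Hypothetical stand-in for the real, slow model-loading call."""
    global load_calls
    load_calls += 1
    return {"path": path}


@lru_cache(maxsize=1)
def get_model(path: str = "checkpoints/model.keras") -> dict:
    """Return the cached model, loading it on first use only."""
    return _load_from_disk(path)


first = get_model()   # triggers the load
second = get_model()  # served from the cache
print(first is second, load_calls)  # True 1
```

Caching matters here because loading the model is slow compared to a single prediction, so repeating the load on every request would dominate response time.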

License

MIT — feel free to use, modify, and build on this project.

About

Learning project: built a digit recognizer end-to-end — trained a CNN on MNIST, served it with Flask, and connected it to an HTML5 Canvas UI. 99.27% test accuracy.
