A real-time web application for interactive sketch classification and generation. Built with a React frontend and a Flask backend, the system demonstrates end-to-end pipelines for recognizing hand-drawn sketches and generating class-conditioned doodles using deep learning models.
SketchCoder combines two core tasks:
- Classification: Users can draw on a canvas, and the trained CNN model predicts the object class among 10 categories.
- Generation: Users can select a category (e.g., bus, guitar) and the generative RNN will produce a novel sketch in that class.
The project demonstrates a full ML lifecycle: dataset preparation, preprocessing, model training, backend APIs, and an interactive frontend.

We use the Google Quick, Draw! dataset, a large-scale crowdsourced collection of millions of doodles across hundreds of categories. Each doodle is stored in NDJSON format (newline-delimited JSON), where each line corresponds to one drawing.
- Structure: Each record contains metadata (`key_id`, `word` category label, `countrycode`, `timestamp`, recognition flag) and the actual drawing strokes.
- Drawing representation: The `drawing` field is an array of strokes, where each stroke is a list of x-coordinates and y-coordinates (the raw files also carry per-point timing); pen-up/pen-down states follow from the stroke boundaries. This makes the data suitable for both raster classification and sequence-based generative modeling.
- Classes used (10): `bat`, `bicycle`, `bus`, `cactus`, `clock`, `door`, `guitar`, `lightbulb`, `paintbrush`, `smileyface`.
- Train/Test split: ~80/20 per class.
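As a concrete example, one NDJSON record can be parsed and rasterized for the classifier in a few lines of Python. The 28×28 output size and the file path below are assumptions for illustration; the simplified Quick, Draw! files align stroke coordinates to a 256×256 box.

```python
import json
import numpy as np
from PIL import Image, ImageDraw

def ndjson_to_raster(line, size=256, out=28):
    """Parse one Quick, Draw! NDJSON record and rasterize its strokes."""
    record = json.loads(line)
    # 'drawing' is a list of strokes; each stroke holds [xs, ys]
    # (raw files add a third list of per-point timestamps).
    img = Image.new("L", (size, size), color=0)
    draw = ImageDraw.Draw(img)
    for stroke in record["drawing"]:
        xs, ys = stroke[0], stroke[1]
        draw.line(list(zip(xs, ys)), fill=255, width=4)
    # Downsample to the model's input resolution (28x28 assumed here).
    img = img.resize((out, out), Image.BILINEAR)
    return np.asarray(img, dtype=np.float32) / 255.0

# Usage: one drawing per line in an NDJSON file (path is illustrative).
with open("bus.ndjson") as f:
    raster = ndjson_to_raster(next(f))
```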
Classification Pipeline (diagram)
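The README does not spell out the CNN's exact architecture, so the following is only a minimal PyTorch sketch of the kind of raster classifier this pipeline implies; the 28×28 grayscale input and all layer sizes are assumptions, not the project's actual configuration.

```python
import torch
import torch.nn as nn

class SketchCNN(nn.Module):
    """Minimal CNN for 10-class sketch classification (illustrative sizes)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(128, num_classes),  # logits; softmax is applied in the loss
        )

    def forward(self, x):  # x: (batch, 1, 28, 28)
        return self.classifier(self.features(x))

logits = SketchCNN()(torch.randn(8, 1, 28, 28))  # -> (8, 10)
```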
Generation Pipeline (diagram)
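Likewise for the generator: the sketch below shows the shape of a sketch-RNN-style decoder of the kind this pipeline describes. The stroke-5 input format, layer sizes, and the choice to condition every step on the latent vector z follow Ha & Eck (2017); the project's real hyperparameters may differ.

```python
import torch
import torch.nn as nn

class SketchDecoder(nn.Module):
    """Sketch-RNN-style decoder: an LSTM over stroke-5 steps, conditioned on z.

    Each output step parameterizes M bivariate Gaussians over the next
    (dx, dy) offset plus a 3-way categorical over pen states.
    """
    def __init__(self, z_dim=128, hidden=512, M=20):
        super().__init__()
        self.init_state = nn.Linear(z_dim, 2 * hidden)   # z -> initial (h0, c0)
        self.lstm = nn.LSTM(5 + z_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 6 * M + 3)         # MDN params + pen logits

    def forward(self, steps, z):
        # steps: (B, T, 5) stroke-5 input; z: (B, z_dim) latent code.
        B, T, _ = steps.shape
        h0, c0 = torch.tanh(self.init_state(z)).chunk(2, dim=-1)
        z_tiled = z.unsqueeze(1).expand(B, T, z.shape[-1])  # condition every step
        out, _ = self.lstm(torch.cat([steps, z_tiled], dim=-1),
                           (h0.unsqueeze(0).contiguous(),
                            c0.unsqueeze(0).contiguous()))
        return self.head(out)                               # (B, T, 6*M + 3)

params = SketchDecoder()(torch.randn(4, 30, 5), torch.randn(4, 128))
```

At generation time, z would come from a class-conditioned encoder or prior, and steps would be fed back autoregressively, sampling each next offset from the predicted mixture.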
- Accuracy: 96.0%
- Macro F1: 0.959
- Average Precision (micro): 0.991
Per-class metrics:
| Class | Precision | Recall | F1-score |
|---|---|---|---|
| bat | 0.939 | 0.939 | 0.939 |
| bicycle | 0.994 | 0.963 | 0.978 |
| bus | 0.978 | 1.000 | 0.989 |
| cactus | 0.984 | 0.892 | 0.936 |
| clock | 0.987 | 0.936 | 0.961 |
| door | 0.982 | 0.994 | 0.988 |
| guitar | 0.935 | 0.988 | 0.961 |
| lightbulb | 0.944 | 0.978 | 0.961 |
| paintbrush | 0.882 | 0.940 | 0.910 |
| smileyface | 0.981 | 0.963 | 0.972 |
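These figures can be reproduced from the model's test-set outputs with scikit-learn. A minimal sketch follows, with random placeholder arrays standing in for the real labels and softmax scores:

```python
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import (accuracy_score, f1_score, confusion_matrix,
                             average_precision_score, classification_report)

# Placeholder predictions; in practice y_true / y_prob come from the test set
# and the trained CNN's softmax outputs.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 10, size=1000)
y_prob = rng.dirichlet(np.ones(10), size=1000)
y_pred = y_prob.argmax(axis=1)

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
# Micro AP flattens the one-hot labels and all class scores into a single
# binary ranking problem, which is what the micro-average figure reports.
print("Micro AP:", average_precision_score(
    label_binarize(y_true, classes=list(range(10))), y_prob, average="micro"))
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, digits=3))
```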
- Confusion Matrix – The model shows strong class separation, with most predictions lying on the diagonal. Misclassifications are minimal, and the lowest recall is for cactus (0.892), which overlaps slightly with other categories.
- Precision–Recall Curve (AP = 0.991) – The PR curve stays near the top-right corner, confirming excellent performance across thresholds. The micro-average precision of 0.991 indicates strong robustness against class imbalance.
- Per-class F1 Scores – Most classes achieve F1-scores above 0.95, with bus and door nearly perfect. The lowest-performing class is paintbrush (0.910), likely due to its visual similarity to guitar and lightbulb.
- Accuracy Curve – Training accuracy converges close to 100%, while validation accuracy stabilizes around 96%, suggesting good generalization with minimal overfitting.
- Loss Curve – Training loss decreases smoothly toward zero, while validation loss stabilizes at a low level with minor fluctuations, indicating consistent learning and no major divergence.
- Latent Space (t-SNE of encoder μ) – The embeddings form well-separated clusters for each class, showing that the encoder learns a structured latent space. Visually similar objects (paintbrush and guitar) sit closer together but remain separable, indicating good class-specific representations.
- KL Divergence by Class – The KL term (written out after this list) drops sharply in the first few epochs and stabilizes near zero, confirming that the variational latent space converges smoothly. Some classes (door, clock) stabilize faster than others.
- Reconstruction Loss (MDN + Pen CE) – The reconstruction loss, sketched after this list, decreases consistently across classes with minor oscillations, showing that the decoder progressively learns to reproduce stroke patterns with higher fidelity.
- Total Loss by Class – Overall loss follows a downward trend, reflecting combined improvements in both latent regularization (KL) and reconstruction. The small per-class variations show that some sketches (bus, paintbrush) are harder to generate than others.
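For reference, the loss terms plotted above match the SketchRNN formulation (Ha & Eck, 2017), which this project cites as its inspiration. The KL term is the standard diagonal-Gaussian divergence, normalized by the latent dimension $N_z$:

```math
\mathcal{L}_{KL} = -\frac{1}{2 N_z} \sum_{i=1}^{N_z} \left( 1 + \log \sigma_i^2 - \mu_i^2 - \sigma_i^2 \right)
```

and the total objective is the weighted sum $\mathcal{L} = \mathcal{L}_R + w_{KL}\,\mathcal{L}_{KL}$. The reconstruction term $\mathcal{L}_R$ combines a mixture-density negative log-likelihood over the $(\Delta x, \Delta y)$ offsets with a cross-entropy over the three pen states. A condensed PyTorch sketch of that loss follows, assuming the $6M + 3$ output layout of the decoder sketch earlier; the parameter ordering is a convention of these sketches, not the project's actual code.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(params, target, M=20):
    """MDN NLL over (dx, dy) plus pen-state cross-entropy (sketch-RNN style).

    params: (B, T, 6*M + 3) decoder outputs, laid out as
            [pi, mu_x, mu_y, log_sigma_x, log_sigma_y, rho_raw, pen_logits].
    target: (B, T, 5) stroke-5 steps (dx, dy, p1, p2, p3).
    """
    pi, mx, my, lsx, lsy, rho = params[..., :6 * M].chunk(6, dim=-1)
    pen_logits = params[..., 6 * M:]
    log_pi = F.log_softmax(pi, dim=-1)
    sx, sy = lsx.exp(), lsy.exp()   # standard deviations must be positive
    rho = torch.tanh(rho)           # correlation constrained to (-1, 1)

    # Broadcast each target offset against all M mixture components.
    zx = (target[..., 0:1] - mx) / sx
    zy = (target[..., 1:2] - my) / sy
    z = zx ** 2 + zy ** 2 - 2 * rho * zx * zy
    one_minus_r2 = 1 - rho ** 2
    log_gauss = -z / (2 * one_minus_r2) - torch.log(
        2 * torch.pi * sx * sy * torch.sqrt(one_minus_r2))
    nll_xy = -torch.logsumexp(log_pi + log_gauss, dim=-1)   # (B, T)

    pen_ce = F.cross_entropy(pen_logits.reshape(-1, 3),
                             target[..., 2:].argmax(-1).reshape(-1),
                             reduction="none").view_as(nll_xy)
    return (nll_xy + pen_ce).mean()
```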
Backend (Conda):

```bash
# Recreate environment
conda env create -f environment.yml

# Activate environment
conda activate sketchbot-env

# Run backend
cd ~/Desktop/SktechBot/backend
python app.py

# Deactivate when done
conda deactivate
```

Backend (venv):

```bash
cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
python app.py
```

Frontend:

```bash
cd frontend
npm install
npm run dev
```

Access at: http://localhost:3000
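The backend's routes are not documented in this README, so purely as an illustration of how a Flask classification endpoint of this kind typically looks: the `/classify` route, payload shape, and `model_predict` stub below are all hypothetical, not the project's actual API.

```python
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
CLASSES = ["bat", "bicycle", "bus", "cactus", "clock",
           "door", "guitar", "lightbulb", "paintbrush", "smileyface"]

def model_predict(pixels):
    """Stub standing in for the trained CNN; returns uniform scores."""
    return np.full(len(CLASSES), 1.0 / len(CLASSES))

@app.route("/classify", methods=["POST"])   # hypothetical route
def classify():
    # Assumed payload: {"pixels": [[... 28 floats ...] x 28]} from the canvas.
    pixels = np.asarray(request.get_json()["pixels"], dtype=np.float32)
    probs = model_predict(pixels)
    top = int(np.argmax(probs))
    return jsonify({"class": CLASSES[top], "confidence": float(probs[top])})

if __name__ == "__main__":
    app.run(port=5000)
```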
- Extend to Transformer/attention-based generators for richer sketches.
- Add stroke-by-stroke playback animation in frontend.
- Confidence calibration for reliable rejection of uncertain predictions.
- Expand dataset to more categories.
- Deploy with Docker + cloud hosting for broader access.
- Professor Sarp Akcay, University College Dublin — for invaluable guidance and supervision throughout the course project.
- Google Quick, Draw! dataset — for providing large-scale sketch data.
- Research inspiration from SketchRNN (Ha & Eck, 2017).