This is probably one of the projects I have cared about the most. It fuses my convolutional network from the (currently shelved) diffusion-model experiments with the multilayer perceptron from my earlier neural-network work, creating a playground for building simple computer-vision ideas.
- Python 3.9 or higher
- Webcam
- (Optional) CUDA-capable GPU for acceleration
- Captures webcam frames and trains a model on the fly with user-labeled targets.
- Chains a NumPy/CuPy-style CNN front-end to a hand-built MLP classifier.
- Lets me prototype different spatial resolutions, kernel counts, and output spaces without regenerating scaffolding.
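The CNN-to-MLP chain can be sketched in plain NumPy. The shapes, kernel count, and layer sizes below are illustrative stand-ins, not the project's actual settings:

```python
import numpy as np

def conv2d_forward(x, kernels):
    """Valid 2D convolution of a single-channel image with a stack of kernels."""
    kh, kw = kernels.shape[1], kernels.shape[2]
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((kernels.shape[0], oh, ow))
    for k, kern in enumerate(kernels):
        for i in range(oh):
            for j in range(ow):
                out[k, i, j] = np.sum(x[i:i + kh, j:j + kw] * kern)
    return out

def mlp_forward(x, w1, b1, w2, b2):
    """Two-layer perceptron: ReLU hidden layer, linear output (logits)."""
    h = np.maximum(0, x @ w1 + b1)
    return h @ w2 + b2

rng = np.random.default_rng(0)
frame   = rng.random((8, 8))                # stand-in for a preprocessed webcam frame
kernels = rng.standard_normal((4, 3, 3))    # 4 learnable 3x3 kernels
feats = np.maximum(0, conv2d_forward(frame, kernels)).reshape(-1)  # ReLU + flatten
w1 = rng.standard_normal((feats.size, 16)); b1 = np.zeros(16)
w2 = rng.standard_normal((16, 3));          b2 = np.zeros(3)      # 3 output classes
logits = mlp_forward(feats, w1, b1, w2, b2)
print(logits.shape)  # (3,)
```

Swapping resolutions, kernel counts, or class counts only changes the array shapes, which is what makes the playground cheap to reconfigure.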
- Live training works: the Train screen streams the camera feed, you pick the “correct output,” hit start, and gradients flow every frame.
- Model creation works: tweak kernel sizes, depth, and class counts, then jump straight into training mode.
- Loading/saving is in progress: the UI hooks exist, but serialization still needs to be finished.
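The "gradients flow every frame" idea boils down to one SGD step per captured frame. A minimal sketch with a stand-in model (softmax regression here, in place of the real CNN/MLP stack); `train_step` and its shapes are hypothetical:

```python
import numpy as np

def train_step(w, b, frame, label, lr=0.01):
    """One SGD step of softmax regression on a single flattened frame."""
    x = frame.reshape(-1)
    logits = x @ w + b
    p = np.exp(logits - logits.max()); p /= p.sum()   # stable softmax
    target = np.zeros_like(p); target[label] = 1.0
    grad = p - target                                 # dL/dlogits for cross-entropy
    w -= lr * np.outer(x, grad)
    b -= lr * grad
    return -np.log(p[label])                          # per-frame cross-entropy loss

rng = np.random.default_rng(1)
w = np.zeros((64, 3)); b = np.zeros(3)
# Simulate 50 frames, all labeled with the user-selected class 1.
losses = [train_step(w, b, rng.random((8, 8)), label=1) for _ in range(50)]
print(losses[0], losses[-1])
```

In the real app the frames come from the OpenCV capture loop and the label is whatever class the user has selected in the Train screen.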
python ComputerVisionPlayground.py
- Choose Create New, dial in the conv/MLP settings, and submit.
- Click Submit, aim the webcam, select the class label, and toggle live training.
To use CuPy for GPU acceleration:
# For CUDA 12.x
pip install cupy-cuda12x
# Then edit ConvolutionalNeuralNetwork_numpy.py:
# Change: import numpy as cp
# To:     import cupy as cp
- PySide6: GUI framework
- OpenCV: Webcam capture and image processing
- NumPy: Numerical computing for neural networks
- CuPy (optional): GPU-accelerated computing
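Instead of hand-editing the import, a try/except alias lets the same code run on either backend. This is a sketch of that common pattern, not the project's actual module:

```python
# Optional GPU backend: fall back to NumPy when CuPy isn't installed.
try:
    import cupy as cp
    ON_GPU = True
except ImportError:
    import numpy as cp
    ON_GPU = False

# Everything downstream uses `cp`, so the array code is backend-agnostic.
a = cp.arange(6, dtype=cp.float32).reshape(2, 3)
print(ON_GPU, a.sum())
```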
- Change the training flow to sample videos based on the user's input and train X epochs on those videos. Move live-loop training to loading. (Done)
- Finish the save/load path so experiments aren’t strictly in-memory. (Done)
- Add better telemetry (loss plots, per-class confidence readouts) to understand what the live loop is learning.
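A minimal version of that telemetry could be a rolling loss average plus a softmax confidence readout. `LossMeter` and the class labels below are hypothetical:

```python
from collections import deque
import numpy as np

class LossMeter:
    """Rolling mean of the last `window` per-frame losses."""
    def __init__(self, window=100):
        self.buf = deque(maxlen=window)
    def update(self, loss):
        self.buf.append(float(loss))
        return sum(self.buf) / len(self.buf)

def class_confidences(logits, labels):
    """Softmax over logits -> {label: probability} readout."""
    p = np.exp(logits - np.max(logits)); p /= p.sum()
    return dict(zip(labels, p.round(3)))

meter = LossMeter(window=3)
for loss in (1.0, 0.8, 0.6, 0.4):
    avg = meter.update(loss)
print(avg)  # mean of the last 3 losses ≈ 0.6
conf = class_confidences(np.array([2.0, 0.5, 0.1]), ["cup", "hand", "none"])
print(conf)
```

Feeding `avg` into a plot each frame gives the loss curve; printing `conf` under the camera feed gives the per-class readout.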
- Explore lightweight data augmentation for more stable real-time training.
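A sketch of what lightweight augmentation could look like for normalized frames (random horizontal flip plus brightness jitter); the jitter range is illustrative:

```python
import numpy as np

def augment(frame, rng):
    """Cheap per-frame augmentation: random horizontal flip + brightness jitter."""
    if rng.random() < 0.5:
        frame = frame[:, ::-1]             # mirror left-right
    frame = frame * rng.uniform(0.8, 1.2)  # global brightness scale
    return np.clip(frame, 0.0, 1.0)        # keep values in the normalized range

rng = np.random.default_rng(2)
frame = rng.random((8, 8))
aug = augment(frame, rng)
print(aug.shape)
```

Applying this to each captured frame before the gradient step would give the live loop a slightly wider view of each class for nearly no extra cost.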