A lightweight, PyTorch-based implementation of Fast Neural Style Transfer, engineered to perform real-time artistic stylization on video streams and webcam feeds using resource-constrained hardware.
This project implements a feed-forward convolutional neural network (CNN) that applies artistic styles to images and video with low latency. Following Johnson et al. (2016), the expensive optimization happens once at training time; inference is a single forward pass, fast enough for video and webcam streams.
Key features:
- ⚡ Real-time inference (30+ FPS on capable hardware)
- 🏗 Feed-forward TransformerNet: encoder → residual-block bottleneck → decoder
- 🎨 Instance Normalization for better stylization and training convergence
- 💻 Hardware agnostic: CUDA (NVIDIA), MPS (Apple Silicon), CPU
This implementation is a proof-of-concept tailored for machines with limited VRAM and memory. Engineering trade-offs:
| Constraint | Strategy |
|---|---|
| VRAM limits | Batch size restricted to 2 to avoid OOM during training |
| Memory bandwidth | 256×256 training resolution to keep tensor footprint small |
| Storage / Data | Curated dataset subset instead of large COCO dataset |
| Compute time | 2 training epochs for quick validation (extend for quality) |
Result: The model generalizes style textures and preserves content structure under these constraints.
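In code, these constraints reduce to a handful of lines in the training setup. A hedged sketch (the real configuration lives in `train.py`; the transform pipeline here is an assumption):

```python
import torch
from torchvision import datasets, transforms

# 256x256 crops keep the per-batch tensor footprint small.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
])

# Curated subset under ./data (ImageFolder treats data/content/ as one class),
# rather than the full COCO dataset.
dataset = datasets.ImageFolder("./data", transform)

# batch_size=2 stays under the VRAM ceiling noted above.
loader = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True)
```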
- Generator (TransformerNet)
  - Downsampling via strided conv layers
  - Bottleneck: 5 residual blocks
  - Upsampling with `Upsample` + `Conv2d` to avoid checkerboard artifacts (sketched below)
  - Instance Normalization throughout
- Loss Network (VGG16)
  - Pretrained VGG16 (frozen) used for perceptual loss
  - Content loss computed at `relu2_2`
  - Style loss computed via Gram matrices at `relu1_2`, `relu2_2`, `relu3_3`, `relu4_3` (sketched below)
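Two of those details are worth seeing in code: the upsample-then-convolve block and the Gram matrix behind the style loss. The sketch below is consistent with the list above but is not the repo's exact module code (class and argument names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleConvLayer(nn.Module):
    """Nearest-neighbor upsample followed by a regular conv. Used instead of
    a transposed conv, which is the usual source of checkerboard artifacts."""
    def __init__(self, in_ch, out_ch, kernel_size, upsample=2):
        super().__init__()
        self.upsample = upsample
        self.pad = nn.ReflectionPad2d(kernel_size // 2)
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=self.upsample, mode="nearest")
        return self.conv(self.pad(x))

def gram_matrix(feat):
    """Channel-correlation matrix of a VGG feature map. Style loss is the MSE
    between Gram matrices of the stylized output and the style target."""
    b, c, h, w = feat.size()
    f = feat.view(b, c, h * w)
    return f.bmm(f.transpose(1, 2)) / (c * h * w)
```

The total objective is a weighted sum: content loss at `relu2_2` plus Gram-matrix style losses at the four layers listed above.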
- Clone the repository

  ```bash
  git clone https://github.com/yourusername/RealTimeTransfer.git
  cd RealTimeTransfer
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Prepare data
  - Content images: put JPG/PNG files in `data/content/`
  - Style images: put style references in `data/style/`
Training:

```bash
python train.py \
  --content-dir ./data/content \
  --style-image ./data/style/style.jpg \
  --batch-size 2 \
  --epochs 2 \
  --save-model-dir ./checkpoints
```

Note: the script auto-detects MPS on Apple Silicon.
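The auto-detection amounts to a three-way fallback. A sketch of the likely logic (the actual scripts may structure this differently):

```python
import torch

def pick_device() -> torch.device:
    if torch.cuda.is_available():            # NVIDIA GPUs
        return torch.device("cuda")
    if torch.backends.mps.is_available():    # Apple Silicon
        return torch.device("mps")
    return torch.device("cpu")               # universal fallback
```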
Webcam inference:

```bash
python inference.py --model checkpoints/model.pth --webcam --image-size 480
```
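Under the hood, `--webcam` boils down to a capture → stylize → display loop. A minimal OpenCV sketch, assuming the repo's `TransformerNet` lives in `transformer_net.py` and expects `[0, 255]`-range tensors (both assumptions):

```python
import cv2
import torch
from transformer_net import TransformerNet  # module path assumed

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TransformerNet().to(device).eval()
model.load_state_dict(torch.load("checkpoints/model.pth", map_location=device))

cap = cv2.VideoCapture(0)  # default webcam
with torch.no_grad():
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        x = torch.from_numpy(rgb).permute(2, 0, 1).float().unsqueeze(0)
        y = model(x.to(device)).squeeze(0).clamp(0, 255)
        out = y.permute(1, 2, 0).byte().cpu().numpy()
        cv2.imshow("stylized", cv2.cvtColor(out, cv2.COLOR_RGB2BGR))
        if cv2.waitKey(1) & 0xFF == ord("q"):  # quit on 'q'
            break
cap.release()
cv2.destroyAllWindows()
```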
Process a video:

```bash
python inference.py \
  --model checkpoints/model.pth \
  --input my_video.mp4 \
  --output stylized_video.mp4
```

Content images were sourced from Unsplash collections (via Kaggle); verify licenses before redistribution.
- Increase batch size for stable training
- Train on full COCO for robust generalization
- Add temporal consistency (optical flow) to reduce flicker
- Export to ONNX for mobile/web deployment (see the export sketch below)
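For the ONNX item, export is a few lines once a checkpoint loads. A sketch (opset version, input size, and the dynamic-axes choice are illustrative):

```python
import torch
from transformer_net import TransformerNet  # module path assumed

model = TransformerNet().eval()
model.load_state_dict(torch.load("checkpoints/model.pth", map_location="cpu"))

dummy = torch.randn(1, 3, 480, 480)  # tracing input; H and W stay dynamic below
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {2: "height", 3: "width"}},
    opset_version=17,
)
```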