A computer vision system that classifies 75,000+ food images across 101 categories using Convolutional Neural Networks and Vision Transformers.
- 77% top-1 accuracy on 101 food categories
- Processes 75,000+ training images
- Transfer learning with EfficientNetB0
- Systematic data augmentation (rotation, scaling, color jittering)
- Comprehensive evaluation pipeline with per-class metrics
- Frameworks: PyTorch, TensorFlow
- Architecture: EfficientNetB0 (convolutional neural network)
- Libraries: torchvision, scikit-learn, pandas, NumPy, matplotlib
- Techniques: Transfer learning, data augmentation, regularization (dropout, weight decay)
| Metric | Value |
|---|---|
| Top-1 Accuracy | 77% |
| Training Images | 75,000+ |
| Categories | 101 |
| Validation Images | 25,000 |
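
To make the top-1 accuracy figure in the table concrete, here is a minimal NumPy sketch that builds a confusion matrix from integer labels and derives overall accuracy plus per-class precision and recall from it. The function names and the toy labels in the usage note are hypothetical; scikit-learn's `confusion_matrix` and `classification_report` provide equivalent results and may well be what the evaluation pipeline relies on.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # Unbuffered add so repeated (true, pred) pairs are all counted.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def per_class_metrics(cm):
    """Return (precision, recall, top1_accuracy) from a confusion matrix."""
    true_pos = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # how often each class was predicted
    actual = cm.sum(axis=1)     # how often each class actually occurred
    # Classes that were never predicted (or never occur) get 0.0
    # instead of triggering a division by zero.
    precision = np.divide(true_pos, predicted,
                          out=np.zeros_like(true_pos), where=predicted > 0)
    recall = np.divide(true_pos, actual,
                       out=np.zeros_like(true_pos), where=actual > 0)
    top1 = true_pos.sum() / cm.sum()
    return precision, recall, top1
```

For example, with three classes, `y_true = [0, 0, 1, 1, 2, 2]`, and `y_pred = [0, 1, 1, 1, 2, 0]`, four of six predictions are correct, so top-1 accuracy is about 0.67, while the recall for class 1 is 1.0 and its precision is 2/3. Sorting the 101 classes by recall is one simple way to surface the categories the model confuses most.
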
- Reduced overfitting by 23% through regularization
- 15% improvement over baseline CNN
- Per-class precision/recall analysis with confusion matrix
- Challenge: High visual diversity across categories
- Solution: Systematic augmentation + transfer learning
- Challenge: Similar food types (e.g., pasta varieties)
- Solution: Focus on fine-grained feature extraction
- Transfer learning dramatically improves performance on limited data
- Data augmentation must preserve category-defining features
- Per-class analysis reveals model strengths/weaknesses better than overall accuracy
Romeo Nickel - LinkedIn - rjnickel@usc.edu