This is a MobileNetV2-based car recognition project (Stanford Cars-196 dataset classification).
There are two main criteria for this work:
- Accuracy
- Speed & light weight
Thus, a MobileNetV2-based classifier was chosen for its small number of parameters, small model size, and ability to run in near real time on mobile devices.
This repository consists of two Google Colaboratory Notebooks:
- Training, Finetuning, MobileNetV2 (Preprocessing & training) + Prediction (Evaluation)
- MobileNetV2 Grad-CAM (class activation visualization)
The Stanford Cars-196 dataset consists of 16,185 images of cars across 196 classes:
- Train folder: 8,144 images
- Test folder: 8,041 images
Despite the dataset's overall size, the number of images per class is relatively small (avg 41.25 images per class), and since many cars are visually very similar, differentiating makes and models is difficult.
In this work I've used the pre-trained MobileNetV2 from PyTorch (`torchvision.models`) with transfer learning.
All layers were fine-tuned, and the last classifier layer was replaced to change the output size to 196 classes.
Also, I've used:
- Cross-Entropy loss
- Adam optimizer (with L2-penalty)
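A minimal sketch of this loss/optimizer setup is below; the tiny stand-in model, the learning rate, and the `weight_decay` value are placeholders (the notebook's actual hyperparameters are not shown here). Note that Adam's `weight_decay` argument is the L2 penalty mentioned above:

```python
import torch
import torch.nn as nn

# Placeholder model so the sketch is self-contained; the real model is
# the fine-tuned MobileNetV2.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 196))

criterion = nn.CrossEntropyLoss()
# weight_decay adds an L2 penalty on the weights during the Adam update.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step: forward, loss, backward, update."""
    optimizer.zero_grad()
    logits = model(images)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```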
The model was trained for 10 epochs.
Final score: 0.8486 accuracy.
Finally, Grad-CAM (Gradient-weighted Class Activation Mapping) was used to explore MobileNetV2's class activations as heatmap overlays.
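The core of Grad-CAM can be sketched as follows: capture the target layer's activations and gradients with hooks, weight each channel by its spatially averaged gradient, and ReLU the weighted sum. This is a generic sketch, not the notebook's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def grad_cam(model: nn.Module, target_layer: nn.Module,
             image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Minimal Grad-CAM: returns a (1, H, W) heatmap normalized to [0, 1]."""
    activations, gradients = {}, {}

    def fwd_hook(_m, _inp, out):
        activations["value"] = out

    def bwd_hook(_m, _gin, gout):
        gradients["value"] = gout[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        model.zero_grad()
        score = model(image)[0, class_idx]  # class score for one image
        score.backward()
    finally:
        h1.remove()
        h2.remove()

    acts = activations["value"]                      # (1, C, H, W)
    grads = gradients["value"]                       # (1, C, H, W)
    weights = grads.mean(dim=(2, 3), keepdim=True)   # per-channel importance
    cam = F.relu((weights * acts).sum(dim=1))        # (1, H, W)
    cam = cam / (cam.max() + 1e-8)                   # normalize for overlay
    return cam.detach()
```

For the MobileNetV2 used here, `model.features[-1]` would be a natural choice of target layer; the resulting map is upsampled to the input size and blended with the image to produce the overlay visualization.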