This project implements a hybrid image classification system for Toyota car models, combining deep feature extraction with machine learning classification. It was completed as a homework assignment for the Neural Networks and Deep Learning course.
The aim of this project is to classify 10 different Toyota car models using a combination of:
- VGG16: a pre-trained CNN used as a deep feature extractor.
- SVM: a support vector machine applied as the classifier on the extracted features.
Additionally, comparative experiments are conducted with AlexNet, VGG16 alone, and a custom CNN. The results of all models are analyzed with standard performance metrics and confusion matrices.
This hybrid approach is designed to improve accuracy while reducing computational cost compared to training full deep models from scratch.
- Downloaded Toyota car dataset.
- Selected 10 classes (car models) for classification.
- Converted categorical labels into numeric values.
- Resized all images to 224×224, VGG16's standard input size, as recommended in the reference article.
- Normalized pixel values to [0, 1].
- Handled class imbalance by applying a chosen balancing method (with justification).
- Applied data augmentation to improve model generalization.
- Split dataset into 80% training and 20% testing, reporting sizes of each set.
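The normalization and 80/20 split described above can be sketched as follows. This is a minimal illustration using random arrays as stand-ins for the Toyota car images (which would already be resized to 224×224, e.g. with PIL or `tf.image.resize`), not the project's actual loading code.

```python
# Minimal sketch of the preprocessing steps: normalize to [0, 1] and
# make a stratified 80/20 train/test split, reporting set sizes.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for 100 RGB images already resized to 224x224 (uint8 pixels).
images = rng.integers(0, 256, size=(100, 224, 224, 3), dtype=np.uint8)
labels = np.repeat(np.arange(10), 10)  # 10 car-model classes, numeric labels

# Normalize pixel values to [0, 1].
images = images.astype(np.float32) / 255.0

# 80/20 split, stratified so each class keeps the same proportion.
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, random_state=42, stratify=labels
)
print(f"train: {len(X_train)} images, test: {len(X_test)} images")
```

Stratifying the split keeps all 10 classes represented in both sets, which matters for fair per-class evaluation later.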
- Loaded pre-trained VGG16 (without fully connected layers).
- Extracted features from the last convolutional layer and flattened them into vectors.
- Repeated the same process with AlexNet.
- Stored extracted features for use in classification.
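The feature-extraction step above can be sketched with Keras. Note the hedge in the code: `weights=None` is used here only so the sketch runs offline; the actual project loads `weights="imagenet"` to get the pre-trained filters.

```python
# Sketch of VGG16 feature extraction (assumes tensorflow is installed).
import numpy as np
from tensorflow.keras.applications import VGG16

# VGG16 without the fully connected layers (include_top=False).
# The project uses weights="imagenet"; weights=None keeps this sketch offline.
extractor = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))

# Stand-in batch of 4 preprocessed images.
batch = np.random.rand(4, 224, 224, 3).astype(np.float32)

# Output of the last convolutional block has shape (4, 7, 7, 512).
feature_maps = extractor.predict(batch, verbose=0)

# Flatten each map into a 25088-dimensional vector for the classifier.
features = feature_maps.reshape(len(batch), -1)
print(features.shape)
```

The same pattern applies to AlexNet: load the convolutional stack without its dense head, run the images through it, and flatten the resulting maps.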
Implemented and evaluated four models:
- VGG16 (fine-tuned).
- VGG16 + SVM (hybrid model).
- AlexNet (feature extraction + classifier).
- Custom CNN (as described in the reference article).
For each model:
- Trained the classifier on the extracted features (or trained the CNN end-to-end directly).
- Evaluated on test set with metrics: Accuracy, Precision, Recall, F1-score, ROC-AUC.
- Plotted confusion matrix and explained how to interpret it.
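The hybrid training and evaluation steps can be sketched as below. The feature vectors are random placeholders standing in for flattened VGG16 outputs, not real extracted features.

```python
# Sketch: SVM trained on (stand-in) deep features, evaluated with
# accuracy, precision, recall, F1, ROC-AUC, and a confusion matrix.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))    # placeholder feature vectors
y = np.repeat(np.arange(10), 20)  # 10 classes, 20 samples each

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = SVC(kernel="rbf", probability=True).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

print("accuracy :", accuracy_score(y_te, y_pred))
print("precision:", precision_score(y_te, y_pred, average="macro", zero_division=0))
print("recall   :", recall_score(y_te, y_pred, average="macro", zero_division=0))
print("f1       :", f1_score(y_te, y_pred, average="macro", zero_division=0))
print("roc-auc  :", roc_auc_score(y_te, clf.predict_proba(X_te), multi_class="ovr"))

# Rows = true class, columns = predicted class; off-diagonal cells
# show which pairs of car models get confused with each other.
cm = confusion_matrix(y_te, y_pred)
print(cm.shape)
```

Macro averaging treats every class equally regardless of size, which is the fairer choice when the dataset was rebalanced; the confusion matrix is read row by row, with large off-diagonal entries flagging visually similar models.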
- Collected results of all four models into a comparison table.
- Plotted bar/line charts for easier visualization (similar to the paper’s figures).
- Identified which classes were classified best/worst for each model.
- Discussed trade-offs between models in terms of accuracy, complexity, and generalization.
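One way to assemble the comparison table is sketched below. The metric values are zero placeholders to show the layout only, NOT the project's measured results.

```python
# Sketch of the model-comparison table. Metric values are placeholders
# (0.0) to be filled in with each model's measured test scores.
import pandas as pd

results = pd.DataFrame(
    {
        "model": ["VGG16 (fine-tuned)", "VGG16 + SVM",
                  "AlexNet + classifier", "Custom CNN"],
        "accuracy": [0.0, 0.0, 0.0, 0.0],  # fill in measured values
        "f1_macro": [0.0, 0.0, 0.0, 0.0],  # fill in measured values
    }
).set_index("model")

# Sort so the best model tops the table; a bar chart can be drawn from
# the same frame with results["accuracy"].plot.bar().
print(results.sort_values("accuracy", ascending=False))
```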
- Highlighted the main differences between VGGNet and AlexNet (architecture and depth).
- Analyzed confusion matrices to identify frequent misclassifications.
- Discussed why VGG16 + SVM outperformed the other approaches.
- Suggested methods for further improvement:
- Using larger or more balanced datasets.
- Advanced data augmentation techniques.
- Transfer learning with more modern architectures (ResNet, EfficientNet, etc.).
- Ensemble methods combining multiple classifiers.
- VGG16 + SVM achieved the best classification performance across most classes.
- AlexNet was faster but less accurate.
- Custom CNN offered reasonable performance but required careful tuning.
- Confusion matrices showed some classes were consistently harder to classify, often due to visual similarity.
- Combining deep features with classical ML classifiers (like SVM) can achieve strong results with limited data.
- Handling imbalanced datasets is crucial for fair evaluation.
- Confusion matrices are an excellent tool for analyzing per-class performance.
- Hybrid models can be more efficient than end-to-end CNNs in certain real-world scenarios, such as limited-data problems.