Caltech-256 is a challenging set of 257 object categories (including a final "clutter" category) containing a total of only 30,607 images. Furthermore, the dataset is imbalanced, as seen in the plot below. In this exercise I used different neural network architectures and compared their performance. The project took exactly one month because of the scale of the problem and the time spent training and tweaking multiple CNN models, most of which took overnight to train.
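
To get a quick look at that imbalance, the sketch below counts images per category. It assumes the standard Caltech-256 layout of one directory per class (e.g. `001.ak47/` through `257.clutter/`) and a local path of `256_ObjectCategories`; it is illustrative, not the code used to generate the plot.

```python
import os
from collections import Counter

DATA_DIR = "256_ObjectCategories"  # assumed local path to the extracted dataset

# Count how many images each category directory contains.
counts = Counter()
for category in sorted(os.listdir(DATA_DIR)):
    path = os.path.join(DATA_DIR, category)
    if os.path.isdir(path):
        counts[category] = len(os.listdir(path))

print("total images:  ", sum(counts.values()))
print("smallest class:", min(counts, key=counts.get), min(counts.values()))
print("largest class: ", max(counts, key=counts.get), max(counts.values()))
```
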
| Model | Accuracy (test set) |
|---|---|
| Fully Connected | 14% |
| VGG | 44% |
| Inception | 45% |
| Resnet | 48% |
| VGG (Transfer-Learning) | 57% |
| Inception (Transfer-Learning) | 63% |
| Inception Resnet (Transfer-Learning) | 71% |
| Xception (Transfer-Learning) | 66% |
As the table above shows, switching from a fully connected network to a convolutional neural network gives a big leap in accuracy, and transfer learning performs significantly better than training a CNN from scratch. Transfer learning is also the fastest to train, since only a fraction of the network is trained, and it requires the least data.
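
Below is a minimal transfer-learning sketch in Keras using the ImageNet-pretrained InceptionResNetV2 base (the best-performing variant in the table). The head architecture, input size, and optimizer settings here are assumptions for illustration, not the exact configuration used in this project.

```python
import tensorflow as tf
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras import layers, models

NUM_CLASSES = 257  # 256 object categories + clutter

# Load the ImageNet-pretrained convolutional base and freeze it,
# so only the newly added classification head is trained.
base = InceptionResNetV2(weights="imagenet", include_top=False,
                         input_shape=(299, 299, 3), pooling="avg")
base.trainable = False

# Hypothetical classification head; tune size/dropout for your setup.
model = models.Sequential([
    base,
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Because the frozen base only runs forward passes and gradients flow through the small head, each epoch is much cheaper than training the full network from scratch, which is why the transfer-learning runs finished fastest.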

