The main purpose of this project is to mimic Google's Image Search Engine: working with a reasonably large dataset, we try to achieve the best results we can.
First, download the dataset from Darknet: Here
Second, place it in the root of your project so that it follows this structure (a loading sketch is shown after the tree):
. project
+-- your_notebook.ipynb
+-- dataset
| +-- train ==> this contains your 50,000 training images
| +-- test ==> this contains your 10,000 test images
| +-- labels.txt ==> this contains your classes
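
Below is a minimal sketch of how this folder could be read into NumPy arrays. It assumes each image is a PNG whose filename ends with its class name (e.g. `123_cat.png`); this naming convention is an assumption, so adjust the parsing if your copy of the dataset names files differently.

```python
# Minimal loading sketch. Assumes the filename convention "<index>_<class>.png",
# which may differ from your copy of the dataset.
import os
import numpy as np
from PIL import Image

def load_split(split_dir, class_names):
    images, labels = [], []
    for fname in sorted(os.listdir(split_dir)):
        if not fname.endswith(".png"):
            continue
        # Hypothetical filename convention: "<index>_<class>.png"
        cls = fname.rsplit(".", 1)[0].split("_")[-1]
        images.append(np.asarray(Image.open(os.path.join(split_dir, fname))))
        labels.append(class_names.index(cls))
    return np.stack(images), np.array(labels)

# labels.txt is expected to hold one class name per line
with open("dataset/labels.txt") as f:
    class_names = [line.strip() for line in f if line.strip()]

x_train, y_train = load_split("dataset/train", class_names)
x_test, y_test = load_split("dataset/test", class_names)
```
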
The entire model consists of 14 layers in total. The list below shows the layers and the techniques applied to build the model; a Keras sketch of the same architecture follows the list.
- Convolution with 64 different filters of size (3x3)
- Max Pooling by 2
- ReLU activation function
- Batch Normalization
- Convolution with 128 different filters of size (3x3)
- Max Pooling by 2
- ReLU activation function
- Batch Normalization
- Convolution with 256 different filters of size (3x3)
- Max Pooling by 2
- ReLU activation function
- Batch Normalization
- Convolution with 512 different filters of size (3x3)
- Max Pooling by 2
- ReLU activation function
- Batch Normalization
- Flattening the 3-D output of the last convolutional operations.
- Fully Connected Layer with 128 units
- Dropout
- Batch Normalization
- Fully Connected Layer with 256 units
- Dropout
- Batch Normalization
- Fully Connected Layer with 512 units
- Dropout
- Batch Normalization
- Fully Connected Layer with 1024 units
- Dropout
- Batch Normalization
- Fully Connected Layer with 10 units (number of image classes)
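
The following is one way the architecture above could be expressed in Keras. It is a sketch, not the exact training code: the padding mode, the ReLU activations on the dense layers, the dropout rate, the softmax output, and the optimizer settings are assumptions not specified in the list.

```python
# Sketch of the 4-conv-block / 4-dense-block architecture described above.
# Padding, dense activations, dropout rate, softmax output and optimizer
# are assumptions; swap them for your own choices.
from tensorflow.keras import layers, models

def build_model(input_shape=(32, 32, 3), num_classes=10, dropout_rate=0.3):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))

    # Four convolutional blocks: Conv -> MaxPool (by 2) -> ReLU -> BatchNorm
    for filters in (64, 128, 256, 512):
        model.add(layers.Conv2D(filters, (3, 3), padding="same"))
        model.add(layers.MaxPooling2D(pool_size=2))
        model.add(layers.Activation("relu"))
        model.add(layers.BatchNormalization())

    # Flatten the 3-D output of the last convolutional block
    model.add(layers.Flatten())

    # Four fully connected blocks: Dense -> Dropout -> BatchNorm
    for units in (128, 256, 512, 1024):
        model.add(layers.Dense(units, activation="relu"))
        model.add(layers.Dropout(dropout_rate))
        model.add(layers.BatchNormalization())

    # Output layer: one unit per image class
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model

model = build_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Calling `model.summary()` after building is an easy way to check that the layer stack matches the description above before starting training.
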