The Python modules `composite` and `layer` implement various convolutional layers using pure numpy. However, the efficiency of performing convolution through iteration in Python is extremely poor; they would probably be more useful had I implemented them in C or with gonum in Golang. Regardless, they serve as a good reference for the back-propagation algorithms in various layers.
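To illustrate where the cost comes from, here is a minimal sketch of a loop-based convolution forward pass in numpy. It is not the exact code in `layer` or `composite`, just the general shape of the iteration:

```python
import numpy as np

def conv_forward_naive(x, w, b, stride=1, pad=0):
    """Naive convolution: x is (N, H, W, C), w is (HH, WW, C, F), b is (F,)."""
    N, H, W, C = x.shape
    HH, WW, _, F = w.shape
    # Zero-pad the spatial dimensions only.
    x_pad = np.pad(x, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode='constant')
    H_out = (H + 2 * pad - HH) // stride + 1
    W_out = (W + 2 * pad - WW) // stride + 1
    out = np.zeros((N, H_out, W_out, F))
    # Four nested Python loops: this is the source of the poor efficiency.
    for n in range(N):
        for i in range(H_out):
            for j in range(W_out):
                window = x_pad[n, i*stride:i*stride+HH, j*stride:j*stride+WW, :]
                for f in range(F):
                    out[n, i, j, f] = np.sum(window * w[:, :, :, f]) + b[f]
    return out
```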
The dataset folder has been git-ignored, but there is a bash script for obtaining the CIFAR-10 data. Two datasets are used in this project: CIFAR-10 for general practice, and Kaggle's dog-vs-cat dataset for prototyping my pet recognition project.
The Jupyter notebooks are primarily for notes and mathematical derivations.
These are the actual models, implemented using TensorFlow.
`VALID` means no padding; it only drops the right-most columns (or bottom-most rows) that do not fit:

    inputs:  1  2  3  4  5  6  7  8  9  10 11 (12 13)
            |________________|                dropped
                           |_________________|

`SAME` tries to pad evenly on the left and right, but if the number of columns to be added is odd, the extra column goes on the right:

               pad|                                      |pad
        inputs: 0 |1  2  3  4  5  6  7  8  9  10 11 12 13|0  0
               |________________|
                              |_________________|
                                             |________________|
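
The two diagrams appear to correspond to 13 inputs, a filter of width 6, and a stride of 5. The resulting output widths and padding can be checked with the usual shape formulas; this is a small plain-Python sketch, separate from TensorFlow's own implementation:

```python
import math

def valid_output_size(in_size, filter_size, stride):
    # VALID: no padding; positions that do not fit are dropped.
    return (in_size - filter_size) // stride + 1

def same_output_size(in_size, stride):
    # SAME: output size depends only on input size and stride.
    return math.ceil(in_size / stride)

def same_padding(in_size, filter_size, stride):
    # Total padding needed, with the extra column going to the right.
    out_size = same_output_size(in_size, stride)
    total = max((out_size - 1) * stride + filter_size - in_size, 0)
    return total // 2, total - total // 2  # (pad_left, pad_right)

print(valid_output_size(13, 6, 5))  # 2 windows, inputs 12-13 dropped
print(same_output_size(13, 5))      # 3 windows
print(same_padding(13, 6, 5))       # (1, 2): one zero on the left, two on the right
```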
AlexNet was implemented using the following architecture:

- Input: `(N, 227, 227, 3)`
- Convolution: `(N, 55, 55, 96)` using 96 11x11 filters with `stride=4` and `padding=0`
- Max Pooling: `(N, 27, 27, 96)` using a 3x3 filter with `stride=2`
- Normalization: `(N, 27, 27, 96)` using batch normalization, normalized across channels
- Convolution: `(N, 27, 27, 256)` using 256 5x5 filters with `stride=1` and `padding=2`
- Max Pooling: `(N, 13, 13, 256)` using a 3x3 filter with `stride=2`
- Normalization: `(N, 13, 13, 256)` using batch normalization, normalized across channels
- Convolution: `(N, 13, 13, 384)` using 384 3x3 filters with `stride=1` and `padding=1`
- Convolution: `(N, 13, 13, 384)` using 384 3x3 filters with `stride=1` and `padding=1`
- Convolution: `(N, 13, 13, 256)` using 256 3x3 filters with `stride=1` and `padding=1`
- Max Pooling: `(N, 6, 6, 256)` using a 3x3 filter with `stride=2`
- Fully Connected: `(N, 4096)`
- Fully Connected: `(N, 4096)`
- Output: `(N, 1000)`, which are the class scores
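For reference, the layer stack above maps onto a Keras `Sequential` model roughly as sketched below. The ReLU activations are an assumption (the list does not name them), `BatchNormalization()` stands in for the channel-wise normalization steps, and `padding='same'` reproduces the explicit `padding=2`/`padding=1` values at stride 1:

```python
import tensorflow as tf

def alexnet(num_classes=1000):
    """Rough sketch of the layer list above; activations assumed to be ReLU."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(227, 227, 3)),
        tf.keras.layers.Conv2D(96, 11, strides=4, padding='valid', activation='relu'),  # (N, 55, 55, 96)
        tf.keras.layers.MaxPool2D(pool_size=3, strides=2),                              # (N, 27, 27, 96)
        tf.keras.layers.BatchNormalization(),                                           # (N, 27, 27, 96)
        tf.keras.layers.Conv2D(256, 5, strides=1, padding='same', activation='relu'),   # (N, 27, 27, 256)
        tf.keras.layers.MaxPool2D(pool_size=3, strides=2),                              # (N, 13, 13, 256)
        tf.keras.layers.BatchNormalization(),                                           # (N, 13, 13, 256)
        tf.keras.layers.Conv2D(384, 3, strides=1, padding='same', activation='relu'),   # (N, 13, 13, 384)
        tf.keras.layers.Conv2D(384, 3, strides=1, padding='same', activation='relu'),   # (N, 13, 13, 384)
        tf.keras.layers.Conv2D(256, 3, strides=1, padding='same', activation='relu'),   # (N, 13, 13, 256)
        tf.keras.layers.MaxPool2D(pool_size=3, strides=2),                              # (N, 6, 6, 256)
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(4096, activation='relu'),                                 # (N, 4096)
        tf.keras.layers.Dense(4096, activation='relu'),                                 # (N, 4096)
        tf.keras.layers.Dense(num_classes),                                             # (N, 1000) class scores
    ])
```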
Training used heavy data augmentation and a dropout rate of 0.5, which are not shown in the list above. The batch size was 128, using stochastic gradient descent with momentum 0.9. The learning rate was 1e-2, reduced manually by a factor of 10 whenever the validation accuracy plateaued. L2 weight decay was 5e-4.
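That training setup can be sketched roughly as follows; the loss, the plateau callback, and the regularizer placement are my assumptions, since the actual training script is not reproduced here:

```python
import tensorflow as tf

model = alexnet()  # the hypothetical sketch above

# Stochastic gradient descent with momentum 0.9 and initial learning rate 1e-2.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9)
model.compile(optimizer=optimizer,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# The learning rate was dropped by a factor of 10 when validation accuracy
# plateaued; ReduceLROnPlateau is an automated stand-in for that manual step.
plateau = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_accuracy',
                                               factor=0.1, patience=5)

# L2 weight decay of 5e-4 would be attached to each Conv2D/Dense layer via
# kernel_regularizer=tf.keras.regularizers.l2(5e-4), and dropout of 0.5 would
# follow the two fully connected layers; both are omitted from the sketch above.
# model.fit(x_train, y_train, batch_size=128, validation_split=0.1, callbacks=[plateau])
```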