Python implementation of a deep neural network from scratch with a mathematical approach.
- script containing the neural network class implementation
- script for reading the data and structuring it
- dataset
- demo
- Initializing parameters
- Forward propagation
- Cost function
- Backward propagation
- Training (gradient descent)
- notes
- Weights are initialized with random values according to 'He initialization'.
- Biases are initialized as zeros.
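A minimal sketch of this initialization step (assuming a `layer_dims` list of layer sizes such as `[nx, 20, 7, 1]`; the names here are illustrative and not necessarily the ones used in the class):

```python
import numpy as np

def initialize_parameters(layer_dims):
    """He initialization for weights, zeros for biases (illustrative sketch)."""
    params = {}
    L = len(layer_dims) - 1  # number of layers with parameters
    for l in range(1, L + 1):
        # He initialization: scale random weights by sqrt(2 / fan_in)
        params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(2.0 / layer_dims[l - 1])
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params
```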
Forward propagation is mainly broken into two steps:
- linear forward (weighted sum of inputs): calculating z = w.x + b
- activation: plugging z into the activation function (sigmoid, ReLU, ...), i.e. A = g(z)
- For an L-layer model we commonly use the ReLU activation function for the hidden-layer neurons and the sigmoid activation function for the output layer in the case of binary classification, as it maps values to probabilities between 0 and 1; therefore, probability > 0.5 maps to class 1, and probability < 0.5 maps to class 0.
- In the case of multi-class classification we use a softmax activation function in the output layer, which isn't implemented here; I will implement it later.
Here is a vectorized implementation of forward propagation:
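A minimal vectorized sketch of these two steps (the exact class implementation may differ; the parameter names W1, b1, ..., WL, bL and the cache layout are illustrative assumptions):

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def forward_propagation(X, params):
    """X has shape (nx, m); returns the output activations AL and the caches needed for backprop."""
    caches = []
    A = X
    L = len(params) // 2  # number of layers with parameters
    # hidden layers: linear forward + ReLU
    for l in range(1, L):
        A_prev = A
        Z = params["W" + str(l)] @ A_prev + params["b" + str(l)]
        A = relu(Z)
        caches.append((A_prev, Z))
    # output layer: linear forward + sigmoid
    ZL = params["W" + str(L)] @ A + params["b" + str(L)]
    AL = sigmoid(ZL)
    caches.append((A, ZL))
    return AL, caches
```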
Since we are doing binary classification, we use the logistic (binary cross-entropy) cost function.
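For reference, the cost is J = -(1/m) * sum(y*log(a) + (1-y)*log(1-a)). A minimal sketch of computing it (AL is the output of the forward pass and Y the labels, both of shape (1, m); names are illustrative):

```python
import numpy as np

def compute_cost(AL, Y):
    """Logistic (binary cross-entropy) cost, averaged over the m examples."""
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    return float(np.squeeze(cost))
```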
Back propagation is the step that lets us calculate the gradients needed for gradient descent (training the neural network).
In back propagation we follow the reversed path of the neural network, calculating the gradients of the weights and biases so that we can update them
during gradient descent (training).
This is a very useful article explaining the math behind back propagation; essentially, we use the chain rule from calculus to calculate the derivative of the loss w.r.t. the weights and biases.
The reason we use the chain rule may not be very obvious to some people, so we can break it down this way:
- let the cost function be: L(g)
- let the activation function be: g = g(z)
- let the weighted sum be: z = z(w,b)
So the loss is L(g(z(w,b)))
Now, how do we get the gradient of this function w.r.t. w and b?
We can do this using the chain rule from calculus:
we simply break the equation down into partial derivatives of the loss w.r.t. w and b.
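In symbols, using the L, g, z notation defined above:

$$
\frac{\partial L}{\partial w} = \frac{\partial L}{\partial g}\cdot\frac{\partial g}{\partial z}\cdot\frac{\partial z}{\partial w},
\qquad
\frac{\partial L}{\partial b} = \frac{\partial L}{\partial g}\cdot\frac{\partial g}{\partial z}\cdot\frac{\partial z}{\partial b}
$$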
- kick-start back propagation by calculating the derivative of the loss w.r.t. the last layer's activation.
- since the last layer (the output layer) is unique (it has a different activation from the other layers), calculate the derivative of the loss w.r.t. its weights and biases separately.
Here we use the sigmoid activation function, so we use its derivative.
- loop over all of the remaining layers, calculating gradients and storing them.
Here we use the ReLU activation function, so we use its derivative.
Here is the vectorized implementation of back propagation:
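A minimal sketch of this backward pass, matching the illustrative forward-propagation sketch above (the gradient keys dW1, db1, ... are assumptions, not necessarily the class's exact naming):

```python
import numpy as np

def backward_propagation(AL, Y, params, caches):
    """Compute gradients of the cost w.r.t. every W and b (illustrative sketch)."""
    grads = {}
    m = Y.shape[1]
    L = len(caches)
    # output layer: the logistic loss through the sigmoid simplifies to dZ = AL - Y
    A_prev, ZL = caches[L - 1]
    dZ = AL - Y
    grads["dW" + str(L)] = dZ @ A_prev.T / m
    grads["db" + str(L)] = np.sum(dZ, axis=1, keepdims=True) / m
    # hidden layers: propagate backwards through the ReLU activations
    for l in range(L - 1, 0, -1):
        A_prev, Z = caches[l - 1]
        dA = params["W" + str(l + 1)].T @ dZ
        dZ = dA * (Z > 0)  # derivative of ReLU
        grads["dW" + str(l)] = dZ @ A_prev.T / m
        grads["db" + str(l)] = np.sum(dZ, axis=1, keepdims=True) / m
    return grads
```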
Here we use the simplest optimization algorithm, which is batch gradient descent.
- repeat using all data points:
1. calculate the gradient of the loss w.r.t. w and b
2. update w and b
- implement forward propagation.
- calculate the cost (for debugging; not strictly needed).
- implement back propagation.
- update the parameters using the gradients.
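A minimal sketch of the resulting training loop, using the illustrative helpers from the sketches above (the actual class API may differ):

```python
def train(X, Y, layer_dims, learning_rate=0.01, num_iterations=2500):
    """Batch gradient descent: forward pass, cost, backward pass, parameter update."""
    params = initialize_parameters(layer_dims)
    for i in range(num_iterations):
        AL, caches = forward_propagation(X, params)
        cost = compute_cost(AL, Y)  # for debugging / monitoring only
        grads = backward_propagation(AL, Y, params, caches)
        # gradient descent update for every layer
        for l in range(1, len(layer_dims)):
            params["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
            params["b" + str(l)] -= learning_rate * grads["db" + str(l)]
        if i % 100 == 0:
            print(f"cost after iteration {i}: {cost:.4f}")
    return params
```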
The model assumes the input features are of shape (nx, m),
where each column is a training example containing nx features (nx features, m samples).
- So, if you want to try the model on your own dataset, you must provide training data with the same structure
to avoid running into errors (see the small data-preparation sketch at the end of these notes).
- Also, it is better to normalize the training / testing features, as neural networks tend to work best with
normalized features; this helps avoid exploding/vanishing weights or gradients.
- Anyway, the class is meant for learning purposes, as it doesn't implement any regularization techniques like dropout
regularization.
- So, it is generally better to use the data provided here.
- I'll soon add implementations for regularization and Adam optimization with mini-batch / stochastic GD, and will update the readme accordingly.
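For illustration, a small sketch of getting your own data into the expected (nx, m) shape and normalizing it (the variable names and placeholder data here are purely illustrative):

```python
import numpy as np

# suppose your data starts as (m, nx): one row per example
X_rows = np.random.rand(500, 12)               # illustrative placeholder features
Y_rows = np.random.randint(0, 2, size=(500,))  # illustrative binary labels

X = X_rows.T               # -> shape (nx, m): one column per training example
Y = Y_rows.reshape(1, -1)  # -> shape (1, m)

# standardize features (zero mean, unit variance) to help avoid exploding/vanishing weights or gradients
X = (X - X.mean(axis=1, keepdims=True)) / (X.std(axis=1, keepdims=True) + 1e-8)
```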