
Deep Neural Network

Python implementation of a deep neural network from scratch with a mathematical approach.

Table of contents:

  1. Initializing parameters
  2. Forward propagation
  3. Cost function
  4. Backward propagation
  5. Training (gradient descent)
  6. Notes

1. Initialize parameters

  1. Weights are initialized with random values according to 'He initialization'.
  2. Biases are initialized as zeros.
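A rough sketch of how this might look in NumPy (the `layer_dims` list and the parameter dictionary keys are my own naming for illustration, not necessarily the repository's exact code):

```python
import numpy as np

def initialize_parameters(layer_dims):
    """He initialization: weights scaled by sqrt(2 / fan_in), biases set to zero."""
    parameters = {}
    for l in range(1, len(layer_dims)):
        # Scaling by sqrt(2 / fan_in) keeps the activation variance roughly constant across ReLU layers
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(2.0 / layer_dims[l - 1])
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters
```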

2. Forward propagation

Forward propagation is mainly broken into two steps:

  1. Linear forward (weighted sum of inputs):
     calculating z = w.x + b

  2. Activation:
     plugging z into an activation function such as sigmoid or ReLU:
     A = g(z)

  • For an L-layer model we commonly use the ReLU activation function for the hidden layers' neurons
    and the sigmoid activation function for the output layer in the case of binary classification,
    as it maps values to probabilities between 0 and 1;
    therefore a probability > 0.5 is classified as 1, and a probability < 0.5 as 0
    (a small sketch of these activations follows this list).

  • In the case of multi-class classification we use a softmax activation function in the output layer,
    which isn't implemented here yet; I will implement it later.
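A small sketch of the activation functions mentioned above, plus the 0.5 threshold used to turn output probabilities into class labels (function names are mine for illustration):

```python
import numpy as np

def relu(Z):
    # ReLU, used in the hidden layers
    return np.maximum(0, Z)

def sigmoid(Z):
    # Sigmoid, used in the output layer for binary classification
    return 1.0 / (1.0 + np.exp(-Z))

def predict_label(AL):
    # Probability > 0.5 -> class 1, otherwise class 0
    return (AL > 0.5).astype(int)
```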

Here is a vectorized implementation of forward propagation:
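A minimal sketch of what this might look like, reusing the helpers sketched above (the cache layout and function names are assumptions, not necessarily the repository's exact code):

```python
def linear_forward(A_prev, W, b):
    # Weighted sum: Z = W . A_prev + b (b is broadcast across the m examples)
    return W @ A_prev + b

def forward_propagation(X, parameters):
    """Run [LINEAR -> RELU] * (L-1) -> LINEAR -> SIGMOID, caching values for backprop."""
    caches = []
    A = X
    L = len(parameters) // 2              # number of layers (each layer has a W and a b)
    for l in range(1, L):
        A_prev = A
        Z = linear_forward(A_prev, parameters["W" + str(l)], parameters["b" + str(l)])
        A = relu(Z)
        caches.append((A_prev, Z))        # cached for the backward pass
    # Output layer uses sigmoid for binary classification
    ZL = linear_forward(A, parameters["W" + str(L)], parameters["b" + str(L)])
    AL = sigmoid(ZL)
    caches.append((A, ZL))
    return AL, caches
```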

3. Cost function

Since we are doing binary classification, we use the logistic (binary cross-entropy) cost function.
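A sketch of that cost, assuming `AL` holds the output-layer activations and `Y` the labels, both shaped (1, m):

```python
import numpy as np

def compute_cost(AL, Y):
    """Logistic (binary cross-entropy) cost, averaged over the m examples."""
    m = Y.shape[1]
    eps = 1e-8  # guards against log(0)
    cost = -np.sum(Y * np.log(AL + eps) + (1 - Y) * np.log(1 - AL + eps)) / m
    return float(np.squeeze(cost))
```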

4. Backward propagation

Back propagation is the step that computes the gradients needed for gradient descent (training the neural network).
In back propagation we follow the network in reverse, calculating the gradients of the loss with respect to the weights and biases so they can be updated during gradient descent (training).

This is a very useful article explaining the math behind back propagation; essentially, we use the chain rule from calculus to calculate the derivative of the loss w.r.t. the weights and biases.

The reason we use the chain rule may not be obvious to some people, so we can break it down this way:

  • let the cost function be: L(g)
  • let the activation function be: g = g(z)
  • let the weighted sum be: z = z(w,b)

So the loss is L(g(z(w,b))).
Now, how do we get the gradient of this function w.r.t. w and b?

We can do this using the chain rule from calculus.

We simply break the equation down into partial derivatives of the loss w.r.t. w and b, as written out below.
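In the notation above, the chain rule gives:

dL/dw = (dL/dg) * (dg/dz) * (dz/dw)
dL/db = (dL/dg) * (dg/dz) * (dz/db)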

The steps to implement back propagation:

  1. Kick-start back propagation by calculating the derivative of the loss w.r.t. the last layer's activation.
  2. Since the last layer (the output layer) is unique (it has a different activation from the other layers), calculate the derivative of the loss w.r.t. its weights and biases separately.

Here we use the sigmoid activation function, so we use its derivative.

  3. Loop over all of the remaining layers, calculating gradients and storing them.

Here we use the ReLU activation function, so we use its derivative.

Here is the vectorized implementation of back propagation:
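A sketch of the whole backward pass, reusing the `sigmoid` helper and the `(A_prev, Z)` caches from the forward-pass sketch above (again an illustration, not necessarily the repository's exact code):

```python
import numpy as np

def relu_backward(dA, Z):
    # Derivative of ReLU: 1 where Z > 0, else 0
    return dA * (Z > 0)

def sigmoid_backward(dA, Z):
    # Derivative of sigmoid: s * (1 - s)
    s = sigmoid(Z)
    return dA * s * (1 - s)

def backward_propagation(AL, Y, caches, parameters):
    """Walk the network in reverse, computing dW and db for every layer."""
    grads = {}
    L = len(caches)
    m = Y.shape[1]

    # 1. Kick-start: derivative of the logistic loss w.r.t. the output activation AL
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    # 2. Output layer (sigmoid)
    A_prev, ZL = caches[L - 1]
    dZ = sigmoid_backward(dAL, ZL)
    grads["dW" + str(L)] = (dZ @ A_prev.T) / m
    grads["db" + str(L)] = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = parameters["W" + str(L)].T @ dZ

    # 3. Hidden layers (ReLU), from layer L-1 down to layer 1
    for l in reversed(range(1, L)):
        A_prev, Z = caches[l - 1]
        dZ = relu_backward(dA_prev, Z)
        grads["dW" + str(l)] = (dZ @ A_prev.T) / m
        grads["db" + str(l)] = np.sum(dZ, axis=1, keepdims=True) / m
        dA_prev = parameters["W" + str(l)].T @ dZ
    return grads
```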

5. Training the network (gradient descent)

Here we use the simplest optimization algorithm, which is batch gradient descent.

  • Repeat, using all data points:
      1. calculate the gradient of the loss w.r.t. w and b
      2. update w and b

Steps to implement gradient descent:

  1. Implement forward propagation.
  2. Calculate the cost (for debugging; not strictly needed).
  3. Implement back propagation.
  4. Update the parameters using the gradients.

Here is the vectorized implementation of gradient descent:
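Putting the sketches above together, a batch gradient descent training loop could look like this (hyperparameter names and defaults are arbitrary):

```python
def train(X, Y, layer_dims, learning_rate=0.01, num_iterations=3000):
    """Batch gradient descent: forward pass, cost, backward pass, parameter update."""
    parameters = initialize_parameters(layer_dims)
    for i in range(num_iterations):
        AL, caches = forward_propagation(X, parameters)           # 1. forward propagation
        cost = compute_cost(AL, Y)                                # 2. cost (for debugging)
        grads = backward_propagation(AL, Y, caches, parameters)   # 3. back propagation
        # 4. update every layer's parameters with its gradients
        for l in range(1, len(layer_dims)):
            parameters["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
            parameters["b" + str(l)] -= learning_rate * grads["db" + str(l)]
        if i % 100 == 0:
            print(f"iteration {i}: cost = {cost:.4f}")
    return parameters
```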

6. Notes

The model assumes the input features are in the shape (nx, m),
where each column is a training example containing nx features (n-features, m-samples).
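For example, a dataset stored the usual (samples, features) way would need to be transposed first (variable names are just for illustration):

```python
import numpy as np

# 200 samples with 5 features each, stored as (samples, features)
X_raw = np.random.rand(200, 5)
y_raw = np.random.randint(0, 2, size=200)

# The model expects columns to be examples: shapes (n_features, m) and (1, m)
X = X_raw.T               # (5, 200)
Y = y_raw.reshape(1, -1)  # (1, 200)
```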

  • So, if you want to try the model on your own dataset, you must provide training data with the same structure to avoid running into errors.

  • Also, it is better to normalize the training/testing features, as neural networks tend to work best with normalized features, to avoid exploding/vanishing weights or gradients (see the sketch after this list).

  • Anyway, the class is meant for learning purposes, as it doesn't implement any regularization techniques like dropout.

  • So, it is generally better to use the data provided here.

  • I'll soon add implementations for regularization and Adam optimization with mini-batch/stochastic GD, and will update the readme accordingly.
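A simple way to standardize the features, as suggested in the note above, assuming the (n-features, m-samples) layout described earlier and using training-set statistics only:

```python
import numpy as np

def normalize_features(X_train, X_test):
    """Standardize features using training-set statistics; columns are examples."""
    mu = X_train.mean(axis=1, keepdims=True)
    sigma = X_train.std(axis=1, keepdims=True) + 1e-8   # avoid division by zero
    return (X_train - mu) / sigma, (X_test - mu) / sigma
```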
