The purpose of this exercise is to develop, using neural networks, a binary classifier capable of predicting whether a particular applicant will be successful if funded by the charitable organization known as Alphabet Soup.
We were given a CSV file by the business team at Alphabet Soup containing more than 34,000 organizations that have received funding over the years. The dataset has a number of columns, each of which captures a different piece of metadata about one of the organizations.
The columns of the data set are:
- EIN and NAME — Identification columns
- APPLICATION_TYPE — Alphabet Soup application type
- AFFILIATION — Affiliated sector of industry
- CLASSIFICATION — Government organization classification
- USE_CASE — Use case for funding
- ORGANIZATION — Organization type
- STATUS — Active status
- INCOME_AMT — Income classification
- SPECIAL_CONSIDERATIONS — Special considerations for application
- ASK_AMT — Funding amount requested
- IS_SUCCESSFUL — Was the money used effectively
- What variable(s) are considered the target(s) for your model?
The target, T, is the correct or desired value of the response associated with a given input, X. During training, this value is compared with the network's actual output, Y, to guide the weight updates. The difference between the desired result (the target, T) and the actual output, Y, is the error, and the objective of training the neural network is to minimize that error.
In our case, the objective is for the neural network to predict whether an organization will be successful with the funds it receives, so the IS_SUCCESSFUL column contains the target variable. Target variables are also known as dependent variables, and this is the variable we use to train the model.
- What variable(s) are considered to be the features for your model?
Input values are defined as features for the model and are also referred to as independent variables. All the columns in the CSV are used as features, except the target variable IS_SUCCESSFUL and the identification columns we dropped, EIN and NAME.
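As a minimal sketch of the split, assuming the cleaned dataframe is called application_df (a hypothetical name) and that its categorical columns have already been one-hot encoded:

```python
# Target vector (y): the IS_SUCCESSFUL column
y = application_df["IS_SUCCESSFUL"].values

# Feature matrix (X): every remaining column is a feature
X = application_df.drop(columns=["IS_SUCCESSFUL"]).values
```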
- What variable(s) are neither targets nor features, and should be removed from the input data?
The columns EIN and NAME do not contain data that gives the model any additional information; they would only add noise to the problem, so they were removed from the dataset using the drop function from Pandas.
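A minimal sketch of this step, assuming the CSV file is named charity_data.csv:

```python
import pandas as pd

# Load the funding applications dataset (file name assumed)
application_df = pd.read_csv("charity_data.csv")

# EIN and NAME are identifiers, not predictive signal, so drop them
application_df = application_df.drop(columns=["EIN", "NAME"])
```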
In the same way, variables with too many unique values can be removed. In our example, the column ASK_AMT has 8,747 unique values, so this variable should also be eliminated, or at least "binned", in order to reduce the number of values the model has to deal with.
But, what is Binning?
Binning is a technique that accomplishes exactly what it sounds like: it takes a column of continuous numbers and places them in "bins", or categories, based on ranges that we determine. This gives us a new categorical feature.
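A short sketch of binning with Pandas; the bin edges and labels here are hypothetical, since the real cut points would be chosen by inspecting the distribution of ASK_AMT:

```python
import pandas as pd

# Toy ASK_AMT values; the real column has 8,747 unique amounts
ask_amt = pd.Series([5_000, 12_000, 75_000, 300_000, 2_000_000])

# Hypothetical bin edges and matching labels
bins = [0, 10_000, 100_000, 1_000_000, float("inf")]
labels = ["0-10K", "10K-100K", "100K-1M", "1M+"]

# pd.cut replaces each continuous amount with its range category
ask_amt_binned = pd.cut(ask_amt, bins=bins, labels=labels)
```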
- How many neurons, layers, and activation functions did you select for your neural network model, and why?
A good rule of thumb for a basic neural network is to have two to three times as many neurons in the hidden layer as there are inputs. In the first run, the model had two hidden layers: the first layer had 80 neurons and the second layer had 30 neurons. These parameters were changed in subsequent runs, as explained later on.
Other parameters used for the first run were the relu activation function and the adam optimizer. Adam (the name is derived from "adaptive moment estimation") is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update network weights iteratively based on the training data.
Binary crossentropy was used as the loss function. Binary crossentropy is a loss function used in binary classification tasks, that is, tasks that answer a question with only two choices (yes or no, A or B, 0 or 1, left or right). Several such independent questions can be answered at the same time.
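A sketch of the first-run architecture as described above (not copied from the notebook); the feature count of 43 is a placeholder, since in practice it would be X_train.shape[1] after one-hot encoding:

```python
import tensorflow as tf

# Placeholder: set to X_train.shape[1] for the real, encoded dataset
number_input_features = 43

nn = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(number_input_features,)),
    tf.keras.layers.Dense(80, activation="relu"),    # first hidden layer
    tf.keras.layers.Dense(30, activation="relu"),    # second hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary output
])

# Adam optimizer and binary crossentropy loss, as described above
nn.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
nn.summary()
```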
In the instructions for this challenge it is stated that "The accuracy for the solution is designed to be lower than 75%", so the objective of the exercise is to optimize the TensorFlow model in order to achieve a target predictive accuracy higher than 75%.
The code for the original run is in the file AlphabetSoupCharity.ipynb
There were four attempts to improve the model's accuracy. The first three attempts involved changing the activation function, and the fourth involved changing the number of hidden layers and neurons.
The code for the first optimization run is in the file AlphabetSoupCharity - Optimized 1.ipynb
The code for the second optimization run is in the file AlphabetSoupCharity - Optimized 2.ipynb
The code for the third optimization run is in the file AlphabetSoupCharity - Optimized 3.ipynb
The fourth optimization run used SIGMOID as the activation function and added one extra hidden layer.
The code for the fourth optimization run is in the file AlphabetSoupCharity - Optimized 4.ipynb
The following table summarizes the results obtained:
Run # | Modification made | Loss | Accuracy
---|---|---|---
0 | Original run | 0.5711 | 0.7254
1 | TANH activation function | 0.5668 | 0.7249
2 | SIGMOID activation function | 0.5648 | 0.7255
3 | RELU activation function | 0.7064 | 0.7247
4 | SIGMOID activation function + 1 extra hidden layer | 0.5885 | 0.7258
What is the relationship between the accuracy and the loss in deep learning? There is no simple, direct relationship between these two metrics.
Loss can be defined as the difference between the problem's true values and the values predicted by the model; the greater the loss, the greater the magnitude of the errors. Accuracy, on the other hand, is calculated from the number of errors made on the data: the fewer the errors, the higher the accuracy.
That means:
- A low accuracy and large loss indicates numerous errors on a large amount of data.
- A low accuracy but low loss indicates minor mistakes on a large amount of data.
- A high accuracy with low loss indicates few errors on a small portion of the data (the best-case scenario).
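To make the two metrics concrete, here is a toy calculation of binary crossentropy and accuracy for five hypothetical predictions (the numbers are invented for illustration):

```python
import numpy as np

# Toy example: true labels and predicted probabilities for five applications
y_true = np.array([1, 0, 1, 1, 0])
y_prob = np.array([0.9, 0.2, 0.6, 0.4, 0.1])

# Binary crossentropy: mean of -[y*log(p) + (1 - y)*log(1 - p)]
loss = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Accuracy: fraction of predictions on the right side of the 0.5 threshold
accuracy = np.mean((y_prob >= 0.5).astype(int) == y_true)

print(f"Loss: {loss:.4f}, Accuracy: {accuracy:.4f}")  # Loss: 0.3722, Accuracy: 0.8000
```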
The results above show that it was not possible to exceed 75 percent accuracy with any of the configurations tried, including the original run's settings. The loss results in the five cases presented are poor: in every case the loss is greater than 0.5, and the RELU run performed the worst in this regard, with a loss of 0.7064.
In terms of accuracy, the difference between runs was marginal. The gap between the lowest and highest accuracy is only 0.11 percentage points (72.58% − 72.47% = 0.11%), indicating that the changes made did not improve the model.
So, how could the model be improved?
There is no simple answer to this question, but here are some ideas to get started:
- Gathering more data.
- Testing additional activation functions like Leaky RELU, Parametric RELU, ELU, Softmax, Swish, GELU or SELU (see the sketch after this list).
- Testing additional loss functions for binary classification like Hinge Loss or Squared Hinge Loss.
- Testing additional optimizer functions like Adadelta, Adagrad, RMSprop, SGD with Momentum or SGD.
- Increasing the number of hidden layers.
- Increasing the number of neurons per layer.
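As one illustration of the activation and optimizer ideas above, the following sketch swaps in Leaky RELU hidden layers and the RMSprop optimizer. It is not the method used in the notebooks, and the feature count is again a placeholder:

```python
import tensorflow as tf

number_input_features = 43  # placeholder, as before

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(number_input_features,)),
    tf.keras.layers.Dense(80),
    tf.keras.layers.LeakyReLU(),                     # Leaky RELU, default slope
    tf.keras.layers.Dense(30),
    tf.keras.layers.LeakyReLU(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary output
])

# RMSprop instead of adam; the loss stays binary crossentropy
model.compile(loss="binary_crossentropy",
              optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
              metrics=["accuracy"])
```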
The Rise of Machine Learning, https://courses.bootcampspot.com/courses/1145/pages/19-dot-0-1-the-rise-of-machine-learning
Towards Data Science: Binning for Feature Engineering in Machine Learning, https://towardsdatascience.com/binning-for-feature-engineering-in-machine-learning-d3b3d76f364a
Machine Learning Mastery: Gentle Introduction to the Adam Optimization Algorithm for Deep Learning, https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/
Peltarion: Binary crossentropy, https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/loss-functions/binary-crossentropy
Machine Learning Mastery: Loss and Loss Functions for Training Deep Learning Neural Networks, https://machinelearningmastery.com/loss-and-loss-functions-for-training-deep-learning-neural-networks/
Data Science Stack Exchange: What is the relationship between the accuracy and the loss in deep learning?, https://datascience.stackexchange.com/questions/42599/what-is-the-relationship-between-the-accuracy-and-the-loss-in-deep-learning
Towards Data Science: Activation Functions in Neural Networks, https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
V7 Labs: Activation Functions in Neural Networks [12 Types & Use Cases], https://www.v7labs.com/blog/neural-networks-activation-functions
Machine Learning Mastery: How to Choose Loss Functions When Training Deep Learning Neural Networks, https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/
Analytics Vidhya: A Comprehensive Guide on Deep Learning Optimizers, https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-on-deep-learning-optimizers/