This exercise applies several resampling and ensemble techniques to train and evaluate machine learning models that predict credit risk from data with imbalanced classes. The resamplers are applied to the training data. Algorithms used in the analysis:
- the oversampling algorithms RandomOverSampler and SMOTE;
- the undersampling algorithm ClusterCentroids;
- the combination (over- and undersampling) algorithm SMOTEENN;
- the ensemble classifiers BalancedRandomForestClassifier and EasyEnsembleClassifier, to reduce bias.
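
In imbalanced-learn, each resampler listed above is applied to the training set through `fit_resample`, for example `RandomOverSampler(random_state=1).fit_resample(X_train, y_train)` (the variable names are placeholders). To illustrate what naive random oversampling does, here is a minimal stdlib-only sketch. It is not the library's implementation, and `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=1):
    """Duplicate randomly chosen rows of each smaller class (with replacement)
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        # Indices of the original rows that belong to this class
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(idx)
            X_res.append(X[i])
            y_res.append(label)
    return X_res, y_res

# Hypothetical toy data: five low_risk rows and a single high_risk row
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [5.0]]
y = ["low_risk"] * 5 + ["high_risk"]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have 5 rows
```

The added high_risk rows are exact copies of the original one, which is why random oversampling adds no new information about the minority class.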
We compare the results using the balanced accuracy score, the confusion matrix, and the imbalanced classification report.
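
scikit-learn's `balanced_accuracy_score` and `confusion_matrix` and imbalanced-learn's `classification_report_imbalanced` compute these metrics directly. The sketch below shows how they follow from the four confusion-matrix counts when high_risk is treated as the positive class. Balanced accuracy is the mean of the two per-class recalls (sensitivities). `binary_report` and the toy labels are hypothetical.

```python
def binary_report(y_true, y_pred, positive="high_risk"):
    """Confusion-matrix counts plus precision, recall (sensitivity), F1,
    and balanced accuracy, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity of `positive`
    specificity = tn / (tn + fp) if tn + fp else 0.0   # sensitivity of the other class
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Balanced accuracy: average of the per-class recalls
    balanced_accuracy = (recall + specificity) / 2
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn, "precision": precision,
            "recall": recall, "f1": f1, "balanced_accuracy": balanced_accuracy}

# Hypothetical labels: 2 high_risk and 8 low_risk loans
y_true = ["high_risk", "high_risk"] + ["low_risk"] * 8
y_pred = ["high_risk", "low_risk"] + ["high_risk"] * 2 + ["low_risk"] * 6
report = binary_report(y_true, y_pred)
print(report)
```

Here one of the two high_risk loans is caught (recall 0.5), but two low_risk loans are also flagged, so precision drops to 1/3. This is the same pattern, at a much smaller scale, as the results below.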
With RandomOverSampler, the balanced accuracy score is 62%.
The high_risk precision is only about 1%, with 60% sensitivity, giving an F1 score of only 2%.
Because low_risk loans vastly outnumber high_risk loans, the low_risk precision is almost 100%, with a sensitivity of 65%.
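
As a sanity check, these figures are internally consistent. Let $P$ be the high_risk precision, taken here as exactly 0.01 (the text says "about 1%"), and let $R$ be the high_risk sensitivity:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.01 \times 0.60}{0.01 + 0.60} \approx 0.020
$$

$$
\text{balanced accuracy} = \frac{R_{\text{high\_risk}} + R_{\text{low\_risk}}}{2} = \frac{0.60 + 0.65}{2} = 0.625
$$

The result of 0.625 matches the reported 62%, given that the inputs are themselves rounded.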
With SMOTE, the balanced accuracy score is 65%.
The high_risk precision is only about 1%, with 64% sensitivity, giving an F1 score of only 2%.
Because low_risk loans vastly outnumber high_risk loans, the low_risk precision is almost 100%, with a sensitivity of 66%.
This is very similar to the previous result.
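
Unlike random oversampling, SMOTE synthesizes new minority rows by interpolation. It picks a minority point $x_i$, one of its $k$ nearest minority neighbours $x_{nn}$, and a random $\lambda \in [0, 1)$, and then creates the point $x_i + \lambda (x_{nn} - x_i)$. A minimal stdlib sketch follows. `smote` is a hypothetical helper, and its default `k=5` mirrors imbalanced-learn's default `k_neighbors`, but the library's implementation differs in detail.

```python
import math
import random

def smote(minority, n_new, k=5, seed=1):
    """Create n_new synthetic points, each on the segment between a random
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding the point itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # lambda, uniform in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, nn)])
    return synthetic

# Hypothetical minority rows at the corners of the unit square
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=10, k=2)
print(new_points)
```

Every synthetic point lies between two real minority points, so in this toy case all of them stay inside the unit square.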
With ClusterCentroids undersampling, the balanced accuracy score drops to 52%.
The high_risk precision is only about 1%, with 59% sensitivity, giving an F1 score of only 1%.
Because low_risk loans vastly outnumber high_risk loans, the low_risk precision is almost 100%, but its sensitivity falls to 46%.
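
ClusterCentroids undersamples the majority class by replacing clusters of majority rows with the centroids of a KMeans clustering. The sketch below uses plain Lloyd's k-means from the standard library and keeps one centroid per minority row. The helper names, iteration count, and toy data are hypothetical, and the library's defaults may differ.

```python
import math
import random

def kmeans_centroids(points, k, iters=20, seed=1):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def cluster_centroids_undersample(X, y, majority):
    """Replace the majority rows by as many k-means centroids as there are
    minority rows; the minority rows are kept unchanged."""
    X_min = [x for x, lab in zip(X, y) if lab != majority]
    y_min = [lab for lab in y if lab != majority]
    X_maj = [x for x, lab in zip(X, y) if lab == majority]
    centroids = kmeans_centroids(X_maj, k=len(X_min))
    return centroids + X_min, [majority] * len(centroids) + y_min

# Hypothetical data: two tight low_risk groups and two high_risk rows
X = [[0, 0], [0, 0.1], [0.1, 0], [10, 10], [10, 10.1], [10.1, 10], [5.0, 5.0], [5.0, 6.0]]
y = ["low_risk"] * 6 + ["high_risk"] * 2
X_res, y_res = cluster_centroids_undersample(X, y, majority="low_risk")
print(X_res, y_res)
```

Each centroid summarizes several majority rows, so the model no longer sees the individual low_risk examples.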
With SMOTEENN, the balanced accuracy score is 62%.
The high_risk precision is only about 1%, with 69% sensitivity, giving an F1 score of 2%.
Because low_risk loans vastly outnumber high_risk loans, the low_risk precision is almost 100%, with a sensitivity of 54%.
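
SMOTEENN first oversamples with SMOTE and then cleans the result with Edited Nearest Neighbours (ENN), which removes rows whose neighbourhood disagrees with their label. The sketch below shows only the cleaning step, using a majority vote over k=3 neighbours. imbalanced-learn's `EditedNearestNeighbours` also offers a stricter rule in which all neighbours must agree (its `kind_sel` parameter). `edited_nearest_neighbours` and the toy data are hypothetical.

```python
import math
from collections import Counter

def edited_nearest_neighbours(X, y, k=3):
    """Keep a row only if the majority of its k nearest neighbours
    (excluding itself) share its label; returns the kept rows."""
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        # (distance, label) for every other row, nearest first
        dists = sorted((math.dist(x, X[j]), y[j]) for j in range(len(X)) if j != i)
        votes = Counter(lab for _, lab in dists[:k])
        if votes[label] > k / 2:
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y

# Hypothetical data: one high_risk row sits inside the low_risk group
X = [[0.0], [0.1], [0.2], [0.15], [5.0], [5.1], [5.2]]
y = ["low_risk"] * 3 + ["high_risk"] * 4
X_clean, y_clean = edited_nearest_neighbours(X, y)
print(X_clean, y_clean)  # the overlapping high_risk row [0.15] is removed
```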
With BalancedRandomForestClassifier, the balanced accuracy score improves greatly, to 79%.
The high_risk precision is still only about 4%, with 67% sensitivity, giving an F1 score of 7%.
For low_risk, precision is almost 100%, and sensitivity rises to 91% because fewer low_risk loans are falsely flagged as high_risk.
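
A balanced random forest gives each tree a bootstrap sample in which the majority class is drawn down to the size of the minority class, so that low_risk loans do not dominate any single tree. The stdlib sketch below only builds the per-tree row indices. `balanced_bootstrap` is a hypothetical helper that draws every class with replacement, and imbalanced-learn's `BalancedRandomForestClassifier` has its own options for how this sampling is done.

```python
import random
from collections import Counter

def balanced_bootstrap(y, seed=None):
    """Row indices for one tree: from every class, draw (with replacement)
    as many rows as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choices(idx, k=n))  # sampling with replacement
    return sample

# Hypothetical labels with a 95:5 imbalance
y = ["low_risk"] * 95 + ["high_risk"] * 5
tree_rows = [balanced_bootstrap(y, seed=s) for s in range(3)]
print([Counter(y[i] for i in rows) for rows in tree_rows])  # 5 of each class per tree
```

Each index list would train one decision tree, and the forest combines the trees' predictions.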
With EasyEnsembleClassifier, the balanced accuracy score is very high, at 93%.
The high_risk precision is still only about 7%, with 91% sensitivity, giving an F1 score of 14%.
For low_risk, precision is almost 100%, and sensitivity reaches 94% because fewer low_risk loans are falsely flagged as high_risk.
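
EasyEnsembleClassifier trains several AdaBoost learners, each on a different balanced subset obtained by randomly undersampling the majority class, and combines their outputs. The stdlib sketch below keeps that structure but, to stay short, swaps AdaBoost for a much simpler nearest-centroid base learner and combines the members by a simple vote. In this sketch, majority rows are drawn without replacement. `fit_centroids`, `easy_ensemble_score`, and the toy data are hypothetical.

```python
import math
import random

def fit_centroids(X, y):
    """Hypothetical stand-in base learner: one mean vector per class."""
    cents = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def easy_ensemble_score(X, y, x_new, minority="high_risk", n_estimators=10, seed=1):
    """Fraction of ensemble members that classify x_new as the minority class."""
    rng = random.Random(seed)
    min_idx = [i for i, lab in enumerate(y) if lab == minority]
    maj_idx = [i for i, lab in enumerate(y) if lab != minority]
    votes = 0
    for _ in range(n_estimators):
        # Balanced subset: every minority row plus an equal-size random draw of majority rows
        subset = min_idx + rng.sample(maj_idx, len(min_idx))
        model = fit_centroids([X[i] for i in subset], [y[i] for i in subset])
        pred = min(model, key=lambda lab: math.dist(x_new, model[lab]))
        votes += pred == minority
    return votes / n_estimators

# Hypothetical data: 20 low_risk rows near 0-2 and 3 high_risk rows near 10-11
X = [[i * 0.1] for i in range(20)] + [[10.0], [10.5], [11.0]]
y = ["low_risk"] * 20 + ["high_risk"] * 3
print(easy_ensemble_score(X, y, [10.2]), easy_ensemble_score(X, y, [0.5]))
```

Because every member sees a different random slice of the majority class, the ensemble as a whole still uses most of the low_risk data, unlike a single undersampled model.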
- All the models used to predict credit risk show weak precision for the high_risk class: most loans they flag as high risk are actually low risk.
- The ensemble models show a large improvement, especially in sensitivity for high-risk loans.
- The EasyEnsembleClassifier model detects almost all high-risk loans (91% sensitivity). However, its precision is only about 7%, so roughly 93 of every 100 loans it flags as high risk are actually low risk. This could cause the bank to lose business opportunities.
- The bank may want to evaluate other models for predicting credit risk beyond those above.