The task is to resample the credit card data because its classes are imbalanced. First, I split the data and oversample with the RandomOverSampler and SMOTE methods, then undersample with the ClusterCentroids algorithm. Next, I use the SMOTEENN method, which combines oversampling and undersampling. Finally, I use the ensemble models EasyEnsembleClassifier and BalancedRandomForestClassifier to predict credit card fraud risk.
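As a rough illustration of the split-then-resample order described above, here is a minimal sketch of random oversampling in plain Python. It is not the imbalanced-learn implementation (there, `RandomOverSampler` exposes the same idea through `fit_resample`). The toy data, the every-fifth-row split, and the helper name `random_oversample` are assumptions made only for this example. The point it shows is that only the training split is resampled, while the test split keeps its original class balance.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy, made-up data: 95 non-fraud rows and 5 fraud rows.
rows = [[i] for i in range(100)]
labels = ["non-fraud"] * 95 + ["fraud"] * 5

# Simple illustrative split: every fifth row goes to the test set.
train_idx = [i for i in range(100) if i % 5 != 0]
test_idx = [i for i in range(100) if i % 5 == 0]
train_rows = [rows[i] for i in train_idx]
train_labels = [labels[i] for i in train_idx]
test_labels = [labels[i] for i in test_idx]

# Resample the training split only; the test split is left untouched.
res_rows, res_labels = random_oversample(train_rows, train_labels)

print(Counter(train_labels))  # class counts before resampling
print(Counter(res_labels))    # class counts after resampling
print(Counter(test_labels))   # test split keeps the original imbalance
```

Resampling after the split keeps duplicated or synthetic rows out of the test set, so the scores reported below reflect the original class distribution.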
-
The balanced accuracy score for the RandomOverSampler oversampling method is 0.643. The precision and recall are 1 and 0.59, respectively, for non-fraudulent credit cards.
-
The balanced accuracy score for the SMOTE oversampling method is 0.662. The precision and recall are 1 and 0.69, respectively, for non-fraudulent credit cards.
-
The balanced accuracy score for the ClusterCentroids undersampling method is 0.544. The precision and recall are 1 and 0.4, respectively, for non-fraudulent credit cards.
-
The balanced accuracy score for the SMOTEENN combined oversampling and undersampling method is 0.674. The precision and recall are 1 and 0.59, respectively, for non-fraudulent credit cards.
-
The balanced accuracy score for the BalancedRandomForestClassifier ensemble method is 0.788. The precision and recall are 1 and 0.87, respectively, for non-fraudulent credit cards.
-
The balanced accuracy score for the EasyEnsembleClassifier ensemble method is 0.915. The precision and recall are 1 and 0.9, respectively, for non-fraudulent credit cards.
All the models had a precision of 1 for non-fraudulent cards and a precision of 0.01 for fraudulent cards, so none of them is well suited to predicting fraudulent credit cards. If one model must be chosen, I recommend the EasyEnsembleClassifier, which has the highest balanced accuracy score (0.915) and a recall of 0.9 for non-fraudulent credit cards.
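The reported figures can be cross-checked with a little arithmetic. For a two-class problem, balanced accuracy is the mean of the two per-class recalls, so the recall for the fraudulent class is implied by the two numbers given for each model. The sketch below assumes a binary setup and uses only the rounded values reported above, so the implied recalls are approximate. The function name `implied_fraud_recall` is made up for this example.

```python
# (balanced accuracy, recall for non-fraudulent cards) as reported above.
reported = {
    "RandomOverSampler": (0.643, 0.59),
    "SMOTE": (0.662, 0.69),
    "ClusterCentroids": (0.544, 0.40),
    "SMOTEENN": (0.674, 0.59),
    "BalancedRandomForestClassifier": (0.788, 0.87),
    "EasyEnsembleClassifier": (0.915, 0.90),
}


def implied_fraud_recall(balanced_accuracy, non_fraud_recall):
    """Solve balanced_accuracy = (non_fraud_recall + fraud_recall) / 2
    for fraud_recall (binary classification assumed)."""
    return 2 * balanced_accuracy - non_fraud_recall


implied = {
    name: round(implied_fraud_recall(ba, rec), 3)
    for name, (ba, rec) in reported.items()
}

# Model with the highest reported balanced accuracy.
best = max(reported, key=lambda name: reported[name][0])

for name, recall in implied.items():
    print(f"{name}: implied fraud recall ~ {recall}")
print("Highest balanced accuracy:", best)
```

Under these assumptions the EasyEnsembleClassifier also has the highest implied recall for fraudulent cards, which is consistent with the recommendation above.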