Data science project to predict the success of a bank telemarketing campaign. The aim is to predict whether a client will subscribe to a term deposit.
- performed exploratory data analysis to gather insights on the datasets.
- provided recommendations to improve the success of future campaigns.
- used oversampling and undersampling techniques to balance the minority and majority classes for model building.
- optimized Logistic Regression, Random Forest and LightGBM classifiers using GridSearchCV to select the best model.
- created an external Tableau dashboard for visualization.
- converted the target column to numeric values.
- reduced the feature space of the education feature by recategorization.
- plotted various univariate and bivariate plots to gather insights on success.
- performed hypothesis testing to check whether the high performance of certain months is statistically significant.
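
The hypothesis-testing step could look roughly like the sketch below. It assumes a one-sided two-proportion z-test, which is only one reasonable choice; the project does not state which test was used. The function name `two_proportion_z_test` and all counts are hypothetical and for illustration only.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """One-sided z-test of H0: p_a <= p_b against H1: p_a > p_b."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled success rate under the null hypothesis of equal proportions.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts, NOT taken from the campaign data:
# group a = calls in one month, group b = calls in all other months.
z, p_value = two_proportion_z_test(success_a=60, n_a=300, success_b=1100, n_b=10000)
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
```

A small p-value would suggest that the month's higher subscription rate is unlikely to be due to chance alone, under the assumptions of the test.
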
We evaluated our models using a combination of recall and the F1 score. Although the logistic regression model has the highest recall, its poor F1 score led us to choose the LightGBM model.
Model | Precision | Recall | F1 score |
---|---|---|---|
Logistic | 0.19 | 0.80 | 0.307 |
Random Forest | 0.28 | 0.73 | 0.407 |
LightGBM | 0.28 | 0.75 | 0.411 |
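
As a quick check on how the F1 column relates to the other two, the sketch below recomputes F1 as the harmonic mean of precision and recall. The helper name `f1_score` and the `results` dictionary are illustrative, and the inputs are the rounded values from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall values copied from the results table.
results = {
    "Logistic": (0.19, 0.80),
    "Random Forest": (0.28, 0.73),
    "LightGBM": (0.28, 0.75),
}

for model, (precision, recall) in results.items():
    print(f"{model}: F1 = {f1_score(precision, recall):.3f}")
```

Because the table reports precision and recall to two decimals, the recomputed values can differ from the reported F1 scores in the third decimal place. The ranking of the models stays the same.
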