End-to-end ML project predicting term deposit subscriptions using bank marketing data.
Predicting Term Deposit Subscriptions Using Machine Learning
This project focuses on predicting whether a bank client will subscribe to a term deposit using supervised machine learning models. It involves an end-to-end pipeline including data preprocessing, exploratory data analysis (EDA), feature engineering, handling class imbalance, model training, evaluation, and hyperparameter tuning.
Project Goal: To assist marketing teams by predicting potential term deposit subscribers ahead of time using client data, enabling more targeted campaigns and efficient resource allocation.
Models Used:
- Logistic Regression
- Random Forest
- Gradient Boosting
- XGBoost (Tuned)
Final Result:
- Best Model: Tuned XGBoost
- ROC AUC (Test Set): 0.735
- Accuracy: 0.88
- Precision (Minority Class): 0.49
- Recall (Minority Class): 0.31
Dataset:
- Source: [UCI Bank Marketing Dataset](https://archive.ics.uci.edu/dataset/222/bank+marketing)
- Size: 45,211 records with 17 features
- Target Variable: y (whether the client subscribed to a term deposit)
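A minimal loading-and-cleaning sketch covering step 1 of the pipeline below. It assumes bank-full.csv from the UCI page sits in the working directory (the file is semicolon-separated); the columns imputed here are the ones that carry "unknown" placeholders in this dataset.

```python
import pandas as pd

# bank-full.csv from the UCI archive is semicolon-separated.
df = pd.read_csv("bank-full.csv", sep=";")
print(df.shape)  # expected: (45211, 17)

# Cleaning: drop exact duplicates, then replace "unknown" placeholders
# with each column's mode (computed over the known values).
df = df.drop_duplicates()
for col in ["job", "education", "contact", "poutcome"]:
    mode_value = df.loc[df[col] != "unknown", col].mode()[0]
    df[col] = df[col].replace("unknown", mode_value)
```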
Pipeline Overview:
1. Data Loading & Cleaning: removed duplicates and placeholder values; handled unknown values with mode imputation (see the loading sketch above)
2. EDA & Visualization: distribution plots, count plots, boxplots, and a correlation heatmap
3. Feature Engineering: interaction terms such as housing_loan, contact_frequency, and success_ratio; log transformation of skewed features
4. Preprocessing: label encoding, standard scaling, outlier clipping using the IQR
5. Handling Imbalanced Data: applied SMOTE to balance the classes
6. Model Training: trained and validated 4 models; evaluated using ROC AUC, Accuracy, Precision, Recall, and F1-score
7. Hyperparameter Tuning: performed GridSearchCV for XGBoost and Gradient Boosting
8. Final Evaluation: compared the models on the held-out test set (results below)

Hedged code sketches for steps 2 through 7 follow.
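A sketch of the EDA step (step 2) on a few representative columns, assuming matplotlib and seaborn; the column choices are illustrative, not necessarily the ones plotted in the project notebook.

```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(2, 2, figsize=(12, 8))
sns.countplot(data=df, x="y", ax=axes[0, 0])              # target class balance
sns.histplot(data=df, x="age", kde=True, ax=axes[0, 1])   # distribution plot
sns.boxplot(data=df, x="y", y="duration", ax=axes[1, 0])  # duration by outcome
sns.heatmap(df.select_dtypes("number").corr(),            # numeric correlations
            cmap="coolwarm", annot=True, fmt=".2f", ax=axes[1, 1])
plt.tight_layout()
plt.show()
```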
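The README names the engineered features (step 3) but not their formulas, so the definitions below are hypothetical reconstructions; the log1p transform targets the right-skewed count features.

```python
import numpy as np

# Hypothetical reconstructions; the project's actual definitions may differ.
df["housing_loan"] = ((df["housing"] == "yes") & (df["loan"] == "yes")).astype(int)
df["contact_frequency"] = df["campaign"] + df["previous"]  # total contacts, assumed
df["success_ratio"] = (
    (df["poutcome"] == "success").astype(int) / (df["previous"] + 1)  # assumed
)

# Log-transform right-skewed numeric features; log1p handles zeros.
# 'balance' can be negative, so it is left out here.
for col in ["duration", "campaign", "previous"]:
    df[col] = np.log1p(df[col])
```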
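A sketch of the preprocessing and resampling steps (steps 4 and 5), assuming the cleaned dataframe from above; the list of clipped numeric columns is an assumption, and SMOTE (from imbalanced-learn) is applied to the training split only so the test set keeps the true class distribution.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from imblearn.over_sampling import SMOTE

# Label-encode all categorical columns, including the target (no -> 0, yes -> 1).
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])

# Clip numeric outliers to the 1.5 * IQR fences (column list is assumed).
for col in ["age", "balance", "duration", "campaign", "previous"]:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[col] = df[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

X, y = df.drop(columns="y"), df["y"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale, then oversample only the training data with SMOTE.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train)
```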
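A sketch of the training and evaluation step (step 6) for the four models listed earlier; hyperparameters are defaults except max_iter and eval_metric, which only silence convergence warnings and logging.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier
from sklearn.metrics import classification_report, roc_auc_score

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    print(f"{name}: ROC AUC = {roc_auc_score(y_test, proba):.3f}")
    print(classification_report(y_test, model.predict(X_test)))
```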
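A sketch of the tuning step (step 7) for XGBoost; the grid below is illustrative, since the actual search space is not documented here, and the same pattern applies to GradientBoostingClassifier.

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Illustrative grid; the project's actual search space may differ.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "subsample": [0.8, 1.0],
}

search = GridSearchCV(
    XGBClassifier(eval_metric="logloss", random_state=42),
    param_grid,
    scoring="roc_auc",
    cv=5,
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_)
print(f"best CV ROC AUC = {search.best_score_:.3f}")
```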
Key Takeaways:
- XGBoost (Tuned) was the best-performing model.
- SMOTE improved recall on the minority class (subscribers).
- Features like month, poutcome, and contact_frequency were the most predictive.
- Class imbalance remains a challenge for recall.
Final Model Performance (Test Set)
| Model | Accuracy | Precision (minority) | Recall (minority) | F1 Score (minority) | ROC AUC |
|---|---|---|---|---|---|
| XGBoost (Tuned) | 0.88 | 0.49 | 0.31 | 0.38 | 0.735 |
| Gradient Boosting | 0.88 | 0.47 | 0.29 | 0.36 | 0.733 |
Despite the class imbalance, the tree-based models performed well on this real-world marketing data, though minority-class recall remains modest.
Next Steps:
- Further improve minority-class recall, possibly via threshold tuning or ensemble stacking (see the sketch below)
- Add model explainability with SHAP/LIME
- Deploy the trained model via a REST API or Streamlit dashboard
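As a starting point for the first item above, a minimal threshold-tuning sketch: instead of the default 0.5 probability cutoff, sweep thresholds and keep the one with the best minority-class F1. `search.best_estimator_` refers to the tuning sketch earlier; in practice the sweep belongs on a separate validation split rather than the test set.

```python
import numpy as np
from sklearn.metrics import f1_score

# Tuned model from the GridSearchCV sketch above (assumed);
# X_test/y_test are reused here only for brevity.
model = search.best_estimator_
proba = model.predict_proba(X_test)[:, 1]

# Sweep cutoffs in 0.01 steps and keep the best minority-class F1.
thresholds = np.linspace(0.1, 0.9, 81)
f1_scores = [f1_score(y_test, (proba >= t).astype(int)) for t in thresholds]
best_t = thresholds[int(np.argmax(f1_scores))]
print(f"best threshold = {best_t:.2f}, minority-class F1 = {max(f1_scores):.3f}")

y_pred = (proba >= best_t).astype(int)  # replaces the default 0.5 cutoff
```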
By Lohith Basavanahalli Anjinappa