An end-to-end ML project that predicts customer churn probability using classification models. Trains and compares four models (Logistic Regression, Random Forest, Gradient Boosting, XGBoost), selects the best performer, and serves predictions through an interactive Flask dashboard.
- Trains 4 ML models and picks the best by ROC-AUC
- 5-fold cross-validation for reliable evaluation
- Feature importance visualization
- ROC curve comparison across all models
- Interactive churn prediction for any customer profile
- At-risk customer table with adjustable threshold slider
- Segment analysis by tenure and login frequency
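The at-risk table and the segment views come down to filtering and grouping customers who have already been scored. Below is a minimal stdlib sketch that assumes each customer carries a churn probability from the model. The field names, sample rows, and bucket edges are hypothetical and are not taken from the project.

```python
from collections import defaultdict

# Hypothetical scored customers. In the project, the probabilities would come
# from the trained model's predict_proba, not from a hard-coded list.
customers = [
    {"id": "C001", "tenure": 2, "logins_per_month": 1, "churn_probability": 0.91},
    {"id": "C002", "tenure": 30, "logins_per_month": 12, "churn_probability": 0.08},
    {"id": "C003", "tenure": 5, "logins_per_month": 3, "churn_probability": 0.67},
    {"id": "C004", "tenure": 14, "logins_per_month": 2, "churn_probability": 0.55},
    {"id": "C005", "tenure": 48, "logins_per_month": 20, "churn_probability": 0.03},
]


def at_risk(rows, threshold):
    """Customers at or above the slider threshold, most likely to churn first."""
    flagged = [r for r in rows if r["churn_probability"] >= threshold]
    return sorted(flagged, key=lambda r: r["churn_probability"], reverse=True)


def tenure_bucket(row):
    """Coarse tenure segment (bucket edges are assumptions)."""
    if row["tenure"] < 6:
        return "0-5 months"
    if row["tenure"] < 24:
        return "6-23 months"
    return "24+ months"


def login_bucket(row):
    """Coarse login-frequency segment (bucket edges are assumptions)."""
    if row["logins_per_month"] < 4:
        return "low"
    if row["logins_per_month"] < 10:
        return "medium"
    return "high"


def segment_at_risk_rate(rows, threshold, segment_of):
    """Fraction of customers in each segment whose probability meets the threshold."""
    totals, flagged = defaultdict(int), defaultdict(int)
    for r in rows:
        seg = segment_of(r)
        totals[seg] += 1
        if r["churn_probability"] >= threshold:
            flagged[seg] += 1
    return {seg: flagged[seg] / totals[seg] for seg in totals}
```

With the slider at 0.5, `at_risk(customers, 0.5)` returns C001, C003, and C004 in that order. `segment_at_risk_rate(customers, 0.5, login_bucket)` gives `{"low": 1.0, "high": 0.0}`.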
- ML: scikit-learn, XGBoost
- Backend: Python, Flask
- Visualization: Matplotlib, Chart.js
- Data: Synthetic dataset of 5,000 customers with realistic churn patterns
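A synthetic dataset with "realistic churn patterns" can be generated by making the churn probability depend on customer behaviour. The sketch below does this through a logistic function. Every feature name and coefficient is an assumption chosen for illustration. It is not the logic in `data_generator.py`.

```python
import math
import random


def generate_customers(n=5000, seed=42):
    """Synthetic churn dataset sketch; features and coefficients are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        tenure = rng.randint(1, 72)                  # months as a customer
        logins = max(0, round(rng.gauss(8, 4)))      # logins per month
        contract = rng.choice(["monthly", "annual"])
        monthly_charges = round(rng.uniform(20, 120), 2)
        # Assumed churn logic: short tenure, few logins, monthly contracts and
        # above-average charges all raise the log-odds of churning.
        log_odds = (
            1.0
            - 0.05 * tenure
            - 0.15 * logins
            + (0.8 if contract == "monthly" else -0.5)
            + 0.01 * (monthly_charges - 70)
        )
        p_churn = 1.0 / (1.0 + math.exp(-log_odds))  # logistic link
        rows.append({
            "customer_id": f"C{i:05d}",
            "tenure": tenure,
            "logins_per_month": logins,
            "contract": contract,
            "monthly_charges": monthly_charges,
            "churned": int(rng.random() < p_churn),  # Bernoulli draw
        })
    return rows
```

Fixing the seed makes the dataset reproducible between runs. The rows can be written out with `csv.DictWriter` if a CSV file is needed.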
- Clone the repo
- Create a virtual environment with `python -m venv venv`, then activate it
- Install dependencies: `pip install -r requirements.txt`
- Generate the dataset: `python data_generator.py`
- Train the models and save the plots: `python train_model.py`
- Start the dashboard: `python app.py`
- Open http://localhost:5006
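`python app.py` starts the dashboard on port 5006. The Flask sketch below shows roughly what a prediction endpoint behind it could look like. The `/predict` route, the `FEATURES` list, and `create_app` are hypothetical names. The sketch also assumes the saved model accepts a plain numeric feature matrix in `FEATURES` order. A pipeline trained on a DataFrame would instead need a DataFrame with matching column names.

```python
from flask import Flask, jsonify, request

# Hypothetical feature order expected by the model; the real app.py may differ.
FEATURES = ["tenure", "logins_per_month", "monthly_charges"]


def create_app(model):
    """Build a Flask app that scores one customer per POST request.

    `model` is any object with a scikit-learn style predict_proba method.
    """
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(force=True)
        missing = [f for f in FEATURES if f not in payload]
        if missing:
            return jsonify(error=f"missing fields: {missing}"), 400
        row = [[float(payload[f]) for f in FEATURES]]
        # Column 1 of predict_proba is the probability of the positive (churn) class.
        prob = float(model.predict_proba(row)[0][1])
        return jsonify(churn_probability=prob)

    return app
```

To use it in an `app.py`, load the model that `train_model.py` saves with `joblib.load(...)` and call `create_app(model).run(port=5006)`. Passing the model in as an argument, rather than loading it at import time, also makes the endpoint easy to test with a stub model and Flask's `test_client()`.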
- Generate synthetic customer data with realistic churn logic
- Feature engineering: encode categorical features, scale numeric features
- Train/test split (80/20) with stratification
- Train 4 classifiers with 5-fold CV
- Select best model by ROC-AUC
- Save model with joblib for serving
- Flask API serves predictions in real time
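The pipeline steps above map onto a fairly standard scikit-learn workflow. This sketch rests on several assumptions. The demo data, column names, and hyperparameters are made up. The best model is picked by mean 5-fold CV ROC-AUC on the training split. XGBoost is left out so the example only needs scikit-learn, pandas, NumPy, and joblib; `xgboost.XGBClassifier` could be added to `candidates` the same way.

```python
import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the project's dataset may use different ones.
NUMERIC = ["tenure", "logins_per_month", "monthly_charges"]
CATEGORICAL = ["contract"]


def make_demo_data(n=1000, seed=0):
    """Small made-up dataset so the sketch runs on its own."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "tenure": rng.integers(1, 73, n),
        "logins_per_month": rng.poisson(8, n),
        "monthly_charges": rng.uniform(20, 120, n).round(2),
        "contract": rng.choice(["monthly", "annual"], n),
    })
    log_odds = (1.0 - 0.05 * df["tenure"] - 0.15 * df["logins_per_month"]
                + np.where(df["contract"] == "monthly", 0.8, -0.5))
    df["churned"] = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(int)
    return df


def build_pipeline(clf):
    """Encode categoricals and scale numerics inside the pipeline, then classify."""
    prep = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([("prep", prep), ("clf", clf)])


def train_and_select(df, model_path):
    X, y = df[NUMERIC + CATEGORICAL], df["churned"]
    # Stratified 80/20 split keeps the churn rate equal in train and test.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
        "gradient_boosting": GradientBoostingClassifier(random_state=42),
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    # Mean ROC-AUC over 5 folds of the training split for each candidate.
    cv_auc = {
        name: cross_val_score(build_pipeline(clf), X_train, y_train,
                              cv=cv, scoring="roc_auc").mean()
        for name, clf in candidates.items()
    }
    best_name = max(cv_auc, key=cv_auc.get)
    best = build_pipeline(candidates[best_name]).fit(X_train, y_train)
    test_auc = roc_auc_score(y_test, best.predict_proba(X_test)[:, 1])
    joblib.dump(best, model_path)  # saved pipeline includes its preprocessing
    return best_name, cv_auc, test_auc
```

Keeping the scaler and encoder inside the `Pipeline` means each CV fold re-fits preprocessing on its own training portion, so no statistics leak from the validation fold. The saved artifact also carries its preprocessing with it for serving.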