This project develops a machine learning model to predict user churn in streaming services. It compares several classification algorithms (Random Forest, Logistic Regression and XGBoost) to identify behavioral patterns that indicate a risk of subscription cancellation.
The goal is to estimate the probability that each customer will abandon the streaming service, enabling proactive retention strategies and better allocation of marketing resources.
- AUC-ROC: 93.47% (Excellent predictive capability)
- Accuracy: 84.78% (share of all predictions that are correct)
- Best Model: Random Forest
- Dataset: 125,000 users with 20 features
```
streaming-churn-prediction-model/
├── modelochurd.ipynb   # Main notebook with complete analysis
├── train.csv           # Training dataset
├── test.csv            # Test dataset
└── README.md           # This file
```
Note: The trained models (best_rf_label.pkl, best_xgb_label.pkl, best_logistic_regression.pkl) are generated automatically when the complete notebook is executed.
- `weekly_hours` - Weekly usage hours (most important)
- `customer_service_inquiries` - Customer service inquiries
- `subscription_type` - Subscription type
- `song_skip_rate` - Song skip rate
- `num_subscription_pauses` - Subscription pauses
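
As a hedged sketch of how a ranking like this can be read off a tree-based model, the helper below (the name `top_features` is hypothetical, not from the notebook) sorts importance scores such as the `feature_importances_` attribute of a fitted scikit-learn Random Forest. The importance values in the example are made up for illustration and are not the project's results.

```python
import numpy as np

def top_features(importances, feature_names, k=5):
    """Return the k (name, importance) pairs with the largest importance, highest first."""
    importances = np.asarray(importances, dtype=float)
    if importances.shape[0] != len(feature_names):
        raise ValueError("importances and feature_names must have the same length")
    # argsort is ascending, so reverse it and keep the first k indices
    order = np.argsort(importances)[::-1][:k]
    return [(feature_names[i], float(importances[i])) for i in order]

# Made-up importances for illustration only (not the project's results)
names = ["age", "song_skip_rate", "weekly_hours", "subscription_type", "customer_service_inquiries"]
imps = [0.08, 0.12, 0.40, 0.15, 0.25]
ranking = top_features(imps, names, k=3)
print(ranking)
```

With a real model, the call would look like `top_features(model.feature_importances_, names)`. If the model sits behind one-hot encoding, each encoded column gets its own importance, so the names must match the encoded columns.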
- Free users: 79.4% churn rate
- Premium/Family users: 34-35% churn rate
- Lower weekly usage = Higher churn risk
- More support inquiries = Higher abandonment probability
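
Segment churn rates like those above can be computed with a simple group-by. This is a minimal sketch that assumes the training data has a 0/1 churn target; the column name `churned` and the helper `churn_rate_by_segment` are hypothetical (check `train.csv` for the real target name), and the toy rows are illustrative, not real data.

```python
import pandas as pd

def churn_rate_by_segment(df: pd.DataFrame, segment_col: str, target_col: str = "churned") -> pd.Series:
    """Churn rate (mean of a 0/1 target) per segment, highest first.

    `target_col` defaults to a hypothetical column name; adjust it to the dataset.
    """
    return df.groupby(segment_col)[target_col].mean().sort_values(ascending=False)

# Illustrative toy data (not the real dataset)
toy = pd.DataFrame({
    "subscription_type": ["Free", "Free", "Premium", "Premium", "Family", "Student"],
    "churned": [1, 1, 0, 1, 0, 0],
})
rates = churn_rate_by_segment(toy, "subscription_type")
print(rates)
```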
| Model | AUC-ROC | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| Random Forest | 0.9347 | 84.78% | 84.89% | 85.60% | 85.24% |
| Logistic Regression | 0.8935 | 80.44% | 80.97% | 80.91% | 80.94% |
| XGBoost | 0.8732 | 77.24% | 79.08% | 75.68% | 77.35% |
- Python 3.x
- Pandas - Data manipulation
- Scikit-learn - Machine Learning
- XGBoost - Boosting algorithm
- Seaborn/Matplotlib - Visualizations
- NumPy - Numerical computation
- `age` - User age
- `weekly_hours` - Weekly usage hours
- `average_session_length` - Average session duration
- `song_skip_rate` - Song skip rate
- `weekly_songs_played` - Songs played per week
- `num_subscription_pauses` - Number of subscription pauses
- `customer_tenure_years` - Customer tenure (years)
- `subscription_type` - Subscription type (Free, Premium, Family, Student)
- `payment_plan` - Payment plan (Monthly, Yearly)
- `payment_method` - Payment method
- `location` - User location
- `customer_service_inquiries` - Customer service inquiry frequency
```bash
pip install pandas numpy scikit-learn xgboost seaborn matplotlib
```

- Clone the repository
- Open `modelochurd.ipynb` in Jupyter Notebook
- Run all cells to reproduce the complete analysis
- The trained models will be saved automatically as `.pkl` files
```python
import pickle

# Load the trained model (generated when the notebook is executed)
with open('best_rf_label.pkl', 'rb') as f:
    model = pickle.load(f)

# X_new_data: new user records with the same features and encoding used in training
# Make predictions
predictions = model.predict(X_new_data)          # predicted class labels
probabilities = model.predict_proba(X_new_data)  # class probabilities, columns ordered as model.classes_
```

Important: The model `.pkl` files are created automatically when all cells of the `modelochurd.ipynb` notebook are executed. If they do not exist, run the complete notebook to generate them.
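
The notebook's own saving code is not reproduced in this README. As a hedged sketch, a helper pair like the one below (the names `save_model` and `load_model` are hypothetical) writes and reads a fitted model with the standard `pickle` module. Only unpickle files you trust, because loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path

def save_model(model, path) -> None:
    """Serialize a fitted model (any picklable object) to `path`."""
    with Path(path).open("wb") as f:
        pickle.dump(model, f)

def load_model(path):
    """Load a model written by save_model. Only load files from trusted sources."""
    with Path(path).open("rb") as f:
        return pickle.load(f)

# Hypothetical usage with a fitted estimator:
# save_model(best_rf, "best_rf_label.pkl")
# model = load_model("best_rf_label.pkl")
```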
- 0.5: Random performance
- 0.7-0.8: Good
- 0.8-0.9: Very good
- 0.9+: Excellent
- 1.0: Perfect
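
The scale above follows from what AUC-ROC measures: the probability that a randomly chosen churner gets a higher predicted score than a randomly chosen non-churner (ties count as half). The sketch below computes it directly from that definition; the function name `auc_by_ranking` is hypothetical and the scores are toy values.

```python
from itertools import product

def auc_by_ranking(y_true, scores):
    """AUC-ROC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

auc = auc_by_ranking([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

This pairwise loop is quadratic in the number of users, so for the full dataset `sklearn.metrics.roc_auc_score` is the practical choice; constant scores give 0.5 and a perfect ranking gives 1.0.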
- Accuracy: Proportion of all predictions that are correct
- Precision: Proportion of users predicted to churn who actually churn
- Recall: Proportion of users who actually churn that the model identifies
- F1-Score: Harmonic mean of precision and recall
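
These definitions can be computed from confusion-matrix counts, treating churn as the positive class. A minimal sketch; the helper name and the counts are hypothetical, for illustration only.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts (churn = positive)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 40 churners caught, 10 false alarms, 20 churners missed, 30 correct non-churners
m = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(m)  # accuracy 0.7, precision 0.8, recall ~0.667, f1 ~0.727
```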
- User segmentation by churn risk
- Personalized campaigns for high-risk users
- Marketing resource optimization
- User experience improvement
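
Segmenting users by churn risk can be as simple as bucketing the churn probability returned by `predict_proba`. This is a hedged sketch: the helper `risk_tier` and the cut-offs 0.3 and 0.7 are hypothetical and would need tuning to the retention budget.

```python
def risk_tier(churn_probability: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a predicted churn probability to a risk tier.

    `low` and `high` are hypothetical cut-offs, not values from the project.
    """
    if not 0.0 <= churn_probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if churn_probability >= high:
        return "high"
    if churn_probability >= low:
        return "medium"
    return "low"

# Toy probabilities for three hypothetical users
users = {"u1": 0.92, "u2": 0.45, "u3": 0.05}
tiers = {uid: risk_tier(p) for uid, p in users.items()}
print(tiers)  # {'u1': 'high', 'u2': 'medium', 'u3': 'low'}
```

High-tier users would be natural targets for personalized retention campaigns, while the medium tier could receive lower-cost interventions.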
- Churn rate by segment
- Retention campaign effectiveness
- Loyalty strategy ROI
- Customer satisfaction
- Exploratory Data Analysis (EDA)
  - Variable distribution
  - Correlations
  - Missing value analysis
- Data Preparation
  - Categorical variable encoding
  - Numerical variable scaling
  - Feature engineering
- Modeling
  - Multiple algorithm training
  - Hyperparameter optimization
  - Cross-validation
- Evaluation
  - Model comparison
  - Feature importance analysis
  - Result interpretation
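
The preparation and modeling steps can be sketched with a scikit-learn `Pipeline`. This is a minimal illustration under stated assumptions, not the notebook's actual code: it uses a subset of the README's features, random synthetic data in place of `train.csv`, a toy churn rule, a hypothetical `build_pipeline` helper and untuned hyperparameters. XGBoost and hyperparameter search (for example with `GridSearchCV`) are left out to keep it short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Subset of the README's features (assumption: the notebook may use more)
NUMERIC = ["weekly_hours", "song_skip_rate", "customer_service_inquiries"]
CATEGORICAL = ["subscription_type", "payment_plan"]

def build_pipeline(model) -> Pipeline:
    """Scale numeric columns, one-hot encode categorical ones, then fit `model`."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([("prep", preprocess), ("model", model)])

# Synthetic stand-in for train.csv (random values, NOT the real data)
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "weekly_hours": rng.uniform(0, 40, n),
    "song_skip_rate": rng.uniform(0, 1, n),
    "customer_service_inquiries": rng.integers(0, 10, n),
    "subscription_type": rng.choice(["Free", "Premium", "Family", "Student"], n),
    "payment_plan": rng.choice(["Monthly", "Yearly"], n),
})
# Toy labeling rule for illustration: low usage or many inquiries -> churn
y = ((X["weekly_hours"] < 15) | (X["customer_service_inquiries"] > 6)).astype(int)

candidates = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
# 5-fold cross-validated AUC-ROC for each candidate model
scores = {
    name: cross_val_score(build_pipeline(m), X, y, cv=5, scoring="roc_auc").mean()
    for name, m in candidates.items()
}
for name, auc in scores.items():
    print(f"{name}: mean CV AUC-ROC = {auc:.3f}")
```

Keeping preprocessing inside the pipeline means the scaler and encoder are fitted only on each training fold, which avoids leaking information from the validation fold into the cross-validation scores.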
- Target variable distribution
- Correlation matrices
- Feature importance
- Confusion matrices
- ROC and Precision-Recall curves
- Churn analysis by categories
This project is under the MIT License. See the LICENSE file for more details.
Your Name
- LinkedIn: https://www.linkedin.com/in/jesus-beleno/
- Email: jesusbelenov@gmail.com
- Dataset provided by [Kaggle](https://www.kaggle.com/competitions/streaming-subscription-churn-model/team)
- Data Science community
- Open source tools used
If this project was helpful, please give it a star!