This is a machine learning project to predict customer churn for a telecom company. The goal is to build a model that accurately identifies customers who are likely to leave, so that proactive steps can be taken to retain them. The project also aims to compute a churn risk score for each customer and to identify the factors that influence churn.
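One common way to derive a churn risk score is to rescale a classifier's predicted churn probability to a 0-100 range and bucket it into bands. The sketch below is an illustration under assumptions, not this project's exact method. The function names, the 0-100 scale, the band thresholds (70 and 40), and the customer IDs and probabilities are all hypothetical.

```python
def churn_risk_score(probability: float) -> int:
    """Map a predicted churn probability (0.0-1.0) to a 0-100 risk score."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return round(probability * 100)


def risk_band(score: int) -> str:
    """Bucket a 0-100 risk score into a band (hypothetical thresholds)."""
    if score >= 70:
        return "high"
    if score >= 40:
        return "medium"
    return "low"


# Hypothetical churn probabilities, e.g. taken from a classifier's
# predict_proba(X)[:, 1] output for the positive (churn) class.
probabilities = {"cust_001": 0.92, "cust_002": 0.35, "cust_003": 0.55}
for customer_id, p in probabilities.items():
    score = churn_risk_score(p)
    print(customer_id, score, risk_band(score))
```

In practice the thresholds would be chosen from the retention team's capacity and the model's calibration rather than fixed in code.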
The project covers data cleaning and preprocessing, exploratory data analysis, feature selection, and model training and evaluation.
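Churn datasets are often imbalanced, so a stratified train/test split, which keeps the churn ratio the same in both sets, is a common choice before training and evaluation. The README does not state how the data was split. The standard-library sketch below is therefore an assumption; `stratified_split`, the 80/20 split, the seed, and the toy rows are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, test_fraction=0.2, seed=42):
    """Split rows into train/test sets, keeping each label's proportion roughly equal."""
    # Group rows by their label value (e.g. churn = 0 or 1).
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for label_rows in by_label.values():
        shuffled = label_rows[:]
        rng.shuffle(shuffled)
        # Take the same fraction of every label group for the test set.
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Hypothetical toy data: 80 customers who stayed, 20 who churned.
rows = [{"id": i, "churn": 0} for i in range(80)] + [{"id": i, "churn": 1} for i in range(80, 100)]
train, test = stratified_split(rows, "churn")
print(len(train), len(test), sum(r["churn"] for r in test))
```

With scikit-learn, the same effect comes from passing `stratify=y` to `sklearn.model_selection.train_test_split`.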
The dataset used for this project is the Telecom Customer Churn dataset, provided by DataMites Institute. It contains information about customers, such as account length, international plan, voicemail messages, day and night calls, and charges, as well as the target variable, i.e., whether the customer churned.
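Because the dataset mixes categorical fields (such as international plan) with numeric fields (such as day charge), cleaning typically includes encoding yes/no values as 1/0 and filling missing numbers. The sketch below assumes the plan is stored as "yes"/"no" strings and that some day charges may be missing. The column names `international_plan` and `day_charge`, the median imputation, and the sample records are hypothetical.

```python
from statistics import median


def preprocess(records):
    """Encode a yes/no column as 1/0 and fill missing numeric values with the median.

    The column names "international_plan" and "day_charge" are hypothetical.
    """
    # Median of the non-missing day charges, used to fill gaps.
    known_charges = [r["day_charge"] for r in records if r["day_charge"] is not None]
    fill_value = median(known_charges)

    cleaned = []
    for r in records:
        cleaned.append({
            **r,
            # Case-insensitive "yes" becomes 1; anything else becomes 0.
            "international_plan": 1 if str(r["international_plan"]).strip().lower() == "yes" else 0,
            "day_charge": r["day_charge"] if r["day_charge"] is not None else fill_value,
        })
    return cleaned


# Hypothetical raw records.
raw = [
    {"international_plan": "yes", "day_charge": 40.0},
    {"international_plan": "no", "day_charge": None},
    {"international_plan": "No", "day_charge": 20.0},
]
print(preprocess(raw))
```

With pandas, the equivalent steps would usually be a `map` on the plan column and `fillna` with the column median.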
The project requires the following libraries:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- xgboost
The model achieved an F1 score of 97%, indicating that it predicts customer churn with both high precision and high recall. The top five features influencing customer churn are account length, customer service calls, international plan, voicemail messages, and day charge.
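The F1 score is the harmonic mean of precision and recall, so it is high only when both are high. The minimal sketch below computes it for the positive (churn) class from true and predicted labels. It assumes churn is encoded as 1, and the labels are hypothetical, not this project's results.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = churned, 0 = stayed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(f1_score(y_true, y_pred))
```

For binary labels, `sklearn.metrics.f1_score(y_true, y_pred)` computes the same value, since it defaults to `pos_label=1`.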