This project focuses on HR Analytics and implements a Logistic Regression classification model to predict employee attrition. The goal is to help organizations understand workforce trends and identify employees who are at risk of leaving.
Employee attrition leads to increased hiring costs and productivity loss.
Using historical HR data, this project aims to:
- Predict whether an employee will leave the organization
- Identify important factors influencing attrition
- Evaluate model performance using classification metrics
Dataset:
- File name: HR_file.csv
- Target variable: Attrition (binary: Yes / No)
The dataset includes employee attributes related to:
- Personal demographics
- Job role and department
- Compensation and performance
- Work-life balance indicators
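Because Attrition is stored as Yes / No, it has to be converted to a numeric label before scikit-learn can use it. The sketch below loads the CSV and maps Yes to 1 and No to 0. It assumes the file sits in the working directory and that the column is spelled exactly `Attrition`. The helper names (`load_hr_data`, `encode_target`) are hypothetical, and the tiny demo frame is made-up data, not rows from HR_file.csv.

```python
import pandas as pd


def encode_target(df: pd.DataFrame, target: str = "Attrition") -> pd.DataFrame:
    """Return a copy of df with the Yes/No target mapped to 1/0."""
    out = df.copy()
    # Strip stray whitespace, then map the two expected labels.
    mapped = out[target].astype(str).str.strip().map({"Yes": 1, "No": 0})
    if mapped.isna().any():
        bad = sorted(out.loc[mapped.isna(), target].astype(str).unique())
        raise ValueError(f"Unexpected {target} values: {bad}")
    out[target] = mapped.astype(int)
    return out


def load_hr_data(path: str = "HR_file.csv") -> pd.DataFrame:
    """Load the HR dataset (assumed CSV layout) and encode the target."""
    return encode_target(pd.read_csv(path))


# Hypothetical rows standing in for HR_file.csv.
demo = pd.DataFrame({"Age": [29, 41, 35], "Attrition": ["Yes", "No", "No"]})
encoded = encode_target(demo)
print(encoded["Attrition"].tolist())  # [1, 0, 0]
```

Raising on unexpected labels, rather than silently producing NaN, makes data-quality problems in the target visible early.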
Tools and technologies:
- Language: Python
- Libraries used:
  - pandas
  - numpy
  - matplotlib / seaborn
  - scikit-learn
- Environment: Jupyter Notebook
Data preprocessing:
- Handled missing values
- Encoded categorical variables
- Scaled numerical features
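The three preprocessing steps above can be combined into one scikit-learn transformer. This is a minimal sketch, assuming median imputation plus standard scaling for numeric columns and most-frequent imputation plus one-hot encoding for categorical columns; the notebook's actual choices are not stated here. The column names in the demo (`Age`, `MonthlyIncome`, `Department`) are illustrative assumptions, not confirmed columns of HR_file.csv.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(X: pd.DataFrame) -> ColumnTransformer:
    """Impute, scale numeric columns and impute, one-hot encode the rest."""
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = X.select_dtypes(exclude="number").columns.tolist()

    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    # sparse_threshold=0.0 forces a dense NumPy array as output.
    return ColumnTransformer(
        [("num", numeric, numeric_cols), ("cat", categorical, categorical_cols)],
        sparse_threshold=0.0,
    )


# Hypothetical feature rows with gaps in every column.
X_demo = pd.DataFrame({
    "Age": [25, 38, np.nan, 50],
    "MonthlyIncome": [3000, 5200, 4100, np.nan],
    "Department": ["Sales", "R&D", np.nan, "Sales"],
})
pre = build_preprocessor(X_demo)
Xt = pre.fit_transform(X_demo)
print(Xt.shape)  # (4, 4): two scaled numeric columns + two one-hot columns
```

Keeping preprocessing inside a transformer means it can later be placed in the same pipeline as the model, so imputation and scaling statistics are learned from the training split only.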
Exploratory data analysis:
- Analyzed the attrition distribution
- Identified key trends influencing employee turnover
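The attrition distribution and per-group trends mentioned above boil down to two pandas operations: the class share of the target, and the mean of a 0/1 target within each category (which is the attrition rate for that group). The sketch assumes Attrition has already been encoded as 1/0; `Department` and `OverTime` are hypothetical column names and the rows are invented. Plotting (for example with seaborn) is left out to keep the example minimal.

```python
import pandas as pd


def attrition_summary(df: pd.DataFrame, target: str = "Attrition") -> pd.Series:
    """Share of each target class (0 = stayed, 1 = left)."""
    return df[target].value_counts(normalize=True)


def attrition_rate_by(df: pd.DataFrame, column: str, target: str = "Attrition") -> pd.Series:
    """Attrition rate per category of `column`; target must be encoded as 0/1."""
    return df.groupby(column)[target].mean().sort_values(ascending=False)


# Hypothetical, already-encoded rows.
demo = pd.DataFrame({
    "Department": ["Sales", "Sales", "R&D", "R&D", "HR"],
    "OverTime": ["Yes", "No", "Yes", "No", "No"],
    "Attrition": [1, 0, 1, 0, 0],
})
summary = attrition_summary(demo)
rates = attrition_rate_by(demo, "OverTime")
print(summary)  # 0 -> 0.6, 1 -> 0.4
print(rates)    # Yes -> 1.0, No -> 0.0
```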
Model building:
- Split the data into training and testing sets
- Implemented Logistic Regression
- Trained the model on the training data
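The split-then-train workflow above can be sketched as follows. Since HR_file.csv is not bundled here, `make_classification` generates a synthetic, imbalanced stand-in for the preprocessed features; the 80/20 split, `stratify=y`, `random_state=42`, and `max_iter=1000` are assumptions rather than the notebook's confirmed settings. Stratifying keeps the Yes/No ratio similar in both splits, which matters when one class is much rarer.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed HR features (not the real dataset).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.84, 0.16], random_state=0
)

# Hold out 20% for testing, preserving the class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline, so it is fitted on training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```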
Evaluation metrics:
- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix
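All five metrics listed above are available in `sklearn.metrics`. The sketch below treats Attrition = 1 ("Yes") as the positive class, so precision is TP / (TP + FP) and recall is TP / (TP + FN); accuracy is (TP + TN) divided by the total. It reuses a synthetic stand-in dataset because the real file is not included, and the split parameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not HR_file.csv).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.84, 0.16], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Positive-class (Attrition = 1) metrics; zero_division avoids warnings on empty predictions.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
}

# Rows are true labels, columns are predictions, in the order [0, 1].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(metrics)
print(cm)
print(classification_report(y_test, y_pred, target_names=["No", "Yes"], zero_division=0))
```

`classification_report` prints precision, recall and F1 for both classes, which is how per-class results like those summarized below can be checked.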
Results:
- The model achieved approximately 98% accuracy
- High precision and recall for both classes
- Few false positives and false negatives in the confusion matrix
- Logistic Regression proved both effective and interpretable
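Part of what makes Logistic Regression interpretable is that each feature gets a single coefficient. When features are standardized, coefficient magnitudes are comparable, and exp(coef) is the multiplicative change in the odds of attrition for a one-standard-deviation increase in that feature, holding the others fixed. The sketch below is illustrative only: the feature names (`OverTime`, `YearsAtCompany`, `JobSatisfaction`) and the label-generating rule are invented assumptions, not findings from HR_file.csv.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400
# Hypothetical features; names borrowed for illustration only.
X = pd.DataFrame({
    "OverTime": rng.integers(0, 2, n),
    "YearsAtCompany": rng.integers(0, 20, n),
    "JobSatisfaction": rng.integers(1, 5, n),
})
# Invented rule: overtime raises risk, tenure and satisfaction lower it.
logit = 1.5 * X["OverTime"] - 0.15 * X["YearsAtCompany"] - 0.5 * X["JobSatisfaction"] + 1.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
coefs = model.named_steps["logisticregression"].coef_[0]

# Rank features by absolute standardized coefficient.
importance = (
    pd.DataFrame({"feature": X.columns, "coef": coefs, "odds_ratio": np.exp(coefs)})
    .sort_values("coef", key=np.abs, ascending=False)
    .reset_index(drop=True)
)
print(importance)
```

A positive coefficient (odds ratio above 1) means the feature is associated with higher predicted attrition risk; these are associations in the fitted model, not causal effects.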
The Logistic Regression model predicts employee attrition with high accuracy and balanced precision and recall across both classes. It can support HR teams in proactive decision-making and employee retention planning.
How to run:
- Clone the repository
- Install the required libraries
- Open the notebook in Jupyter Notebook
- Run all cells sequentially