This project applies machine learning techniques to analyze and predict employee turnover from multiple workforce-related features. The dataset contains anonymized employee information, including education, job history, demographics, and salary tier. The goal is to train several classification algorithms to predict whether an employee will leave or stay.
- **Source**: [Kaggle - Employee Dataset](https://www.kaggle.com/datasets/tawfikelmetwally/employee-dataset)
- **Target Variable**: `LeaveOrNot` (1 = Employee leaves, 0 = Employee stays)
- **Key Features**:
  - `Education`: Degree, institution, and field of study
  - `JoiningYear`: Year the employee joined the company
  - `City`: City where the employee is based
  - `PaymentTier`: Salary classification level
  - `Age`: Employee's age
  - `Gender`: Gender identity
  - `EverBenched`: Whether the employee has ever been benched (temporarily left without assigned work)
  - `Experience`: Years worked in the current domain

Ensure you have Python 3.x installed, then install the dependencies:

```bash
pip install -r requirements.txt
```

Alternatively, install the required libraries manually:

```bash
pip install pandas scikit-learn seaborn matplotlib
```

To run the project, clone the repository and execute the script:

```bash
git clone https://github.com/your-username/machine-learning-assignment.git
cd machine-learning-assignment
python machine_learning_assignment.py
```

The script evaluates three different machine learning models:
- ✅ Decision Tree Classifier
- ⭐ Support Vector Machine (SVM) Classifier (Best performing model)
- 🔹 K-Nearest Neighbors (KNN) Classifier
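The three classifiers above can be compared with a shared preprocessing step. The sketch below is a minimal illustration, not the script's actual code: it assumes the column names from the feature list (the real CSV headers may differ), one-hot encodes the categorical columns, standardizes the numeric ones (which matters for SVM and KNN, since both depend on distances), and uses a hypothetical `make_synthetic_employees()` helper that generates random rows so the example runs without the Kaggle file.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Column names follow the feature list above; the real CSV headers may differ.
CATEGORICAL = ["Education", "City", "Gender", "EverBenched"]
NUMERIC = ["JoiningYear", "PaymentTier", "Age", "Experience"]
TARGET = "LeaveOrNot"


def make_synthetic_employees(n: int = 300, seed: int = 0) -> pd.DataFrame:
    """Hypothetical random stand-in for the Kaggle data (contains no real signal)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Education": rng.choice(["Bachelors", "Masters", "PHD"], n),
        "JoiningYear": rng.integers(2012, 2019, n),
        "City": rng.choice(["Bangalore", "Pune", "New Delhi"], n),
        "PaymentTier": rng.integers(1, 4, n),
        "Age": rng.integers(22, 42, n),
        "Gender": rng.choice(["Male", "Female"], n),
        "EverBenched": rng.choice(["No", "Yes"], n),
        "Experience": rng.integers(0, 8, n),
    })
    df[TARGET] = rng.integers(0, 2, n)
    return df


def make_preprocessor() -> ColumnTransformer:
    """Scale numeric columns and one-hot encode categorical ones."""
    return ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])


def evaluate_models(df: pd.DataFrame, seed: int = 42) -> dict:
    """Fit the three classifiers on a stratified split and return test accuracy."""
    X, y = df[NUMERIC + CATEGORICAL], df[TARGET]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    classifiers = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "SVM": SVC(),
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }
    scores = {}
    for name, clf in classifiers.items():
        # A fresh preprocessor per pipeline so fitted state is never shared
        model = Pipeline([("prep", make_preprocessor()), ("clf", clf)])
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # accuracy on held-out rows
    return scores
```

To try it on the real data, replace `make_synthetic_employees()` with `pd.read_csv(...)` pointed at the downloaded Kaggle file, renaming columns if needed. Scores on the synthetic data are meaningless and say nothing about the project's results.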
- **Best Model**: SVM achieved the highest accuracy of the three classifiers.
- **Evaluation Metrics**: Accuracy, precision, recall, F1-score, and confusion matrix
- **Potential Improvement**: Additional features could further improve accuracy.
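The listed metrics can all be computed with `sklearn.metrics`. This is a small sketch, assuming `LeaveOrNot == 1` (the employee leaves) is treated as the positive class; the helper name `summarize_metrics` is hypothetical and not taken from the project script.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)


def summarize_metrics(y_true, y_pred) -> dict:
    """Report the README's metrics with LeaveOrNot == 1 as the positive class."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, pos_label=1, zero_division=0),
        "recall": recall_score(y_true, y_pred, pos_label=1, zero_division=0),
        "f1": f1_score(y_true, y_pred, pos_label=1, zero_division=0),
        # Rows are true labels [0, 1]; columns are predicted labels [0, 1]
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
    }
```

Precision and recall are worth reading alongside accuracy here: if most employees stay, a model can score well on accuracy while missing many of the employees who actually leave.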
- Tenure calculations assume the current year is 2023 (tenure = years elapsed since `JoiningYear`).
- Hyperparameter tuning and cross-validation were used to optimize models.
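The two notes above can be sketched as follows. The README does not say which tuning method or search space was used, so this assumes a grid search over the SVM's `C` and `kernel` with 5-fold stratified cross-validation; the grid values and the helper names `add_tenure` and `tune_svm` are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

REFERENCE_YEAR = 2023  # the year assumed for tenure calculations


def add_tenure(df: pd.DataFrame, reference_year: int = REFERENCE_YEAR) -> pd.DataFrame:
    """Return a copy of df with a Tenure column (years since JoiningYear)."""
    out = df.copy()
    out["Tenure"] = reference_year - out["JoiningYear"]
    return out


def tune_svm(X, y, seed: int = 42):
    """Grid-search C and kernel for an SVM using 5-fold stratified cross-validation."""
    pipeline = make_pipeline(StandardScaler(), SVC())
    param_grid = {
        "svc__C": [0.1, 1, 10],            # hypothetical grid, not the author's values
        "svc__kernel": ["linear", "rbf"],
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
    search.fit(X, y)
    return search.best_params_, search.best_score_
```

Keeping the scaler inside the pipeline means it is refit on each training fold, so the cross-validation score is not inflated by statistics from the validation fold.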
- 📌 Try additional ML models (Random Forest, XGBoost, etc.)
- 📌 Perform feature selection to improve accuracy
- 📌 Evaluate on a larger dataset to better assess generalization
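As a starting point for the first two ideas, the sketch below compares a Random Forest on all features against one restricted to the `k` best features chosen by `SelectKBest` with an ANOVA F-test (`f_classif`). Both the selection method and the helper name are assumptions for illustration; XGBoost is left out because it is a separate package. It expects already-encoded numeric features, such as the output of a one-hot encoding step.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def compare_with_feature_selection(X, y, k: int = 5, seed: int = 0) -> dict:
    """Mean 5-fold CV accuracy of a Random Forest with and without SelectKBest."""
    baseline = RandomForestClassifier(n_estimators=100, random_state=seed)
    selected = make_pipeline(
        SelectKBest(score_func=f_classif, k=k),  # keep the k highest-scoring features
        RandomForestClassifier(n_estimators=100, random_state=seed),
    )
    return {
        "all_features": cross_val_score(baseline, X, y, cv=5).mean(),
        f"top_{k}_features": cross_val_score(selected, X, y, cv=5).mean(),
    }
```

Placing `SelectKBest` inside the pipeline means features are re-selected on each training fold, which avoids leaking information from the validation folds into the selection step.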
Mohamed Hassan Kamel Amin Mohamed
🎓 Student ID: GH1025497
📅 M606 Machine Learning, April 2024 Intake
📌 Colab Notebook: View Here
This project is licensed under the MIT License - see the LICENSE file for details.
Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.
If you liked this project, give it a ⭐ on GitHub!