Skip to content

MHKamel/Machine-Learning-Based-Predictive-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

📖 Overview

This project applies Machine Learning techniques to analyze and predict employee turnover based on multiple workforce-related features. The dataset contains anonymized employee information, including education, job history, demographics, and salary tier. The goal is to train models to predict whether an employee will leave or stay using various classification algorithms.

📊 Dataset Information

- **Source**: [Kaggle - Employee Dataset](https://www.kaggle.com/datasets/tawfikelmetwally/employee-dataset)
- **Target Variable**: `LeaveOrNot` (1 = Employee leaves, 0 = Employee stays)
- **Key Features**:
  - `Education`: Degree, institution, and field of study
  - `JoiningYear`: The year an employee joined the company
  - `City`: Location of the employee
  - `PaymentTier`: Salary classification level
  - `Age`: Employee's age
  - `Gender`: Gender identity
  - `EverBenched`: Whether an employee had gaps in assigned work
  - `Experience`: Years worked in the current domain

🚀 Installation

Ensure you have Python 3.x installed and then install dependencies using:

pip install -r requirements.txt

Alternatively, manually install required libraries:

pip install pandas scikit-learn seaborn matplotlib

🔧 How to Run

git clone https://github.com/your-username/machine-learning-assignment.git
cd machine-learning-assignment
python machine_learning_assignment.py

⚡ Models Used

The script evaluates three different machine learning models:

  • Decision Tree Classifier
  • Support Vector Machine (SVM) Classifier (Best performing model)
  • 🔹 K-Nearest Neighbors (KNN) Classifier

📈 Results & Insights

  • Best Model: SVM achieved the highest accuracy.
  • Evaluation Metrics: Accuracy, Precision, Recall, F1-score, Confusion Matrix
  • Potential Improvement: Additional features might improve accuracy further.

📌 Notes

  • The dataset assumes the year is 2023 for tenure calculations.
  • Hyperparameter tuning and cross-validation were used to optimize models.

🛠 Future Enhancements

  • 📌 Try additional ML models (Random Forest, XGBoost, etc.)
  • 📌 Perform feature selection for better accuracy
  • 📌 Test on a larger dataset for better generalization

🏆 Author

Mohamed Hassan Kamel Amin Mohamed
🎓 Student ID: GH1025497
📅 M606 Machine Learning, April 2024 Intake


📌 Colab Notebook: View Here

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.

⭐ Show Your Support

If you liked this project, give it a ⭐ on GitHub!


About

Developed a machine learning model for predictive analytics.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published