EmployeeAttrition - Leveraging AI to Retain Employees and Reduce Attrition

An application of machine learning that helps reduce organizational resource strain.

By Jonathan Lane

Overview

The goal of this project is to use an open data set provided by IBM to determine which employee features contribute to the likelihood of attrition (an employee leaving or quitting the company). This is an introductory machine learning project of mine, and it draws on the wide range of classification models available in the scikit-learn library. The final model achieved a weighted F1-score of approximately 0.85, which is relatively high given the small size of the data set (roughly 122 KB). The model also exceeds the following goals I set prior to training:

Accuracy above 80%
ROC AUC above 0.70
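
As a rough illustration of how goals like these could be checked, the sketch below computes accuracy, ROC AUC, and a weighted F1-score with scikit-learn. The labels, probabilities, and the 0.5 threshold are made-up assumptions for demonstration; they are not results from this project.

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Hypothetical hold-out labels: 1 = employee left, 0 = employee stayed.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical predicted probabilities of attrition for the same employees.
y_prob = [0.10, 0.20, 0.15, 0.48, 0.30, 0.35, 0.80, 0.70, 0.45, 0.90]
# Turn probabilities into hard labels with an assumed 0.5 cutoff.
y_pred = [int(p >= 0.5) for p in y_prob]

accuracy = accuracy_score(y_true, y_pred)
# ROC AUC is computed from the probabilities, not the hard labels.
roc_auc = roc_auc_score(y_true, y_prob)
# "weighted" averages the per-class F1 by class frequency.
weighted_f1 = f1_score(y_true, y_pred, average="weighted")

# The two goals stated above: accuracy > 80% and ROC AUC > 0.70.
meets_goals = accuracy > 0.80 and roc_auc > 0.70
print(f"accuracy={accuracy:.2f} roc_auc={roc_auc:.2f} f1={weighted_f1:.2f}")
```

Because attrition data sets are usually imbalanced, accuracy alone can look good for a model that rarely predicts attrition, which is one reason to track ROC AUC and F1 alongside it.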

Features

A predictive method of determining the likelihood of any employee's attrition.
Two figures illustrating correlations found in the available data set.
The ability to adjust features for each employee, which can help the user implement smart retention strategies (the user can identify which feature has the greatest impact on the predicted probability of attrition).
Reproducible data set preprocessing (see notebooks/02_data_preprocessing.ipynb).
Data exploration and plotting using pandas and Matplotlib.
An interactive UI for adjusting a single input to the predictive model.
Utilization of a StackingClassifier and other hyperparameter-optimized classification models from the scikit-learn library.
A weighted F1-score of approximately 0.85, which is relatively high given the small size of the data set.
Self-hosted operation of the model, allowing files to be shared securely by an organization.
An organized Jupyter Notebook directory structure.
A clean Jupyter Notebook for the end user to use the trained ML model.
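
The idea of finding which feature has the greatest impact on the predicted probability can be sketched as a one-feature-at-a-time sweep. This is a minimal illustration only: it uses synthetic data and a plain LogisticRegression as stand-ins (both assumptions), not the project's data set or trained model, and `probability_swing` is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption: not the IBM data set).
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
# Stand-in model (assumption: the project uses a different classifier).
model = LogisticRegression(max_iter=1000).fit(X, y)

# A "typical" employee: every feature held at its median.
baseline = np.median(X, axis=0)

def probability_swing(feature_index, steps=20):
    """Sweep one feature across its observed range and return how far
    the predicted positive-class probability moves."""
    column = X[:, feature_index]
    grid = np.linspace(column.min(), column.max(), steps)
    rows = np.tile(baseline, (steps, 1))  # copies of the baseline row
    rows[:, feature_index] = grid         # vary only the chosen feature
    proba = model.predict_proba(rows)[:, 1]
    return proba.max() - proba.min()

swings = [probability_swing(i) for i in range(X.shape[1])]
most_influential = int(np.argmax(swings))
print("feature with the largest swing:", most_influential)
```

A sweep like this only shows the effect of one feature with the others fixed, so it can miss interactions between features.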

Installation

Prerequisites

A reasonably performant computer running Microsoft Windows 11.
Git installed and added to your Windows environment PATH variable.
The latest version of Python 3.12.
The latest version of Miniconda that supports Python 3.12 (I used 24.4.0).

User Guide

1. Clone the Project Repository to Your Project Directory

Using your Miniconda PowerShell Terminal, clone the project from GitHub with this command:

git clone https://github.com/jmsuan/EmployeeAttrition.git

2. Go to the Project Folder

Change to the project directory using:

cd EmployeeAttrition

3. Set Up the `prod` Environment

This creates a conda environment with the same libraries I used to test the application.

conda env create --file .\envs\prod.yml --prefix .\envs\prod

Using the App

1. Activate the `prod` Environment

In your Miniconda PowerShell Terminal, navigate to the project's root folder and activate the environment created during installation:

conda activate .\envs\prod

2. Start Jupyter Notebook

Start a self-hosted Jupyter Notebook server by entering the following in the terminal:

jupyter notebook

If this succeeds, Jupyter Notebook opens in your computer's default web browser.

3. Open the `application` Notebook

Double-click the notebooks directory to expand it.
Double-click on application.ipynb.

4. Run the Python Cell(s) in the Notebook

In the Notebook header, you should see a Run menu. Click it.
Click Run All Cells.
Scroll up to see the application interface widgets.

5. Use the Widgets to Experiment

The sliders default to the median values calculated from the available data, and each one is limited to the range of values observed in that data.
Experiment with different inputs using the interactive widgets to see how they affect an employee's likelihood of attrition.
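
The slider behavior described above (default at the median, limits at the observed range) can be sketched with the standard library alone. The column names and values below are made up for illustration, and `slider_settings` is a hypothetical helper, not a function from this project.

```python
from statistics import median

# Hypothetical numeric columns; names and values are illustrative only.
columns = {
    "Age": [29, 41, 35, 52, 23, 38],
    "MonthlyIncome": [2900, 5100, 4200, 9800, 2100, 6100],
}

def slider_settings(values):
    """Return the default, minimum, and maximum for one slider."""
    return {"value": median(values), "min": min(values), "max": max(values)}

# One settings dictionary per column, keyed by column name.
settings = {name: slider_settings(vals) for name, vals in columns.items()}

for name, cfg in settings.items():
    print(name, cfg)
```

The keys are chosen to match the `value`, `min`, and `max` arguments of a slider widget such as `ipywidgets.FloatSlider`; whether the application notebook uses that particular widget is an assumption here.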

6. When Finished, Close the Application

You can close the application by switching to the Miniconda PowerShell Terminal window that is running Jupyter Notebook and pressing Ctrl+C. You may then exit the environment (and Conda) by entering exit in the terminal, or by closing the window.

Implementation Details

The classification model I trained for this project is a StackingClassifier, which makes its final prediction from the outputs of several other scikit-learn estimators. In this case, the base estimators are a Random Forest, Gaussian Naive Bayes, Histogram-based Gradient Boosting, and an AdaBoost classifier. See notebooks/03_model_training.ipynb for further details.
The program is designed with extensibility in mind, using Jupyter Notebooks to document and prototype an application that can later be polished further.

Environment

Software: Developed in Python 3.12.2 using Jupyter Notebooks and the following libraries:
- numpy
- pandas
- scikit-learn
- ydata-profiling
Hardware: Tested on an ASUS ROG Zephyrus Duo 16 (2023) laptop with the following specifications:
- Processor: AMD Ryzen 9 7945HX, 2501 MHz, 16 cores, 32 logical processors
- Graphics Card: NVIDIA GeForce RTX 4090 Laptop GPU
- Memory (RAM): 32.0 GB
- Storage: 1.9 TB SSD

Acknowledgments

IBM for providing this project's open data set.
