Census Income Prediction with Random Forest

Overview

This repository contains code to build a machine learning model using the Random Forest algorithm. The model is trained on the "Census Income" dataset, also known as the "Adult" dataset. The goal is to predict whether an individual's income exceeds $50,000 per year based on various census attributes.

You can find a detailed description of the dataset here.

Project Structure

The repository is structured as follows:

train_model.py: Python script to create and train the Random Forest model.
requirements.txt: List of required Python libraries.
.github/workflows/: GitHub Actions workflows for linting (flake8), unit testing (pytest), and installing requirements.
main.py: FastAPI application defining the POST (inference) and GET (greeting) endpoints.

Getting Started

Prerequisites

Before you start, make sure you have the following tools and packages installed:

Python
Pip
Virtual environment (recommended)

To install the required Python libraries, run:

pip install -r requirements.txt

Training the Model

To create and train the Random Forest model, use the following command in the root of the repository:

python train_model.py

To analyze the model's prediction errors, the calculate_slice_metrics function in model.py computes slice metrics (precision, recall, F-beta) on the test set for each unique value of every categorical column.
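The idea behind slice metrics can be illustrated with a small, dependency-free sketch. This is not the repository's calculate_slice_metrics; the function names, the row format (a list of dicts), and the choice to report 0.0 when a ratio is undefined are all assumptions made for illustration.

```python
from collections import defaultdict


def precision_recall_fbeta(y_true, y_pred, beta=1.0):
    """Score binary labels (0/1); undefined ratios are reported as 0.0 (an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


def slice_metrics_sketch(rows, y_true, y_pred, column, beta=1.0):
    """Group rows by the value of `column` and score each group separately."""
    groups = defaultdict(lambda: ([], []))
    for row, t, p in zip(rows, y_true, y_pred):
        truths, preds = groups[row[column]]
        truths.append(t)
        preds.append(p)
    return {
        value: precision_recall_fbeta(truths, preds, beta)
        for value, (truths, preds) in groups.items()
    }


# Toy data, invented for illustration only.
rows = [{"sex": "Male"}, {"sex": "Male"}, {"sex": "Female"},
        {"sex": "Female"}, {"sex": "Male"}]
y_true = [1, 0, 1, 1, 1]
y_pred = [1, 1, 0, 1, 0]

result = slice_metrics_sketch(rows, y_true, y_pred, "sex")
for value, (precision, recall, fbeta) in result.items():
    print(f"sex={value}: precision={precision:.2f} recall={recall:.2f} fbeta={fbeta:.2f}")
```

Comparing the per-slice scores against the overall test score shows which categories the model handles noticeably worse.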

Testing

Unit tests are written with pytest. You can run the tests in the tests directory with the following command:

pytest tests.py

Continuous Integration

This repository is set up with GitHub Actions for a basic Continuous Integration/Continuous Deployment (CI/CD) pipeline. The following checks are performed on every push:

Flake8 linter checks for code style.
pytest runs unit tests.
Requirements are installed to ensure dependencies are up to date.

FastAPI - RESTful API

In addition to the machine learning model, a FastAPI-based RESTful API is included. You can find the API in the main.py file in the root directory. It has the following endpoints:

Greeting Endpoint (GET)

A simple GET request that greets the user.

Prediction Endpoint (POST)

This endpoint loads the trained model and returns a prediction. To receive a 200 OK response, send a JSON payload in the following format:

{
  "workclass": "Private",
  "education": "HS-grad",
  "marital_status": "Divorced",
  "occupation": "Handlers-cleaners",
  "relationship": "Not-in-family",
  "race": "White",
  "sex": "Male",
  "native_country": "United-States"
}

The prediction is returned to the caller in the following format:

{
   "prediction": 0.0
}

You can run the API locally via:

uvicorn main:app
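Once the server is running locally, the prediction endpoint can be exercised with a short client. The sketch below uses only the standard library and is not the repository's call_endpoint.py. The address is uvicorn's default; the route in PREDICT_PATH is a hypothetical placeholder, so check main.py for the actual path.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port
PREDICT_PATH = "/predict"  # hypothetical route; check main.py for the real one

# Example payload taken from the Prediction Endpoint section.
PAYLOAD = {
    "workclass": "Private",
    "education": "HS-grad",
    "marital_status": "Divorced",
    "occupation": "Handlers-cleaners",
    "relationship": "Not-in-family",
    "race": "White",
    "sex": "Male",
    "native_country": "United-States",
}


def build_request(base_url, path, body):
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return (status_code, parsed_json_body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, json.loads(response.read().decode("utf-8"))


request = build_request(BASE_URL, PREDICT_PATH, PAYLOAD)
print(request.get_method(), request.full_url)
```

With the API running, calling send(request) returns the status code together with the parsed response body. FastAPI also serves interactive documentation at /docs, which lists the actual routes.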

To call the live endpoint deployed on Render, use the code in the call_endpoint.py file.

Contributing

Feel free to contribute to this project. You can fork the repository, make your changes, and create a pull request. Please ensure your code follows the established style guide.

License

This project is licensed under the MIT License. You are free to use and modify the code for your needs.

