This repository creates an ML model on census data and makes it available via FastAPI for deployment on Render/Heroku.

dhedderich/ml-web-deployment

Census Income Prediction with Random Forest

Overview

This repository contains code to build a machine learning model using the Random Forest algorithm. The model is trained on the "Census Income" dataset, also known as the "Adult" dataset. The goal is to predict whether an individual's income exceeds $50,000 per year based on various census attributes.

You can find a detailed description of the dataset here.

Project Structure

The repository is structured as follows:

  • train_model.py: Python script to create and train the Random Forest model.
  • requirements.txt: List of required Python libraries.
  • .github/workflows/: GitHub Actions workflows for linting (flake8), unit testing (pytest), and installing requirements.
  • main.py: FastAPI application that defines the POST (inference) and GET (greeting) endpoints.

Getting Started

Prerequisites

Before you start, make sure you have the following tools and packages installed:

  • Python
  • pip
  • Virtual environment (recommended)

To install the required Python libraries, run:

pip install -r requirements.txt

Training the Model

To create and train the Random Forest model, use the following command in the root of the repository:

python train_model.py

To analyze the model's prediction errors, the function "calculate_slice_metrics" in model.py computes data slice metrics (precision, recall, F-beta) on the test set, one set of metrics for each unique value of each categorical column.
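
The idea behind slice metrics can be illustrated with a minimal, standard-library-only sketch. This is not the repository's "calculate_slice_metrics"; the function names, the row format (a list of dicts), and the toy data are assumptions made for illustration, and the actual implementation in model.py may differ.

```python
from collections import defaultdict


def fbeta_scores(y_true, y_pred, beta=1.0):
    """Return (precision, recall, fbeta) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Fall back to 0.0 when a ratio is undefined (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta**2 * precision + recall
    fbeta = (1 + beta**2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


def slice_metrics(rows, y_true, y_pred, column, beta=1.0):
    """Compute the metrics separately for each unique value of `column`."""
    groups = defaultdict(lambda: ([], []))
    for row, truth, pred in zip(rows, y_true, y_pred):
        truths, preds = groups[row[column]]
        truths.append(truth)
        preds.append(pred)
    return {value: fbeta_scores(t, p, beta) for value, (t, p) in groups.items()}


# Hypothetical toy data: four test rows sliced by the "sex" column.
rows = [{"sex": "Male"}, {"sex": "Male"}, {"sex": "Female"}, {"sex": "Female"}]
y_true = [1, 0, 1, 1]
y_pred = [1, 1, 1, 0]

for value, (precision, recall, fbeta) in slice_metrics(rows, y_true, y_pred, "sex").items():
    print(f"sex={value}: precision={precision:.2f} recall={recall:.2f} fbeta={fbeta:.2f}")
```

Comparing the per-slice numbers against the overall test metrics shows which groups the model serves poorly, which a single aggregate score would hide.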

Testing

Unit tests for the code are implemented using Pytest. You can run the tests within the /tests directory using the following command:

pytest tests.py

Continuous Integration

This repository is set up with GitHub Actions for a basic Continuous Integration/Continuous Deployment (CI/CD) pipeline. The following checks are performed on every push:

  • Flake8 linter checks for code style.
  • pytest runs unit tests.
  • Requirements are installed to verify that the dependencies in requirements.txt resolve and install cleanly.

FastAPI - RESTful API

In addition to the machine learning model, a FastAPI-based RESTful API is included. You can find the API in the main.py file in the root directory. It has the following endpoints:

Greeting Endpoint (GET)

A simple GET request that greets the user.

Prediction Endpoint (POST)

This endpoint loads the trained model and returns a prediction. To receive an HTTP 200 response, send a JSON payload in the following format:

{
  "workclass": "Private",
  "education": "HS-grad",
  "marital_status": "Divorced",
  "occupation": "Handlers-cleaners",
  "relationship": "Not-in-family",
  "race": "White",
  "sex": "Male",
  "native_country": "United-States"
}

The response returns the prediction in the following format:

{
   "prediction": 0.0
}

You can run the API locally via:

uvicorn main:app

You can call a currently running endpoint on Render with the code in the call_endpoint.py file.
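
A client call of this kind can be sketched with the standard library alone. This is not the code in call_endpoint.py; the function names and the URL below are placeholders, and the real Render URL and route must be substituted.

```python
import json
import urllib.request


def build_prediction_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the prediction endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_prediction(url: str, payload: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_prediction_request(url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = {
    "workclass": "Private",
    "education": "HS-grad",
    "marital_status": "Divorced",
    "occupation": "Handlers-cleaners",
    "relationship": "Not-in-family",
    "race": "White",
    "sex": "Male",
    "native_country": "United-States",
}

# Placeholder URL (assumption): replace with the deployed endpoint before calling.
# result = request_prediction("https://example.invalid/predict", payload)
```

The network call is left commented out so the snippet can be imported or run without a live deployment.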

Contributing

Feel free to contribute to this project. You can fork the repository, make your changes, and create a pull request. Please ensure your code follows the established style guide.

License

This project is licensed under the MIT License. You are free to use and modify the code for your needs.
