Bikeshare Rental Batch Analytical Engine

The purpose of this project is to build a bikeshare rental prediction engine that helps Washington's government understand bikeshare rental patterns and decide which regions to prioritise when spending on bikeshare rental infrastructure.

Motivation

The analytical team wants to assist the government by building a rental prediction engine (a batch service application) that scores past and future bike rental data in Washington. The engine will help the government identify past and emerging trends as the ways in which people use the service evolve over time. These insights will enable the government to provide better infrastructure for the bike paths in Washington DC. The engine works by learning from rides that happened in the recent past and then predicting the estimated duration of each new ride. The predicted duration is given in minutes. Predictions that differ widely from the actual ride durations can be used to understand infrastructure problems and to home in on specific regions that need attention.
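As a rough illustration of the idea above, the sketch below computes a ride duration in minutes and flags start stations where predictions and actual durations disagree strongly. The field names (`started_at`, `ended_at`, `start_station`, `predicted_minutes`), the timestamp format and the 10-minute threshold are assumptions made for this example, not taken from the project code.

```python
from datetime import datetime
from statistics import mean

# Assumed timestamp format for this sketch.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def duration_minutes(started_at: str, ended_at: str) -> float:
    """Return the ride duration in minutes."""
    delta = datetime.strptime(ended_at, TIME_FORMAT) - datetime.strptime(
        started_at, TIME_FORMAT
    )
    return delta.total_seconds() / 60


def flag_disparate_stations(rides, threshold_minutes=10.0):
    """Return stations whose mean absolute prediction gap exceeds the threshold."""
    gaps = {}
    for ride in rides:
        actual = duration_minutes(ride["started_at"], ride["ended_at"])
        gap = abs(ride["predicted_minutes"] - actual)
        gaps.setdefault(ride["start_station"], []).append(gap)
    return sorted(s for s, g in gaps.items() if mean(g) > threshold_minutes)


# Hypothetical rides, invented for illustration only.
rides = [
    {"start_station": "A", "started_at": "2022-05-01 08:00:00",
     "ended_at": "2022-05-01 08:12:00", "predicted_minutes": 11.0},
    {"start_station": "B", "started_at": "2022-05-01 09:00:00",
     "ended_at": "2022-05-01 09:45:00", "predicted_minutes": 15.0},
]
flagged = flag_disparate_stations(rides)
print(flagged)
```

Grouping by start station is only one possible choice; the same gap could be aggregated by end station or by route if that better matches the regions of interest.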

Data Sources

Data Company - The data comes from Capital Bikeshare
Datasets - The data is collected and made available for consumption on a monthly basis

Built With

This section covers the tools used to run the project

MLflow for experiment tracking and model versioning
MLflow for model registry and management
Prefect for workflow orchestration of the model training, monitoring and ride scoring pipelines
Prefect for scheduling pipelines
Evidently for monitoring model, data and feature drift
Terraform for provisioning the infrastructure: storage for MLflow artifacts, Prefect block storage, and EC2 infrastructure for model training, monitoring and scoring bike rides
GitHub Actions for continuous integration and deployment workflows
Amazon Web Services (AWS) as the cloud provider

Project Folders

.github - Contains the logic and files for GitHub Actions
deploy - Contains the logic and files for infrastructure as code
model-deployment - Contains the logic and files for scoring bikeshare rentals in batch mode, along with the unit tests and integration tests for the scoring logic
training-and-monitoring - Contains the logic for training and retraining the model on a schedule. Also contains the logic for monitoring model performance and drift on a schedule, with conditional retraining of the model when drift is detected

How to run training - Step by step guide

Prerequisites

Create an AWS IAM role with permissions to create S3 buckets and EC2 instances
Add programmatic access to the role and store the access key and secret access key securely
Initialise Terraform with the IAM role
Have Anaconda and Python 3.9+ installed
Have git installed for cloning the repo

For provisioning with Terraform, see the linked guide: Terraform provisioning

The steps are the same whether you run locally or on an EC2 instance, but the infrastructure must be provisioned first

Run make apply_stage_local from the root directory to provision buckets, IAM and EC2
Run make create_monitoring_stage from the root directory to create prefect flows, deployment and block storage for training and monitoring
(Optional) To run on an EC2 instance, follow the linked guide: Environment setup and ssh
Run make setup for the preliminary setup of the local or EC2 environment
Run make start_mlflow_stage in the root directory to start the MLflow server
In a separate terminal, ensure your AWS profile is activated (not required if you are on EC2)
Run cd training-and-monitoring && pipenv install --dev and then run pipenv shell
Download the data and run python get_data.py
Run cd prefect_training_monitoring and then run python model_training.py
To deploy training and run it on a schedule, run bash run_training.sh

How to run scoring

In the root directory run make create_scoring_stage
In the root directory start mlflow by running make start_mlflow_stage
Run cd model-deployment && pipenv install --dev and then run pipenv shell
Run cd prefect_deployment and then run python score.py 2022 05
To deploy scoring and keep it running on a schedule, on EC2 or locally, run bash run.sh
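The scoring command above takes a year and a month as positional arguments. A minimal sketch of how such arguments could be parsed and validated is shown below. It is not the project's score.py; the function names and the monthly file-name pattern are hypothetical and used only for illustration.

```python
import argparse
from datetime import date


def parse_period(argv):
    """Parse 'YEAR MONTH' arguments into the first day of that month."""
    parser = argparse.ArgumentParser(description="Score one month of rides")
    parser.add_argument("year", type=int)
    parser.add_argument("month", type=int, choices=range(1, 13))
    args = parser.parse_args(argv)
    return date(args.year, args.month, 1)


def monthly_file_name(period):
    """Build a file name for the month (hypothetical naming pattern)."""
    return f"{period:%Y%m}-tripdata.csv"


# Equivalent to the arguments in: python score.py 2022 05
period = parse_period(["2022", "05"])
print(period, monthly_file_name(period))
```

Restricting the month to 1 through 12 makes an invalid period fail at the command line rather than partway through a scoring run.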

How to run monitoring

In the root directory start mlflow by running make start_mlflow_stage
Run cd training-and-monitoring && pipenv install --dev and then run pipenv shell
Run cd prefect_training_monitoring and then run python monitoring.py
To deploy monitoring and run it on a schedule, run bash run_monitoring.sh
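The monitoring pipeline retrains the model conditionally when drift is detected. The sketch below illustrates that decision with a simple population stability index (PSI) computed in plain Python. This is a stand-in technique chosen for the example, not the project's Evidently-based check, and the bin count, the 0.2 threshold and the sample values are all assumptions.

```python
import math


def psi(reference, current, bins=5):
    """Population stability index of `current` against `reference`."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids taking the logarithm of zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def should_retrain(reference, current, threshold=0.2):
    """Return True when drift exceeds the (assumed) threshold."""
    return psi(reference, current) > threshold


# Hypothetical ride durations in minutes, invented for illustration.
reference = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
shifted = [v + 30 for v in reference]

print(should_retrain(reference, reference))
print(should_retrain(reference, shifted))
```

In a scheduled flow, a True result would be the point at which the training pipeline is triggered again; otherwise the current model stays in place.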

How to run tests

Run cd model-deployment
To run unit tests, run make quality_checks
To run integration tests, run make integration_tests

How to deploy and destroy infrastructure as code

Terraform Plan

For stage: In the root directory of the repo run make plan_stage
For prod: In the root directory of the repo run make plan_prod

To provision and apply

For stage: In the root directory of the repo run make apply_stage_local
For prod: In the root directory of the repo run make apply_prod_local

To destroy

For stage: In the root directory of the repo run make destroy_stage_local
For prod: In the root directory of the repo run make destroy_prod_local

Contact for questions or support

Nakul Bajaj @Nakulbajaj101

Nakulbajaj101/mlops-zoomcamp-final-project