This repository demonstrates an automated approach to incrementally incorporating new data into a machine learning model, with a focus on minimizing the associated overhead. The model is updated while it is actively serving predictions in a production environment.
This project relies on three powerful tools:
- Apache Kafka: A distributed messaging platform that logs streaming data sequentially into topic-specific feeds, which other applications can then consume.
- Apache Airflow: A task-scheduling platform for creating, orchestrating, and monitoring data workflows.
- MLFlow: An open-source tool for tracking machine learning experiments, logging the parameters, results, models, and data of each run.
The workflow can be broken down into the following steps:
- Setting up the Environment and Training an Initial Model:
  - Establish the necessary environment using Docker containers.
  - Train an initial machine learning model.
- Simulating Streaming Data:
  - Use Kafka to simulate streaming data by pushing records to a Kafka feed (a producer sketch follows this list).
- Periodically Updating the Model:
  - Extract data from the Kafka feed at regular intervals.
  - Use the extracted data to update the machine learning model.
  - Evaluate the updated model's performance against the current version.
  - If the updated model outperforms the current version, deploy it (see the update sketch below).
- Logging Results with MLFlow:
  - Log the results, model parameters, and sample characteristics of each update run with MLFlow (see the logging sketch below).
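For the streaming step, the sketch below shows the general shape of a producer. It assumes the kafka-python package and a broker at `localhost:9092`; the topic name `training_data` and the record schema are illustrative, not taken from this repository:

```python
import json
import time

from kafka import KafkaProducer

# Serialize each record as JSON before it is written to the topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)

# Push one synthetic record per second to mimic a live data feed.
for i in range(100):
    producer.send("training_data", value={"features": [0.1 * i, 0.2 * i], "label": i % 2})
    time.sleep(1)

producer.flush()  # make sure buffered messages reach the broker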
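The periodic update step could look roughly like the following, assuming kafka-python, scikit-learn, and joblib; the topic name, the file paths, and the use of `partial_fit` are illustrative assumptions rather than this repository's exact implementation:

```python
import json

import joblib
import numpy as np
from kafka import KafkaConsumer

# Drain all new records from the feed; consumer_timeout_ms makes the
# iterator stop once no message arrives within five seconds.
consumer = KafkaConsumer(
    "training_data",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
records = [message.value for message in consumer]

X_new = np.array([r["features"] for r in records])
y_new = np.array([r["label"] for r in records])

# Update a copy of the current model incrementally; partial_fit requires
# an estimator that supports incremental learning (e.g. SGDClassifier).
current = joblib.load("models/current_model/model.joblib")
candidate = joblib.load("models/current_model/model.joblib")
candidate.partial_fit(X_new, y_new)

# Promote the candidate only if it beats the current model on held-out
# data (the validation-set path is hypothetical).
X_val, y_val = joblib.load("data/validation_set.joblib")
if candidate.score(X_val, y_val) > current.score(X_val, y_val):
    joblib.dump(candidate, "models/current_model/model.joblib")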
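Each update run can then be logged to MLFlow along these lines; the experiment name, parameter and metric names, and values are placeholders, and the tracking URI matches the dashboard address used later in this README:

```python
import mlflow

# Point the client at the tracking server backing the MLFlow dashboard.
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("incremental_updates")

with mlflow.start_run():
    # One update run: how much new data arrived and how both models scored.
    mlflow.log_param("n_new_samples", 128)
    mlflow.log_metric("candidate_accuracy", 0.93)
    mlflow.log_metric("current_accuracy", 0.91)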
The project is structured as follows:
```
project_folder
├── dags
│   └── src
│       ├── data
│       ├── models
│       └── preprocessing
├── data
│   ├── to_use_for_training
│   └── used_for_training
├── models
│   ├── current_model
│   └── archive
├── airflow_docker
├── mlflow_docker
└── docker_compose.yml
```
- Use Docker Compose to set up the multi-container application by running the following commands:

  ```
  docker compose -f docker-compose-project.yml build
  docker compose -f docker-compose-project.yml up
  ```
- Access the Airflow and MLFlow dashboards in your browser at `localhost:8080` and `localhost:5000`, respectively.
- Activate each DAG on the Airflow dashboard to train the initial model, push streaming data to Kafka, and update the model at regular intervals.
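A hypothetical shape for the periodic-update DAG is sketched below; the `dag_id`, schedule, and callable are illustrative, not this repository's actual `dags/` code:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def update_model():
    """Pull new records from Kafka, update the model, evaluate, and deploy if better."""


# Re-run the update step every hour; catchup=False skips backfilling past runs.
with DAG(
    dag_id="update_model",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    PythonOperator(task_id="run_update", python_callable=update_model)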
This project leverages Kafka, Airflow, and MLFlow to automate the process of incorporating new data into a machine learning model.