Building Machine Learning Pipelines

Code repository for the O'Reilly publication "Building Machine Learning Pipelines" by Hannes Hapke & Catherine Nelson

Set up the demo project

Download the initial dataset. From the root of this repository, run:

python3 utils/download_dataset.py

After this script runs, you should have a data folder containing the file consumer_complaints_with_narrative.csv. The original source of this dataset is https://www.kaggle.com/cfpb/us-consumer-finance-complaints?select=consumer_complaints.csv
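To confirm the download worked, a quick check along the following lines may help. This is only a sketch, not part of the repository: the helper names `dataset_path` and `preview_dataset` are hypothetical, and it assumes the CSV is UTF-8 encoded with a header row and sits in a data folder at the repository root, as described above.

```python
import csv
from itertools import islice
from pathlib import Path

# Assumption: the download script writes into "data" at the repository root.
DATA_DIR = "data"
DATASET_NAME = "consumer_complaints_with_narrative.csv"


def dataset_path(repo_root="."):
    """Return the expected location of the downloaded CSV (hypothetical helper)."""
    return Path(repo_root) / DATA_DIR / DATASET_NAME


def preview_dataset(repo_root=".", n_rows=3):
    """Return (header, first n_rows rows); raise if the file is missing."""
    path = dataset_path(repo_root)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; run utils/download_dataset.py first"
        )
    # Assumption: UTF-8 text with the column names on the first line.
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = list(islice(reader, n_rows))
    return header, rows
```

Calling `preview_dataset()` from the repository root returns the column names and the first few rows, or raises `FileNotFoundError` if the download step has not been run yet.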

Pre-pipeline experiment

Interactive pipeline

The interactive-pipeline folder contains a full interactive TFX pipeline for the consumer complaint data.

Full pipelines with Apache Beam, Apache Airflow, Kubeflow Pipelines, GCP

Chapters

Data privacy

Chapter 14. Code for training a differentially private version of the demo project. Note that the TF-Privacy module only supports TF 1.x as of June 2020.
