A collection of notebooks for walking through the typical ML lifecycle from data cleaning through to model hosting using Amazon SageMaker.
A typical ML lifecycle looks something like this:
- Identify a business problem or question which ML can answer
- Identify the data sources available to describe the problem space
- Acquire and cleanse the data or a sample of the data
- Engineer a feature set from the data or data sample so that everything has meaning and relevance
- Apply this cleansing and feature engineering logic to the full data set
- Spot check multiple ML algorithms against a sample of the feature set to assess which algorithm is likely to give the best result
- Select one or more algorithms and perform hyperparameter optimization on a sample of the feature set to determine the best configuration parameters
- Train a model using the best performing algorithm and hyperparameters on the full training feature set
- Test the model on a control or test feature set to produce a baseline for performance
- Deploy the model for consumption by the business (Lambda, mobile device, container, etc.)
- Consider how future observations will be engineered in preparation for inference
- Monitor the model for concept drift
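The notebooks in this collection carry out these steps with Amazon SageMaker, but the core cleanse / engineer / spot-check loop can be sketched locally with scikit-learn on a synthetic dataset (all names and data below are illustrative, not taken from the labs):

```python
# A minimal local sketch of the cleanse, feature-engineer, and spot-check
# steps; the labs themselves run these stages on Amazon SageMaker.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 70, 200).astype(float),
    "income": rng.normal(50_000, 15_000, 200),
    "label": rng.integers(0, 2, 200),
})
# Simulate dirty data: knock out 5% of one column.
df.loc[df.sample(frac=0.05, random_state=0).index, "income"] = np.nan

X, y = df[["age", "income"]], df["label"]

# Spot-check several algorithms against the (small) feature sample,
# with cleansing (imputation) and scaling folded into each pipeline.
for name, model in [("logreg", LogisticRegression()),
                    ("tree", DecisionTreeClassifier(random_state=0))]:
    pipe = make_pipeline(SimpleImputer(strategy="median"),
                         StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

The same pattern scales up in the labs: the winning algorithm from the spot check is the one taken forward to hyperparameter tuning and full-scale training.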
For this collection of labs we will start by defining a business problem and then work through the lifecycle above, from data acquisition through to model deployment.
- [Feature engineering](./01%20Feature%20engineering.ipynb) This notebook walks through acquiring the data, cleaning it, and then engineering a base feature set which can be prepared for ML training.
- [ML algorithm spot check](./02%20Algorithm%20spot%20check.ipynb) This notebook walks through transforming the cleansed data to assess the performance of many ML algorithms.
- [Hyperparameter optimization](./03%20Hyperparameter%20tuning.ipynb) This notebook walks through performing HPO on an algorithm and a subset of the feature set before performing a full-scale training job.
- [Training your model](./04%20Training.ipynb) This notebook walks through performing a full-scale training job of your model.
- [Hosting and usage](./05%20Host%20and%20infer.ipynb) This notebook walks through how to host a trained model and use it to make predictions.
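The hyperparameter-optimization lab uses SageMaker's automatic model tuning; as a local, runnable stand-in, the same tune-on-a-sample-then-retrain pattern can be sketched with scikit-learn's `GridSearchCV` (the parameter grid and data here are illustrative only):

```python
# A minimal local sketch of HPO on a feature-set sample, followed by a
# full-scale fit with the winning configuration. GridSearchCV stands in
# for SageMaker automatic model tuning; nothing here is from the labs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic target

# Tune on a sample of the feature set first (cheaper than the full set).
sample = slice(0, 150)
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "max_depth": [2, 4]},
    cv=3, scoring="accuracy",
)
grid.fit(X[sample], y[sample])
print("best params:", grid.best_params_)

# Then train on the full feature set with the best configuration found.
best = RandomForestClassifier(random_state=0, **grid.best_params_).fit(X, y)
```

In the labs the equivalent sample-then-scale split is done with SageMaker tuning and training jobs, and the resulting model is deployed to an endpoint for inference.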
Additional reading and resources:
- What’s your ML test score? A rubric for ML production systems
- Automation of data profiling
- Automated data profiling example
- Automated data profiling for Spark
- Machine Learning: The High-Interest Credit Card of Technical Debt