Commit 977593a

Add Bootstrap code (#186)
-bootstrap script -directory structure
1 parent e187158 commit 977593a

5 files changed: +209 -34 lines changed


README.md

Lines changed: 7 additions & 10 deletions
@@ -11,36 +11,33 @@ description: "Code which demonstrates how to set up and operationalize an MLOps
1111

1212
# MLOps with Azure ML
1313

14-
1514
[![Build Status](https://aidemos.visualstudio.com/MLOps/_apis/build/status/microsoft.MLOpsPython?branchName=master)](https://aidemos.visualstudio.com/MLOps/_build/latest?definitionId=151&branchName=master)
1615

17-
18-
MLOps will help you to understand how to build the Continuous Integration and Continuous Delivery pipeline for a ML/AI project. We will be using the Azure DevOps Project for build and release/deployment pipelines along with Azure ML services for model retraining pipeline, model management and operationalization.
16+
MLOps will help you to understand how to build the Continuous Integration and Continuous Delivery pipeline for a ML/AI project. We will be using the Azure DevOps Project for build and release/deployment pipelines along with Azure ML services for model retraining pipeline, model management and operationalization.
1917

2018
![ML lifecycle](/docs/images/ml-lifecycle.png)
2119

2220
This template contains code and pipeline definition for a machine learning project demonstrating how to automate an end to end ML/AI workflow. The build pipelines include DevOps tasks for data sanity test, unit test, model training on different compute targets, model version management, model evaluation/model selection, model deployment as realtime web service, staged deployment to QA/prod and integration testing.
2321

24-
2522
## Prerequisite
23+
2624
- Active Azure subscription
2725
- At least contributor access to Azure subscription
2826

29-
## Getting Started:
27+
## Getting Started
3028

3129
To deploy this solution in your subscription, follow the manual instructions in the [getting started](docs/getting_started.md) doc
3230

33-
3431
## Architecture Diagram
3532

36-
This reference architecture shows how to implement continuous integration (CI), continuous delivery (CD), and retraining pipeline for an AI application using Azure DevOps and Azure Machine Learning. The solution is built on the scikit-learn diabetes dataset but can be easily adapted for any AI scenario and other popular build systems such as Jenkins and Travis.
33+
This reference architecture shows how to implement continuous integration (CI), continuous delivery (CD), and retraining pipeline for an AI application using Azure DevOps and Azure Machine Learning. The solution is built on the scikit-learn diabetes dataset but can be easily adapted for any AI scenario and other popular build systems such as Jenkins and Travis.
3734

3835
![Architecture](/docs/images/main-flow.png)
3936

40-
4137
## Architecture Flow
4238

4339
### Train Model
40+
4441
1. A data scientist writes or updates the code and pushes it to the git repo. This triggers the Azure DevOps build pipeline (continuous integration).
4542
2. Once the Azure DevOps build pipeline is triggered, it performs code quality checks, data sanity tests, unit tests, builds an [Azure ML Pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines) and publishes it in an [Azure ML Service Workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace).
4643
3. The [Azure ML Pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines) is triggered once the Azure DevOps build pipeline completes. All the tasks in this pipeline run on Azure ML Compute. The tasks in this pipeline are as follows:
@@ -56,13 +53,13 @@ This reference architecture shows how to implement continuous integration (CI),
5653
Once you have registered your ML model, you can use Azure ML + Azure DevOps to deploy it.
5754

5855
The [Azure DevOps multi-stage pipeline](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/stages?view=azure-devops&tabs=yaml) packages the new model along with the scoring file and its python dependencies into a [docker image](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#image) and pushes it to [Azure Container Registry](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-intro). This image is used to deploy the model as [web service](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#web-service) across QA and Prod environments. The QA environment is running on top of [Azure Container Instances (ACI)](https://azure.microsoft.com/en-us/services/container-instances/) and the Prod environment is built with [Azure Kubernetes Service (AKS)](https://docs.microsoft.com/en-us/azure/aks/intro-kubernetes).
59-
6056

6157
### Repo Details
6258

6359
You can find the details of the code and scripts in the repository [here](/docs/code_description.md)
6460

6561
### References
62+
6663
- [Azure Machine Learning(Azure ML) Service Workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml)
6764
- [Azure ML CLI](https://docs.microsoft.com/en-us/azure/machine-learning/service/reference-azure-machine-learning-cli)
6865
- [Azure ML Samples](https://docs.microsoft.com/en-us/azure/machine-learning/service/samples-notebooks)
@@ -73,7 +70,7 @@ You can find the details of the code and scripts in the repository [here](/docs/
7370

7471
This project welcomes contributions and suggestions. Most contributions require you to agree to a
7572
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
76-
the rights to use your contribution. For details, visit https://cla.microsoft.com.
73+
the rights to use your contribution. For details, visit <https://cla.microsoft.com>.
7774

7875
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
7976
a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions

bootstrap/README.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
# Bootstrap from MLOpsPython repository

To use this existing project structure and scripts for your new ML project, you can quickly get started from the existing repository, bootstrap and create a template that works for your ML project. Bootstrapping will prepare a similar directory structure for your project, which includes renaming files and folders, deleting and cleaning up some directories, and fixing imports and absolute paths based on your project name. This will enable reusing various resources like pre-built pipelines and scripts for your new project.

To bootstrap from the existing MLOpsPython repository, clone this repository and run the bootstrap.py script as below:

>python bootstrap.py --d [dirpath] --n [projectname]

Where [dirpath] is the absolute path to the root of the directory where the MLOps repo is cloned and [projectname] is the name of your ML project.
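
The renaming described above can be illustrated with a small, self-contained sketch. It is not the script from this commit: `bootstrap_tree` and `demo` are hypothetical names, the sketch walks the whole tree instead of a fixed file list, and it assumes every file is UTF-8 text.

```python
import tempfile
from pathlib import Path

OLD_NAME = "diabetes_regression"  # placeholder project name used by the template


def bootstrap_tree(root: Path, new_name: str) -> None:
    """Replace OLD_NAME with new_name in file contents, then in path names."""
    # Step 1: rewrite the token inside every file (assumed to be UTF-8 text).
    for path in [p for p in root.rglob("*") if p.is_file()]:
        text = path.read_text(encoding="utf8")
        if OLD_NAME in text:
            path.write_text(text.replace(OLD_NAME, new_name), encoding="utf8")
    # Step 2: rename the deepest paths first, so that renaming a parent
    # directory never invalidates a child path still waiting in the list.
    for path in sorted(root.rglob("*"), key=lambda p: len(p.parts), reverse=True):
        if OLD_NAME in path.name:
            path.rename(path.with_name(path.name.replace(OLD_NAME, new_name, 1)))


def demo() -> list:
    """Build a tiny fake repository, bootstrap it, and list the resulting files."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / OLD_NAME / "training").mkdir(parents=True)
        (root / OLD_NAME / "training" / "train.py").write_text(
            "import diabetes_regression.util\n", encoding="utf8")
        (root / ".pipelines").mkdir()
        (root / ".pipelines" / "diabetes_regression-variables.yml").write_text(
            "name: diabetes_regression\n", encoding="utf8")
        bootstrap_tree(root, "churn")
        return sorted(p.relative_to(root).as_posix()
                      for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    print(demo())
```

Renaming contents before paths mirrors the order used in `bootstrap.py`, where `replaceprojectname()` runs before `renamefiles()` and `renamedir()`, so that the file list can still refer to the original names.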

bootstrap/bootstrap.py

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
import os
import sys
import argparse
# from git import Repo


class Helper:

    def __init__(self, project_directory, project_name):
        self._project_directory = project_directory
        self._project_name = project_name
        self._git_repo = "https://github.com/microsoft/MLOpsPython.git"

    @property
    def project_directory(self):
        return self._project_directory

    @property
    def project_name(self):
        return self._project_name

    @property
    def git_repo(self):
        return self._git_repo

    # def clonerepo(self):
    #     # Download MLOpsPython repo from git
    #     Repo.clone_from(
    #         self._git_repo, self._project_directory, branch="master", depth=1) # NOQA: E501
    #     print(self._project_directory)

    def renamefiles(self):
        # Rename all files starting with diabetes_regression with project name
        strtoreplace = "diabetes_regression"
        dirs = [".pipelines", r"ml_service\pipelines"]
        for dir in dirs:
            dirpath = os.path.join(self._project_directory, dir)
            for filename in os.listdir(dirpath):
                if(filename.find(strtoreplace) != -1):
                    src = os.path.join(self._project_directory, dir, filename)
                    dst = os.path.join(self._project_directory,
                                       dir, filename.replace(strtoreplace, self._project_name, 1))  # NOQA: E501
                    os.rename(src, dst)

    def renamedir(self):
        # Rename any directory with diabetes_regression with project name
        dirs = ["diabetes_regression"]
        for dir in dirs:
            src = os.path.join(self._project_directory, dir)
            dst = os.path.join(self._project_directory, self._project_name)
            os.rename(src, dst)

    def deletedir(self):
        # Delete unwanted directories
        dirs = ["docs", r"diabetes_regression\training\R"]
        for dir in dirs:
            os.system(
                'rmdir /S /Q "{}"'.format(os.path.join(self._project_directory, dir)))  # NOQA: E501

    def replaceprojectname(self):
        # Replace instances of diabetes_regression within files
        dirs = [r".env.example",
                r".pipelines\azdo-base-pipeline.yml",
                r".pipelines\azdo-pr-build-train.yml",
                r".pipelines\diabetes_regression-ci-build-train.yml",
                r".pipelines\diabetes_regression-ci-image.yml",
                r".pipelines\diabetes_regression-template-get-model-version.yml",  # NOQA: E501
                r".pipelines\diabetes_regression-variables.yml",
                r"environment_setup\Dockerfile",
                r"environment_setup\install_requirements.sh",
                r"ml_service\pipelines\diabetes_regression_build_train_pipeline_with_r_on_dbricks.py",  # NOQA: E501
                r"ml_service\pipelines\diabetes_regression_build_train_pipeline_with_r.py",  # NOQA: E501
                r"ml_service\pipelines\diabetes_regression_build_train_pipeline.py",  # NOQA: E501
                r"ml_service\pipelines\diabetes_regression_verify_train_pipeline.py",  # NOQA: E501
                r"ml_service\util\create_scoring_image.py",
                r"diabetes_regression\azureml_environment.json",
                r"diabetes_regression\conda_dependencies.yml",
                r"diabetes_regression\evaluate\evaluate_model.py",
                r"diabetes_regression\training\test_train.py"]  # NOQA: E501

        for file in dirs:
            fin = open(os.path.join(self._project_directory, file),
                       "rt", encoding="utf8")
            data = fin.read()
            data = data.replace("diabetes_regression", self.project_name)
            fin.close()
            fin = open(os.path.join(self._project_directory, file),
                       "wt", encoding="utf8")
            fin.write(data)
            fin.close()

    def cleandir(self):
        # Clean up directories
        dirs = ["data", "experimentation"]
        for dir in dirs:
            for root, dirs, files in os.walk(os.path.join(self._project_directory, dir)):  # NOQA: E501
                for file in files:
                    os.remove(os.path.join(root, file))

    def validateargs(self):
        # Validate arguments
        if (os.path.isdir(self._project_directory) is False):
            raise Exception(
                "Not a valid directory. Please provide absolute directory path")  # NOQA: E501
        # if (len(os.listdir(self._project_directory)) > 0):
        #     raise Exception("Directory not empty. Please empty directory")
        if(len(self._project_name) < 3 or len(self._project_name) > 15):
            raise Exception("Project name should be 3 to 15 chars long")


def main(args):
    parser = argparse.ArgumentParser(description='New Template')
    parser.add_argument("--d", type=str,
                        help="Absolute path to new project directory")
    parser.add_argument(
        "--n", type=str, help="Name of the project[3-15 chars] ")
    try:
        args = parser.parse_args()
        project_directory = args.d
        project_name = args.n
        helper = Helper(project_directory, project_name)
        helper.validateargs()
        # helper.clonerepo()
        helper.cleandir()
        helper.replaceprojectname()
        helper.deletedir()
        helper.renamefiles()
        helper.renamedir()
    except Exception as e:
        print(e)
    return 0


if '__main__' == __name__:
    sys.exit(main(sys.argv))
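
Note that `deletedir` shells out to `rmdir /S /Q` and the path lists use backslashes, so the script as committed appears to assume a Windows shell. A minimal sketch of an OS-independent alternative, assuming forward-slash relative paths and using only the standard library, might look like the following. `delete_dirs` and `demo_delete` are hypothetical names and are not part of this commit.

```python
import shutil
import tempfile
from pathlib import Path


def delete_dirs(project_directory: str, relative_dirs: list) -> list:
    """Remove each listed directory under project_directory.

    Returns the relative paths that were actually removed.
    """
    removed = []
    for rel in relative_dirs:
        # joinpath(*parts) uses the path separator of the current OS.
        target = Path(project_directory).joinpath(*rel.split("/"))
        if target.is_dir():
            shutil.rmtree(target)  # recursive delete, no shell involved
            removed.append(rel)
    return removed


def demo_delete() -> tuple:
    """Create a throwaway tree, delete two directories, report what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in ("docs", "diabetes_regression/training/R", "data"):
            root.joinpath(*rel.split("/")).mkdir(parents=True)
        removed = delete_dirs(
            tmp, ["docs", "diabetes_regression/training/R", "missing"])
        remaining = sorted(p.relative_to(root).as_posix()
                           for p in root.rglob("*") if p.is_dir())
        return removed, remaining


if __name__ == "__main__":
    print(demo_delete())
```

Skipping directories that do not exist keeps the step idempotent, which may be convenient if the bootstrap is re-run on a partially processed checkout.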

docs/code_description.md

Lines changed: 32 additions & 2 deletions
@@ -1,5 +1,34 @@
11
## Repo Details
22

### Directory Structure

High level directory structure for this repository:

```bash
├── .pipelines            <- Azure DevOps YAML pipelines for CI, PR and model training and deployment.
├── bootstrap             <- Python script to initialize this repository with a custom project name.
├── charts                <- Helm charts to deploy resources on Azure Kubernetes Service(AKS).
├── data                  <- Initial set of data to train and evaluate model.
├── diabetes_regression   <- The top-level folder for the ML project.
│   ├── evaluate          <- Python script to evaluate trained ML model.
│   ├── register          <- Python script to register trained ML model with Azure Machine Learning Service.
│   ├── scoring           <- Python score.py to deploy trained ML model.
│   ├── training          <- Python script to train ML model.
│   │   ├── R             <- R script to train R based ML model.
│   ├── util              <- Python script for various utility operations specific to this ML project.
├── docs                  <- Extensive markdown documentation for entire project.
├── environment_setup     <- The top-level folder for everything related to infrastructure.
│   ├── arm-templates     <- Azure Resource Manager(ARM) templates to build infrastructure needed for this project.
├── experimentation       <- Jupyter notebooks with ML experimentation code.
├── ml_service            <- The top-level folder for all Azure Machine Learning resources.
│   ├── pipelines         <- Python script that builds Azure Machine Learning pipelines.
│   ├── util              <- Python script for various utility operations specific to Azure Machine Learning.
├── .env.example          <- Example .env file with environment for local development experience.
├── .gitignore            <- A gitignore file specifies intentionally un-tracked files that Git should ignore.
├── LICENSE               <- License document for this project.
├── README.md             <- The top-level README for developers using this project.
```

332
### Environment Setup
433

534
- `environment_setup/install_requirements.sh` : This script prepares a local conda environment i.e. install the Azure ML SDK and the packages specified in environment definitions.
@@ -8,7 +37,7 @@
837

938
- `environment_setup/Dockerfile` : Dockerfile of a build agent containing Python 3.6 and all required packages.
1039

11-
- `environment_setup/docker-image-pipeline.yml` : An AzDo pipeline for building and pushing [microsoft/mlopspython](https://hub.docker.com/_/microsoft-mlops-python) image.
40+
- `environment_setup/docker-image-pipeline.yml` : An AzDo pipeline for building and pushing [microsoft/mlopspython](https://hub.docker.com/_/microsoft-mlops-python) image.
1241

1342
### Pipelines
1443

@@ -37,10 +66,11 @@
3766
- `diabetes_regression/evaluate/evaluate_model.py` : an evaluating step of an ML training pipeline which registers a new trained model if evaluation shows the new model is more performant than the previous one.
3867
- `diabetes_regression/evaluate/register_model.py` : (LEGACY) registers a new trained model if evaluation shows the new model is more performant than the previous one.
3968
- `diabetes_regression/training/R/r_train.r` : training a model with R basing on a sample dataset (weight_data.csv).
40-
- `diabetes_regression/training/R/train_with_r.py` : a python wrapper (ML Pipeline Step) invoking R training script on ML Compute
69+
- `diabetes_regression/training/R/train_with_r.py` : a python wrapper (ML Pipeline Step) invoking R training script on ML Compute
4170
- `diabetes_regression/training/R/train_with_r_on_databricks.py` : a python wrapper (ML Pipeline Step) invoking R training script on Databricks Compute
4271
- `diabetes_regression/training/R/weight_data.csv` : a sample dataset used by R script (r_train.r) to train a model
4372

4473
### Scoring
74+
4575
- `diabetes_regression/scoring/score.py` : a scoring script which is about to be packed into a Docker Image along with a model while being deployed to QA/Prod environment.
4676
- `diabetes_regression/scoring/inference_config.yml`, deployment_config_aci.yml, deployment_config_aks.yml : configuration files for the [AML Model Deploy](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.private-vss-services-azureml&ssr=false#overview) pipeline task for ACI and AKS deployment targets.
