Welcome to the Azure Machine Learning (AML) template repository!
- An Azure subscription. If you don't have an Azure subscription, create a free account before you begin.
- A terminal and Python >=3.6,<3.9.
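
To confirm the local interpreter falls inside the supported range before running the setup script, a small check like the following can help. This is a minimal sketch: the function name `is_supported` is hypothetical and not part of this repository, and the bounds are copied from the prerequisite above.

```python
import sys

# Bounds taken from the prerequisite above: Python >=3.6,<3.9.
MIN_VERSION = (3, 6)
MAX_VERSION_EXCLUSIVE = (3, 9)


def is_supported(version=None):
    """Return True if the (major, minor) version is within the supported range.

    `version` is any sequence starting with (major, minor); it defaults to
    the running interpreter's version. Hypothetical helper, not part of
    the template repository.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE
```

Calling `is_supported()` with no arguments checks the interpreter that is currently running.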
Click "Use this template" above and create a repository.
Follow the setup guide below to add your Azure credentials and create required Azure resources. At the end, you will have a repository with:
- a simple LightGBM training workflow running every 2 hours and on push/PR
- a code format check on push/PR
- a resource cleanup script running nightly
First, export your Azure subscription id as an environment variable:
export ID=<your-subscription-id>
Second, create the Azure resource group and required AML resources:
python setup-workspace.py --subscription-id $ID
This will create a resource group named `azureml-template`, a workspace named `default`, and a cluster named `cpu-cluster`. Edit `setup-workspace.py` as needed. If you change the names, ensure you change the corresponding names in the `.github/workflows` files and in the third step below.
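
Because the resource group name appears again in the service principal scope used in the third step, one way to keep the names consistent is to derive that command from a single value. The following is a minimal sketch: the function names `resource_group_scope` and `service_principal_command` are hypothetical and not part of this repository, while the `az` flags are the ones used in this guide.

```python
import shlex


def resource_group_scope(subscription_id, resource_group="azureml-template"):
    """Build the scope string passed to --scopes (hypothetical helper)."""
    return "/subscriptions/{}/resourceGroups/{}".format(
        subscription_id, resource_group
    )


def service_principal_command(subscription_id,
                              resource_group="azureml-template",
                              name="azureml-template"):
    """Return the service principal command from this guide as an argument list.

    Nothing is executed here; the list can be printed for copy/paste or
    handed to subprocess.run by the caller.
    """
    return [
        "az", "ad", "sp", "create-for-rbac",
        "--name", name,
        "--role", "contributor",
        "--scopes", resource_group_scope(subscription_id, resource_group),
        "--sdk-auth",
    ]


def as_shell(args):
    """Quote each argument so the command can be pasted into a terminal."""
    return " ".join(shlex.quote(arg) for arg in args)
```

For example, `print(as_shell(service_principal_command("<your-subscription-id>", "my-group")))` prints the command with the changed resource group name in the scope.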
Third, create a service principal for the resource group:
az ad sp create-for-rbac --name "azureml-template" \
--role contributor \
--scopes /subscriptions/$ID/resourceGroups/azureml-template \
--sdk-auth
Copy the output JSON, which looks like this:
{
"clientId": "<GUID>",
"clientSecret": "<GUID>",
"subscriptionId": "<GUID>",
"tenantId": "<GUID>",
(...)
}
In your repository, navigate to "Settings > Secrets > New Secret". Name the secret `AZ_CREDS` and paste the JSON output from above. This secret is used by the Azure login action in the GitHub Actions workflows. If you use a different name for the secret, ensure you change the corresponding names in the `.github/workflows` files.
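
Before pasting the JSON into the secret, it can be worth checking that the copied text is complete. The following is a minimal sketch that only looks for the four keys shown in the sample output above; the function name `missing_credential_keys` is hypothetical, and the real output contains further fields that this check ignores.

```python
import json

# Keys shown in the sample service principal output above.
REQUIRED_KEYS = ("clientId", "clientSecret", "subscriptionId", "tenantId")


def missing_credential_keys(raw_json):
    """Return the required keys that are absent or empty in the pasted JSON.

    Raises ValueError if the text is not valid JSON or is not a JSON object
    (json.JSONDecodeError is a subclass of ValueError).
    """
    creds = json.loads(raw_json)
    if not isinstance(creds, dict):
        raise ValueError("expected a JSON object")
    return [key for key in REQUIRED_KEYS if not creds.get(key)]
```

An empty list means all four keys are present and non-empty; any names returned point to a truncated or incomplete copy.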
Adapt this template to automate the entire ML lifecycle on GitHub, using AML for centralized tracking and scaling up/out on Azure compute.
directory | description |
---|---|
`.cloud` | cloud templates |
`.github` | GitHub-specific files like Actions workflow YAML definitions and issue templates |
`notebooks` | interactive Jupyter notebooks for iterative ML development |
`workflows` | self-contained directories of jobs/workflows to be run |
Modify all files as needed.
Actions:
- `.github/workflows/smoke.yml` runs on every PR and push to `main` to check code format
- `.github/workflows/cleanup.yml` runs daily and can be used to clean up AML resources
- `.github/workflows/run-workflows.yml` runs an ML workflow every two hours and on push/PR to `main`
Other:
- `requirements.txt` specifies required pip packages for GitHub Actions
- `setup-workspace.py` can be modified for workspace and resource setup
- `cleanup.py` can be modified for nightly workspace cleanup tasks
- `workflows/basic/job.py` is the AML control code
- `workflows/basic/src/train.py` is the ML training script with MLflow tracking
- `workflows/basic/requirements.txt` specifies required pip packages for the training script