What is CML? Continuous Machine Learning (CML) is an open-source library for implementing continuous integration & delivery (CI/CD) in machine learning projects. Use it to automate parts of your development workflow, including model training and evaluation, comparing ML experiments across your project history, and monitoring changing datasets.
On every pull request, CML helps you automatically train and evaluate models, then generates a visual report with results and metrics. Above, an example report for a neural style transfer model.
We built CML with these principles in mind:
- GitFlow for data science. Use GitLab or GitHub to manage ML experiments, track who trained ML models or modified data and when. Codify data and models with DVC instead of pushing to a Git repo.
- Auto reports for ML experiments. Auto-generate reports with metrics and plots in each Git Pull Request. Rigorous engineering practices help your team make informed, data-driven decisions.
- No additional services. Build your own ML platform using just GitHub or GitLab and your favorite cloud services: AWS, Azure, GCP. No databases, services or complex setup needed.
🌟🌟🌟 Check out our YouTube video series for hands-on MLOps tutorials using CML! 🌟🌟🌟
- Usage
- Getting started
- Using CML with DVC
- Using self-hosted runners
- Install CML as a package
- Examples
You'll need a GitHub or GitLab account to begin. Users may wish to familiarize themselves with GitHub Actions or GitLab CI/CD. Here, we'll discuss the GitHub use case.
🪣 Bitbucket Cloud users, we support you too! See our docs here. 🪣 Bitbucket Server support is estimated to arrive by January 2021.
The key file in any CML project is `.github/workflows/cml.yaml`.
```yaml
name: your-workflow-name
on: [push]
jobs:
  run:
    runs-on: [ubuntu-latest]
    container: docker://dvcorg/cml-py3:latest
    steps:
      - uses: actions/checkout@v2
      - name: 'Train my model'
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Your ML workflow goes here
          pip install -r requirements.txt
          python train.py

          # Write your CML report
          cat results.txt >> report.md
          cml-send-comment report.md
```
CML provides a number of helper functions to package outputs from ML workflows, such as numeric data and visualizations of model performance, into a CML report. The library comes pre-installed on our custom Docker images. In the above example, the field `container: docker://dvcorg/cml-py3:latest` specifies that the CML Docker image with Python 3 will be pulled by the GitHub Actions runner.
Below is a list of CML functions for writing markdown reports and delivering those reports to your CI system (GitHub Actions or GitLab CI).
| Function | Description | Inputs |
| --- | --- | --- |
| `cml-send-comment` | Return CML report as a comment in your GitHub/GitLab workflow. | `<path to report> --head-sha <sha>` |
| `cml-send-github-check` | Return CML report as a check in GitHub. | `<path to report> --head-sha <sha>` |
| `cml-publish` | Publish an image for writing to a CML report. | `<path to image> --title <image title> --md` |
| `cml-tensorboard-dev` | Return a link to a TensorBoard.dev page. | `--logdir <path to logs> --title <experiment title> --md` |
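These helpers compose over plain files: your script writes Markdown fragments into a report file, and `cml-send-comment` posts whatever that file contains. As a minimal sketch (the metric names and values here are purely illustrative), a Python step could assemble the report body like this:

```python
# Sketch: build a Markdown report that a later `cml-send-comment report.md`
# step would post. Metric names and values are hypothetical examples.
metrics = {"accuracy": 0.92, "f1_score": 0.89}

lines = ["## Model metrics", "", "| Metric | Value |", "| --- | --- |"]
for name, value in metrics.items():
    lines.append(f"| {name} | {value:.2f} |")

with open("report.md", "w") as f:
    f.write("\n".join(lines) + "\n")
```

Any tool that can write a file works equally well here; the CML functions only care about the resulting Markdown.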
CML reports are written in GitHub Flavored Markdown. That means they can contain images, tables, formatted text, HTML blocks, code snippets and more - really, what you put in a CML report is up to you. Some examples:
📝 Text. Write to your report using whatever method you prefer. For example, copy the contents of a text file containing the results of ML model training:
```bash
cat results.txt >> report.md
```
🖼️ Images Display images using markdown or HTML. Note that if an image is an output of your ML workflow (i.e., it is produced by your workflow), you will need to use the `cml-publish` function to include it in a CML report. For example, if `graph.png` is the output of your workflow `python train.py`, run:
```bash
cml-publish graph.png --md >> report.md
```
- Fork our example project repository.
⚠️ Note that if you are using GitLab, you will need to create a Personal Access Token for this example to work.
The following steps can all be done in the GitHub browser interface. However, to follow along with the commands, we recommend cloning your fork to your local workstation:
```bash
git clone https://github.com/<your-username>/example_cml
```
- To create a CML workflow, copy the following into a new file, `.github/workflows/cml.yaml`:
```yaml
name: model-training
on: [push]
jobs:
  run:
    runs-on: [ubuntu-latest]
    container: docker://dvcorg/cml-py3:latest
    steps:
      - uses: actions/checkout@v2
      - name: 'Train my model'
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
        run: |
          pip install -r requirements.txt
          python train.py
          cat metrics.txt >> report.md
          cml-publish confusion_matrix.png --md >> report.md
          cml-send-comment report.md
```
- In your text editor of choice, edit line 16 of `train.py` to `depth = 5`.
- Commit and push the changes:
```bash
git checkout -b experiment
git add . && git commit -m "modify forest depth"
git push origin experiment
```
- In GitHub, open up a Pull Request to compare the `experiment` branch to `master`.
Shortly, you should see a comment from `github-actions` appear in the Pull Request with your CML report. This is a result of the `cml-send-comment` function in your workflow.
This is the gist of the CML workflow: when you push changes to your GitHub repository, the workflow in your `.github/workflows/cml.yaml` file runs and a report is generated. CML functions let you display relevant results from the workflow, like model performance metrics and visualizations, in GitHub checks and comments. What kind of workflow you want to run, and what you want to put in your CML report, is up to you.
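To make the contract concrete, here is a hypothetical, dependency-free sketch of a `train.py`: all the workflow above requires is that results end up in `metrics.txt` for the `cat metrics.txt >> report.md` step to collect. (The real example project trains a random forest; this toy version only illustrates the input/output shape.)

```python
# Hypothetical train.py sketch. The only contract the CML workflow needs
# is that training results are written to metrics.txt.

# Toy "dataset": predict y = 2x with a single learned coefficient.
data = [(x, 2 * x) for x in range(1, 21)]

# Trivial "training": least-squares estimate of the coefficient.
coef = sum(x * y for x, y in data) / sum(x * x for x, _ in data)

# Evaluate: mean absolute error of the fitted model.
mae = sum(abs(y - coef * x) for x, y in data) / len(data)

with open("metrics.txt", "w") as f:
    f.write(f"Learned coefficient: {coef:.4f}\n")
    f.write(f"Mean absolute error: {mae:.4f}\n")
```

Swap in your actual training code; as long as it leaves its results in files, the report steps stay unchanged.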
In many ML projects, data isn't stored in a Git repository and needs to be downloaded from external sources. DVC is a common way to bring data to your CML runner. DVC also lets you visualize how metrics differ between commits to make reports like this:
The `.github/workflows/cml.yaml` file to create this report is:
```yaml
name: model-training
on: [push]
jobs:
  run:
    runs-on: [ubuntu-latest]
    container: docker://dvcorg/cml-py3:latest
    steps:
      - uses: actions/checkout@v2
      - name: 'Train my model'
        shell: bash
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          # Install requirements
          pip install -r requirements.txt

          # Pull data & run-cache from S3 and reproduce pipeline
          dvc pull data --run-cache
          dvc repro

          # Report metrics
          echo "## Metrics" >> report.md
          git fetch --prune
          dvc metrics diff master --show-md >> report.md

          # Publish confusion matrix diff
          echo -e "## Plots\n### Class confusions" >> report.md
          dvc plots diff --target classes.csv --template confusion -x actual -y predicted --show-vega master > vega.json
          vl2png vega.json -s 1.5 | cml-publish --md >> report.md

          # Publish regularization function diff
          echo -e "### Effects of regularization\n" >> report.md
          dvc plots diff --target estimators.csv -x Regularization --show-vega master > vega.json
          vl2png vega.json -s 1.5 | cml-publish --md >> report.md

          cml-send-comment report.md
```
If you're using DVC with cloud storage, take note of the environment variables needed for your storage provider.
S3 and S3 compatible storage (Minio, DigitalOcean Spaces, IBM Cloud Object Storage...)
```yaml
# GitHub
env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_SESSION_TOKEN: ${{ secrets.AWS_SESSION_TOKEN }}
```
👉 AWS_SESSION_TOKEN is optional.
Azure
```yaml
env:
  AZURE_STORAGE_CONNECTION_STRING: ${{ secrets.AZURE_STORAGE_CONNECTION_STRING }}
  AZURE_STORAGE_CONTAINER_NAME: ${{ secrets.AZURE_STORAGE_CONTAINER_NAME }}
```
Aliyun
```yaml
env:
  OSS_BUCKET: ${{ secrets.OSS_BUCKET }}
  OSS_ACCESS_KEY_ID: ${{ secrets.OSS_ACCESS_KEY_ID }}
  OSS_ACCESS_KEY_SECRET: ${{ secrets.OSS_ACCESS_KEY_SECRET }}
  OSS_ENDPOINT: ${{ secrets.OSS_ENDPOINT }}
```
Google Storage
⚠️ Normally, `GOOGLE_APPLICATION_CREDENTIALS` points to the path of the JSON file that contains the credentials. In the action, however, this variable contains the *content* of the file. Copy that JSON and add it as a secret.
```yaml
env:
  GOOGLE_APPLICATION_CREDENTIALS: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
```
Google Drive
⚠️ After configuring your Google Drive credentials, you will find a JSON file at `your_project_path/.dvc/tmp/gdrive-user-credentials.json`. Copy that JSON and add it as a secret.
```yaml
env:
  GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
```
GitHub Actions are run on GitHub-hosted runners by default. However, there are many great reasons to use your own runners: to take advantage of GPUs, to orchestrate your team's shared computing resources, or to train in the cloud.
☝️ Tip! Check out the official GitHub documentation to get started setting up your self-hosted runner.
When a workflow requires computational resources (such as GPUs), CML can automatically allocate cloud instances using `cml-runner`. You can spin up instances on your AWS or Azure account (GCP support is forthcoming!).

For example, the following workflow deploys a `t2.micro` instance on AWS EC2 and trains a model on the instance. After the job runs, the instance automatically shuts down. You might notice that this workflow is quite similar to the basic use case highlighted at the beginning of the docs. That's because it is! What's new is that we've added `cml-runner`, plus a few environment variables for passing your cloud service credentials to the workflow.
```yaml
name: "Train-in-the-cloud"
on: [push]
jobs:
  deploy-runner:
    runs-on: [ubuntu-latest]
    steps:
      - uses: iterative/setup-cml@v1
      - uses: actions/checkout@v2
      - name: "Deploy runner on EC2"
        shell: bash
        env:
          repo_token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          cml-runner \
            --cloud aws \
            --cloud-region us-west \
            --cloud-type=t2.micro \
            --labels=cml-runner
  model-training:
    needs: deploy-runner
    runs-on: [self-hosted, cml-runner]
    container: docker://dvcorg/cml-py3:latest
    steps:
      - uses: actions/checkout@v2
      - name: "Train my model"
        env:
          repo_token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
        run: |
          pip install -r requirements.txt
          python train.py

          # Publish report with CML
          cat metrics.txt > report.md
          cml-send-comment report.md
```
In the above workflow, the job `deploy-runner` launches an EC2 `t2.micro` instance in the `us-west` region. The next job, `model-training`, runs on the newly launched instance.
Note that you can use any container with this workflow! While you must have CML and its dependencies set up to use CML functions like `cml-send-comment` from your instance, you can create your favorite training environment in the cloud by pulling the Docker container of your choice.
We like the CML container (`docker://dvcorg/cml-py3`) because it comes loaded with Python, CUDA, `git`, `node`, and other essentials for full-stack data science. But we don't mind if you do it your way :)
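If you do roll your own image, a minimal custom Dockerfile only needs Node.js and the CML package layered on top of your training environment. This sketch is purely illustrative (the base image is an arbitrary example; the Node.js install steps mirror the GitLab instructions later in this document):

```dockerfile
# Hypothetical custom training image with CML layered on top.
# The base image is an example; substitute your own environment.
FROM python:3.8
RUN curl -sL https://deb.nodesource.com/setup_12.x | bash && \
    apt-get install -y nodejs && \
    npm install -g @dvcorg/cml
```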
The function `cml-runner` accepts the following arguments:
```
Usage: cml-runner.js

Options:
  --version                    Show version number                     [boolean]
  --labels                     Comma delimited runner labels    [default: "cml"]
  --idle-timeout               Time in seconds for the runner to wait for jobs
                               before shutting down. 0 waits forever.
                                                                  [default: 300]
  --name                       Name displayed in the repo once registered
                                                   [default: "cml-w0qj7mvsz5"]
  --single                     If specified, exit after running a single job.
                                                      [boolean] [default: false]
  --reuse                      If specified, don't spawn a new runner if there
                               is a registered runner with the given labels.
                                                      [boolean] [default: false]
  --driver                     If not specified, inferred from the ENV.
                                                  [choices: "github", "gitlab"]
  --repo                       Specifies the repo to be used. If not specified,
                               extracted from the CI ENV.
  --token                      Personal access token to be used. If not
                               specified, extracted from the ENV.
  --cloud                      Cloud to deploy the runner
                                                      [choices: "aws", "azure"]
  --cloud-region               Region where the instance is deployed.
                               Choices: [us-east, us-west, eu-west, eu-north].
                               Also accepts native cloud regions.
                                                           [default: "us-west"]
  --cloud-type                 Instance type. Choices: [m, l, xl]. Also supports
                               native types, e.g. t2.micro
  --cloud-gpu                  GPU type.      [choices: "nogpu", "k80", "tesla"]
  --cloud-hdd-size             HDD size in GB.
  --cloud-ssh-private          Your private RSA SSH key. If not provided, it
                               will be generated by the terraform provider.
                                                                   [default: ""]
  --cloud-ssh-private-visible  Your SSH key will be visible in the output with
                               the rest of the instance properties.    [boolean]
  --cloud-spot                 Request a spot instance                 [boolean]
  --cloud-spot-price           Spot max price. If not specified, uses the
                               current spot bidding price.       [default: "-1"]
  -h                           Show help                               [boolean]
```
You will need to create a personal access token with repository read/write access and workflow privileges. In the example workflow, this token is stored as `PERSONAL_ACCESS_TOKEN`.
Note that you will also need to provide access credentials for your cloud compute resources as secrets. In the above example, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are required to deploy EC2 instances.
Please see our docs about the environment variables needed to authenticate with supported cloud services.
You can also use the new `cml-runner` function to set up a local self-hosted runner. On your local machine or on-premise GPU cluster, you'll install CML as a package and then run:

```bash
cml-runner \
  --repo $your_project_repository_url \
  --token=$personal_access_token \
  --labels tf \
  --idle-timeout 180
```
Now your machine will be listening for workflows from your project repository.
In the above examples, CML is pre-installed in a custom Docker image, which is pulled by a CI runner. You can also install CML as a package:
```bash
npm i -g @dvcorg/cml
```
You may need to install additional dependencies to use DVC plots and Vega-Lite CLI commands:
```bash
sudo apt-get install -y libcairo2-dev libpango1.0-dev libjpeg-dev libgif-dev \
  librsvg2-dev libfontconfig-dev
npm install -g vega-cli vega-lite
```
Installing the CML and Vega-Lite packages requires the `npm` command, which ships with Node.js. Below you can find how to install Node.js.
In GitHub Actions, there is a dedicated action for installing Node.js:
```yaml
uses: actions/setup-node@v1
with:
  node-version: '12'
```
GitLab requires installing Node.js directly:
```bash
curl -sL https://deb.nodesource.com/setup_12.x | bash
apt-get update
apt-get install -y nodejs
```
Here are some example projects using CML.