Ever wondered how to deploy a Spark machine learning model in production on Azure? Well, you've come to the right place! This tutorial walks you through building predictive APIs (both real-time and batch) powered by Spark machine learning models, and deploying them to HDInsight and Azure Container Service (ACS) clusters for scale.
We'll start off by provisioning a Data Science Virtual Machine (DSVM) to develop and test our APIs. For information on provisioning a DSVM, see Provision the Linux Data Science Virtual Machine.
Once you have signed into the DSVM, run the following commands and follow the prompts:
$ wget -q http://amlsamples.blob.core.windows.net/scripts/amlupdate.sh -O - | sudo bash -
$ sudo /opt/microsoft/azureml/initial_setup.sh
NOTE: You must log out and log back in to your SSH session for the changes to take effect.
Next, enter the AML environment setup command. NOTE: Keep the following in mind when completing the environment setup:
- Enter a name for the environment. Environment names must be between 3 and 17 characters in length and can only consist of numbers and lowercase letters.
- You will be prompted to sign in to Azure. To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the provided code to authenticate.
- During the authentication process you will be prompted for an account to authenticate with. Use the account under which you created the DSVM.
- When the sign-in is complete, your subscription information will be presented and you will be prompted whether you wish to continue with the selected account.
Environment setup command:
$ aml env setup
Once the setup command has finished, it outputs environment export commands for the AML CLI environment. It also saves these export commands to a file in your home directory. Source the file to set up your environment variables:
$ source ~/.amlenvrc
To always set these variables when you log in, copy the export commands into your .bashrc file:
$ cat < ~/.amlenvrc >> ~/.bashrc
Jupyter is running on the DSVM at https://<machine-ip-address>:8000. Open Jupyter in a browser and sign in. The user name and password are those that you configured for the DSVM. Note that you will receive a certificate warning that you can safely click through.
There are notebooks for both the real-time and batch web service scenarios. The notebooks are located in the azureml/realtime and azureml/batch folders.
To run the real-time scenario, from the azureml folder, change to the realtime folder and open the realtimewebservices.ipynb notebook. Follow the instructions to train, save, and deploy a model as a real-time web service. The notebook contains instructions for deploying both to the DSVM and to a production ACS environment.
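At its core, the real-time notebook trains a Spark ML pipeline and saves the fitted model so the web service can load it at scoring time. Here is a minimal PySpark sketch of that train-and-save step, using a hypothetical CSV dataset and column names (feature1, feature2, label) that you would replace with your own:

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

# Hypothetical input path and column names -- substitute your own data.
spark = SparkSession.builder.appName("train-realtime-model").getOrCreate()
df = spark.read.csv("training.csv", header=True, inferSchema=True)

# Assemble the numeric feature columns into a single vector column.
assembler = VectorAssembler(inputCols=["feature1", "feature2"], outputCol="features")
lr = LogisticRegression(labelCol="label", featuresCol="features")
model = Pipeline(stages=[assembler, lr]).fit(df)

# Persist the fitted pipeline; the scoring code reloads it with PipelineModel.load().
model.write().overwrite().save("realtime-model")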
To run the batch scenario on the DSVM, from the azureml folder, change to the batch folder and open the batchwebservices.ipynb notebook. Follow the provided instructions to train, save, and deploy a model as a local web service to the DSVM or to a production HDInsight environment.
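The batch case differs mainly in how the model is applied: instead of scoring one request at a time, the saved pipeline is reloaded and run over an entire input dataset, with the results written back to storage. A minimal sketch, again with hypothetical file paths:

from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("batch-scoring").getOrCreate()

# Hypothetical paths -- point these at your own saved model and input data.
model = PipelineModel.load("realtime-model")
batch_df = spark.read.csv("to_score.csv", header=True, inferSchema=True)

# Score the whole batch and write the predictions out for downstream use.
scored = model.transform(batch_df)
scored.select("prediction", "probability").write.mode("overwrite").parquet("scores.parquet")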
To update the Azure ML components on the DSVM, run the following command:
$ wget -q http://amlsamples.blob.core.windows.net/scripts/amlupdate.sh -O - | sudo bash -