jaymathe/Spark-Operationalization-On-Azure

Deploying Spark ML Models on Azure (Preview)

Overview

Ever wondered how to deploy a Spark machine learning model to production on Azure? Well, you've come to the right place! This tutorial walks you through building predictive APIs (both real-time and batch) powered by Spark machine learning models, and deploying them to HDInsight and Azure Container Service (ACS) clusters for scale.

We'll start off by provisioning a Data Science VM to develop and test our APIs.

Getting Started

The getting started environment uses a Data Science VM (DSVM). For information on provisioning a DSVM, see Provision the Linux Data Science Virtual Machine.

Once you have signed into the DSVM, run the following commands and follow the prompts:

$ wget -q http://amlsamples.blob.core.windows.net/scripts/amlupdate.sh -O - | sudo bash -
$ sudo /opt/microsoft/azureml/initial_setup.sh

NOTE: You must log out and log back in to your SSH session for the changes to take effect.

Next, enter the AML environment setup command. NOTE: Keep the following in mind when completing the environment setup:

  • Enter a name for the environment. Environment names must be between 3 and 17 characters in length and can only consist of numbers and lowercase letters.
  • You will be prompted to sign in to Azure. To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the provided code to authenticate.
  • During the authentication process you will be prompted for an account to authenticate with. Use the account under which you created the DSVM.
  • When the sign-in is complete, your subscription information is displayed and you are prompted whether you wish to continue with the selected account.

Environment setup command:

$ aml env setup

Once the setup command has finished, it outputs environment export commands for the AML CLI environment. It also saves these export commands to a file in your home directory. Source the file to set up your environment variables:

$ source ~/.amlenvrc

To always set these variables when you log in, copy the export commands into your .bashrc file:

$ cat ~/.amlenvrc >> ~/.bashrc
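
One caveat with the append: re-running it adds a duplicate copy of the exports each time. A sketch of a guard against that, using temporary demo files (the marker comment and paths are illustrative, not part of the tutorial):

```shell
# Append the env file to a shell rc file only once, guarded by a
# marker comment so repeated runs do not duplicate the lines.
envrc=$(mktemp)    # stand-in for ~/.amlenvrc
rcfile=$(mktemp)   # stand-in for ~/.bashrc
echo 'export AML_DEMO=1' > "$envrc"

marker='# aml environment exports'
for run in 1 2; do                     # run the append step twice
  if ! grep -qF "$marker" "$rcfile"; then
    { echo "$marker"; cat "$envrc"; } >> "$rcfile"
  fi
done

grep -c 'AML_DEMO' "$rcfile"           # prints 1, not 2
rm -f "$envrc" "$rcfile"
```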

Jupyter notebook

Jupyter is running on the DSVM at https://<machine-ip-address>:8000. Open Jupyter in a browser and sign in. The user name and password are those that you configured for the DSVM. Note that you will receive a certificate warning that you can safely click through.

Run the Notebook

There are notebooks for both the real-time and batch web service scenarios. The notebooks are located in the azureml/realtime and azureml/batch folders.

To run the real-time scenario, from the azureml folder, change to the realtime folder and open the realtimewebservices.ipynb notebook. Follow the instructions to train, save, and deploy a model as a real-time web service. The notebook contains instructions both for deploying to the DSVM and for deploying to a production ACS environment.

To run the batch scenario on the DSVM, from the azureml folder, change to the batch folder and open the batchwebservices.ipynb notebook. Follow the provided instructions to train, save, and deploy a model as a local web service to the DSVM or to a production HDInsight environment.

Updating the DSVM environment

To update the Azure ML bits on the DSVM, run the following command.

$ wget -q http://amlsamples.blob.core.windows.net/scripts/amlupdate.sh -O - | sudo bash -
