Updates to Lab1

gingi · gingi · commit bfa5aef3de1d · 2018-08-06T22:07:46.000-04:00
diff --git a/Lab1/Tutorial.md b/Lab1/Tutorial.md
@@ -3,8 +3,8 @@
 This lab focuses on helping you become familiar with Azure CycleCloud, a tool
 for orchestrating HPC clusters in Azure.
 
-Please send questions or comments to the Azure CycleCloud PM team -
-<mailto:askcyclecloud @ microsoft.com>
+Please send questions or comments to the Azure CycleCloud PM team:
+askcyclecloud AT microsoft.com.
 
 ## Goals
 * Create a fully functional, configured Azure CycleCloud instance that can be
@@ -22,11 +22,13 @@ Please send questions or comments to the Azure CycleCloud PM team -
 There are several ways to install and setup an Azure CycleCloud server. For this
 lab, you'll be deploying Azure CycleCloud onto a VM via an ARM template. 
 
-As you follow the steps below, *please keep track of the following*: 1. The
-    domain name (FQDN) of your Azure CycleCloud. 2. The `username` used in the
-    ARM template. If you followed the QSG, the same `username` should be used in
-    the Azure CycleCloud web UI. 3. The `password` created in the Azure
-    CycleCloud web UI for your user.
+As you follow the steps below, *please keep track of the following*:
+
+1. The domain name (FQDN) of your Azure CycleCloud.
+2. The `username` used in the ARM template. If you followed the QSG, the same
+   `username` should be used in the Azure CycleCloud web UI.
+3. The `password` created in the Azure
+   CycleCloud web UI for your user.
 
 
 ### 1.1 Log into https://shell.azure.com
@@ -78,6 +80,7 @@ ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCwIvmC4K/0BUwOBqCsPxw5Ht8qWyDkorrU+gc6cJbo
 ellen@Azure:~$
 ```
 ### 1.5 Create a service principal
+Substitute a custom value for `cyclecloudlabs` as the name of your service principal. The name must be unique across all of Azure.
 ```
 ellen@Azure:~$ az ad sp create-for-rbac --name cyclecloudlabs
 {
@@ -88,7 +91,7 @@ ellen@Azure:~$ az ad sp create-for-rbac --name cyclecloudlabs
     "tenant": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
 }
 ```
-_Save this service principal somewhere. You will need it in the section below as well as in future tutorials in this lab._
+_Save this output somewhere. You will need it in the section below as well as in future tutorials in this lab._
 
 ### 1.6 Deploy Azure CycleCloud
 [![Deploy to
@@ -99,151 +102,146 @@ Azure](https://azuredeploy.net/deploybutton.svg)](https://portal.azure.com/#crea
 
 Enter the required information:
 
+* *Resource Group*: Enter any custom name
 * *Tenant ID*: The `tenant` listed above in the service principal
 * *Application ID*: `appId` of the service principal
 * *Application Secret*: `password` of the service principal
 * *SSH Public Key*: Copy and paste here the output of step 1.4
 * *Username*: The output of step 1.2 (e.g. *johnsmith* instead of
   *johnsmith@domain.com*)
+* All other fields can be left unedited
 
+After you accept the Terms & Conditions, press the "Purchase" button to begin deployment.
 The deployment process runs an installation script as a custom script extension,
 which installs and sets up CycleCloud. This process takes between 5 and 8 mins.
 
 ### 1.7 Retrieve the Domain Name (FQDN) of the Azure CycleCloud VM
-* When the deployment is completed you can retrieve the fully qualified domain
-  name of the Azure CycleCloud VM from the Azure portal or using the CLI in
-  Cloud Shell: ![Deployment Output](images/deployment-output.png)
+When the deployment is completed you can retrieve the fully qualified domain
+name of the Azure CycleCloud VM from the Outputs tab in the Azure portal: ![Deployment Output](images/deployment-output.png)
 
+Alternatively, use Cloud Shell (replace `MyResourceGroup` with the value you used):
 ```
-# replace ResourceGroupName with the one that you used
-ellen@Azure:~$ az group deployment list -g ${ResourceGroupName} --query "[0].properties.outputs.fqdn.value"
+ellen@Azure:~$ az group deployment list -g MyResourceGroup --query "[0].properties.outputs.fqdn.value"
 "cyclecloud43vgp4.eastus.cloudapp.azure.com"
+ellen@Azure:~$
 ```
 
 ### 1.8 Logging into the Azure CycleCloud server for the first time
-* In your web browser go to https://fqdn (where fqdn is the addressed retrieved
-  above).
-* You will be asked to enter a site name in the first screen, pick any name you
-  like
-* Accepting the EULA on the second screen brings you to a page that asks you to
-  create a admin user.
-    - Use the same `username` used above in step 1.5. Remember that this is also
-      the username of your Cloud Shell session.
-    - Choose a password that meets the minimum requirements. ![First
-      Login](images/cc-first-login.png) ![Create
-      Account](images/cc-create-account.png)
-
-## 2.  Starting an auto-scaling HPC cluster
-In this section, you will start a cluster using PBS-Pro as a scheduler, with
-LAMMPS as a solver
+* In your web browser go to `https://{FQDN}` (where FQDN is the address retrieved
+  in the previous step). The installation uses a self-signed SSL certificate,
+  which may show up with a warning in your browser.
+* The first screen will prompt you to enter a site name; enter any name you
+  like and then click "Next". ![First Login](images/cc-first-login.png)
+* Accept the End-User License Agreement and click "Next".
+* Create an admin user:
+    - For the User ID, use the same `username` used above in step 1.6. Remember
+      that this is also the username of your Cloud Shell session.
+    - Enter a name for the user.
+    - Enter and confirm a new password that meets the minimum requirements.
+
+  ![Create Account](images/cc-create-account.png)
+
+## 2. Starting an auto-scaling HPC cluster
+In this section, you will start a cluster using PBS Pro as a scheduler and
+LAMMPS as a solver.
 
 ### 2.1 Start a new LAMMPS cluster in Azure CycleCloud
 
-If you do not have a cluster that is already running, the default start page of
-Azure CycleCloud will display a wall of applications and cluster types that are
-distributed with each install
-* Find the LAMMPS cluster icon and select it. ![CC Cluster
-  Wall](images/cc-cluster-wall.png)
-* Provide a name for the new cluster and move on to the *Required Settings*
-  section. ![CC New Cluster LAMMPS](images/cc-newcluster-laamps.png)
-* Select a VM type that you should like to use as a `Execute VM Type`, we
-  recommend the H16r if you have quota for these.
-* In the networking subnet dropdown, select the subnet which has "-compute" as a
-  suffix. This subnet was created as part of the ARM deployment. ![CC Cluster
-  Required Settings](images/cc-cluster-required-settings.png)
-* The *Advanced Settings* section allows you to configure the cluster to use a
-  different OS, set up different projects, as well as to attach a public IP
-  address to the cluster nodes. There is no need to change any settings here for
-  the purposes of this lab. ![CC Cluster Advanced
-  Settings](images/cc-cluster-adv-settings.png)
-* Click the *Save* button on the bottom right-hand corner of the page to create
-  and save this cluster.
-* Your cluster now appears greyed-out in the cluster page. Click *Start* button
-  to provision the cluster resources in Azure. ![CC Cluster
-  Prepared](images/cc-cluster-prepared.png)
-* Starting up the cluster for the first time takes about 10 mins for it to be
-  ready. By default, only the master (or head) node of the cluster is started.
-  Azure CycleCloud provisions all the necessary network and storage resources
-  needed by the master node, and also sets up the scheduling environment in the
-  cluster.
-* The master node status bar turns green when the cluster is ready to use ![CC
-  Cluster Ready](images/cc-cluster-ready.png)
-
-### 2.2 Connecting into the master node and submitting a LAMMPS job
-
-As part of the ARM deployment process in section 1 above, the SSH public key you
-provided is stored in the Azure CycleCloud application server and pushed into
-each cluster that you create. 
-
-As a result of that, you are able to use your SSH private key to log into the
+* On the front page, find the LAMMPS cluster icon and select it.
+  ![CC Cluster Wall](images/cc-cluster-wall.png)
+* Enter a name for the new cluster (e.g., "LammpsLabs").
+  ![CC New Cluster LAMMPS](images/cc-newcluster-laamps.png)
+* Click "Next" to navigate to the **Required Settings** section. 
+* For `Execute VM Type`, click on "Choose" and select a virtual machine type 
+  that you would like to use for execution nodes. We recommend "H16r" if
+  you have quota for these.
+* In the networking subnet dropdown, select the subnet in your resource group
+  which has "-compute" as a suffix. This subnet was created as part of the ARM deployment.
+  ![CC Cluster Required Settings](images/cc-cluster-required-settings.png)
+* Click "Next" to navigate to the **Advanced Settings** section. This section
+  allows you to configure the cluster to use a different operating system, set
+  up different projects, and attach a public IP address to the cluster nodes.
+  For the purpose of this lab, there is no need to change any of these settings.
+  ![CC Cluster Advanced Settings](images/cc-cluster-adv-settings.png)
+* Click the "Save" button on the bottom to create this cluster.
+* Click the "Start" button to provision the cluster resources in Azure.
+  ![CC Cluster Prepared](images/cc-cluster-prepared.png)
+  - Starting up the cluster for the first time takes about 10 minutes. By
+    default, only the master (or "head") node of the cluster is started. Azure
+    CycleCloud provisions all the necessary network and storage resources needed
+    by the master node, and also sets up the scheduling environment in the
+    cluster.
+* The master node status bar turns green when the cluster is ready to use. Wait
+  for the green status bar before proceeding to the next step.
+  ![CC Cluster Ready](images/cc-cluster-ready.png)
+
+### 2.2 Connecting to the master node and submitting a LAMMPS job
+
+The SSH public key you specified as part of the deployment is stored in the Azure CycleCloud application server and pushed into each cluster that you create. As a result, you can use your SSH private key to log into the
 master node.
 
-* Retrieve the public IP address of the cluster headnode by selecting the master
-  node in the cluster management pane, and then clicking on the connect button
-  that appears below.
-
-* The pop-up window shows the connection string you would use to connect to the
-   cluster. 
-
-![Connect Popup](images/connect-popup.png)
-
-
-* Use your SSH client to connect to the master node. You could also use the one
-   that is available in Cloud Shell:
-```
-ellen@Azure:~$ ssh ellen@40.114.123.148
-Last login: Thu Aug  2 20:55:34 2018 from 97-113-237-75.tukw.qwest.net
-
- __        __  |    ___       __  |    __         __|
-(___ (__| (___ |_, (__/_     (___ |_, (__) (__(_ (__|
-        |
-
-Cluster: LammpsLabs
-Version: 7.5.0
-Run List: recipe[cyclecloud], role[pbspro_master_role], recipe[cluster_init]
-[ellen@ip-0A000404 ~]$
-```
+* Click on the "Connect" button in the menubar for the bottom pane to open the
+  connection dialog.
+  ![Connect Popup](images/connect-popup.png)
+* Copy the highlighted SSH command.
+* Paste the command into your Cloud Shell session and press Enter:
+  ```
+  ellen@Azure:~$ ssh ellen@40.114.123.148
+  The authenticity of host '40.114.123.148 (40.114.123.148)' can't be established.
+  ECDSA key fingerprint is SHA256:lM8akvIqai+YLYAfpygu5wKmMH1W0YXfy+BoXAJhIow.
+  Are you sure you want to continue connecting (yes/no)? yes
+  Warning: Permanently added '40.114.123.148' (ECDSA) to the list of known hosts.
+  
+   __        __  |    ___       __  |    __         __|
+  (___ (__| (___ |_, (__/_     (___ |_, (__) (__(_ (__|
+          |
+  
+  Cluster: LammpsLabs
+  Version: 7.5.0
+  Run List: recipe[cyclecloud], role[pbspro_master_role], recipe[cluster_init]
+  [ellen@ip-0A000404 ~]$
+  ```
 
 * You can verify that the job queue is empty by using the `qstat` command:
-```
-[ellen@ip-0A000404 ~]$ qstat -Q
-Queue              Max   Tot Ena Str   Que   Run   Hld   Wat   Trn   Ext Type
----------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
-workq                0     0 yes yes     0     0     0     0     0     0 Exec
-[ellen@ip-0A000404 ~]$
-```
-
-* Change to the demo directory, where you can find a sample LAMMPS job, and
-  submit the job using existing `runpi.sh` script.
-```
-[ellen@ip-0A000404 ~]$ cd demo/
-[ellen@ip-0A000404 demo]$ ./runpi.sh
-0[].ip-0A000404
-```
-
-* Note, if you're curious, you can view the contents of the `runpi.sh` script by
-  running the `cat` command. This script prepares a sample job which contains
+  ```
+  [ellen@ip-0A000404 ~]$ qstat -Q
+  Queue              Max   Tot Ena Str   Que   Run   Hld   Wat   Trn   Ext Type
+  ---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
+  workq                0     0 yes yes     0     0     0     0     0     0 Exec
+  [ellen@ip-0A000404 ~]$
+  ```
+
+* Change to the demo directory and submit the LAMMPS job using the existing `runpi.sh` script.
+  ```
+  [ellen@ip-0A000404 ~]$ cd demo/
+  [ellen@ip-0A000404 demo]$ ./runpi.sh
+  0[].ip-0A000404
+  [ellen@ip-0A000404 demo]$
+  ```
+
+* If you're curious, you can view the contents of the `runpi.sh` script by
+  running the `cat` command. This script prepares a sample job that contains
   1000 individual tasks, and submits that job using the `qsub` command.
-```
-[ellen@ip-0A000404 demo]$ cat runpi.sh
-#!/bin/bash
-mkdir -p /shared/scratch/pi
-cp ~/demo/pi.py /shared/scratch/pi
-cp ~/demo/pi.sh /shared/scratch/pi
-cd /shared/scratch/pi
-qsub -J 1-1000 /shared/scratch/pi/pi.sh
-```
+  ```
+  [ellen@ip-0A000404 demo]$ cat runpi.sh
+  #!/bin/bash
+  mkdir -p /shared/scratch/pi
+  cp ~/demo/pi.py /shared/scratch/pi
+  cp ~/demo/pi.sh /shared/scratch/pi
+  cd /shared/scratch/pi
+  qsub -J 1-1000 /shared/scratch/pi/pi.sh
+  ```
 
 * Verify that the job is now in the queue
-```
-[ellen@ip-0A000404 ~]$ qstat -Q
-Queue              Max   Tot Ena Str   Que   Run   Hld   Wat   Trn   Ext Type
----------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
-workq                0     1 yes yes     1     0     0     0     0     0 Exec
-[ellen@ip-0A000404 ~]$
-```
-
-* The autoscaling hook in the PBS scheduler picks up the job and submits a
+  ```
+  [ellen@ip-0A000404 ~]$ qstat -Q
+  Queue              Max   Tot Ena Str   Que   Run   Hld   Wat   Trn   Ext Type
+  ---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
+  workq                0     1 yes yes     1     0     0     0     0     0 Exec
+  [ellen@ip-0A000404 ~]$
+  ```
+
+* The autoscaling hook in the PBS scheduler detects the job and submits a
   resource request to the Azure CycleCloud server. You will see nodes being
   provisioned in the Azure CycleCloud UI within a minute. ![CC Allocating
   Nodes](images/cc-allocating-nodes.ong.png) Note that CycleCloud will not
@@ -258,14 +256,21 @@ workq                0     1 yes yes     1     0     0     0     0     0 Exec
   not start executing until every VM associated with the jobs is ready.
 
 * Verify that the job is complete by running the `qstat -Q` command
-  periodically. The jobs should finish quickly, in a minute or two. 
-```
-[ellen@ip-0A000404 demo]$ qstat -Q
-Queue              Max   Tot Ena Str   Que   Run   Hld   Wat   Trn   Ext Type
----------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
-workq                0     0 yes yes     0     0     0     0     0     0 Exec
-```
+  periodically. The Queued column (`Que`) should be 0, indicating that no more
+  jobs are awaiting execution. For the above submission, jobs typically finish
+  in a minute or two. 
+  ```
+  [ellen@ip-0A000404 demo]$ qstat -Q
+  Queue              Max   Tot Ena Str   Que   Run   Hld   Wat   Trn   Ext Type
+  ---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
+  workq                0     0 yes yes     0     0     0     0     0     0 Exec
+  ```
 
 * With no more jobs in the queue, the execute nodes will start auto-stopping,
   and your cluster will return to just having the master node.
 
+**Congratulations! You have completed the Lab 1 tutorial.**
+* Continue to [Lab 2 - Customizing an HPC cluster template](/Lab2/Tutorial.md),
+  or
+* Click on the "Terminate" button to terminate the cluster until you need it
+  again.
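
The periodic `qstat -Q` check described in section 2.2 could be scripted rather than run by hand. The following is a minimal sketch, not part of the tutorial itself: the function names `queued_count` and `wait_for_queue` are hypothetical, and it assumes the column layout shown in the sample output above, where `Que` is the sixth whitespace-separated field.

```shell
#!/bin/bash

# Hypothetical helper: read `qstat -Q` output on stdin and print the
# Que column for the named queue. Assumes Que is the 6th field, as in
# the sample output shown in the tutorial.
queued_count() {
  local queue="$1"
  awk -v q="$queue" '$1 == q { print $6 }'
}

# Hypothetical helper: poll until the Que column for the queue reads 0.
# The queue name and the polling interval (seconds) are assumed defaults.
# Note: if `qstat` fails or the queue is not listed, this keeps polling.
wait_for_queue() {
  local queue="${1:-workq}"
  local interval="${2:-15}"
  while [ "$(qstat -Q | queued_count "$queue")" != "0" ]; do
    sleep "$interval"
  done
}
```

On the master node, sourcing this file and running `wait_for_queue workq` would block until no jobs in `workq` are awaiting execution. Jobs that are already running are counted in the `Run` column, which this sketch does not check.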