Commit bfa5aef

Updates to Lab1
1 parent 6d9fe90 commit bfa5aef

1 file changed

+138
-133
lines changed

Lab1/Tutorial.md

Lines changed: 138 additions & 133 deletions
@@ -3,8 +3,8 @@
33
This lab focuses on helping you become familiar with Azure CycleCloud, a tool
44
for orchestrating HPC clusters in Azure.
55

6-
Please send questions or comments to the Azure CycleCloud PM team -
7-
<mailto:askcyclecloud @ microsoft.com>
6+
Please send questions or comments to the Azure CycleCloud PM team:
7+
askcyclecloud AT microsoft.com.
88

99
## Goals
1010
* Create a fully functional, configured Azure CycleCloud instance that can be
@@ -22,11 +22,13 @@ Please send questions or comments to the Azure CycleCloud PM team -
2222
There are several ways to install and setup an Azure CycleCloud server. For this
2323
lab, you'll be deploying Azure CycleCloud onto a VM via an ARM template.
2424

25-
As you follow the steps below, *please keep track of the following*: 1. The
26-
domain name (FQDN) of your Azure CycleCloud. 2. The `username` used in the
27-
ARM template. If you followed the QSG, the same `username` should be used in
28-
the Azure CycleCloud web UI. 3. The `password` created in the Azure
29-
CycleCloud web UI for your user.
25+
As you follow the steps below, *please keep track of the following*:
26+
27+
1. The domain name (FQDN) of your Azure CycleCloud.
28+
2. The `username` used in the ARM template. If you followed the QSG, the same
29+
`username` should be used in the Azure CycleCloud web UI.
30+
3. The `password` created in the Azure
31+
CycleCloud web UI for your user.
3032

3133

3234
### 1.1 Log into https://shell.azure.com
@@ -78,6 +80,7 @@ ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCwIvmC4K/0BUwOBqCsPxw5Ht8qWyDkorrU+gc6cJbo
7880
ellen@Azure:~$
7981
```
8082
### 1.5 Create a service principal
83+
Substitute a custom value for `cyclecloudlabs` as the name of your service principal. The name must be unique across all of Azure.
8184
```
8285
ellen@Azure:~$ az ad sp create-for-rbac --name cyclecloudlabs
8386
{
@@ -88,7 +91,7 @@ ellen@Azure:~$ az ad sp create-for-rbac --name cyclecloudlabs
8891
"tenant": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
8992
}
9093
```
91-
_Save this service principal somewhere. You will need it in the section below as well as in future tutorials in this lab._
94+
_Save this output somewhere. You will need it in the section below as well as in future tutorials in this lab._
9295

9396
### 1.6 Deploy Azure CycleCloud
9497
[![Deploy to
@@ -99,151 +102,146 @@ Azure](https://azuredeploy.net/deploybutton.svg)](https://portal.azure.com/#crea
99102

100103
Enter the required information:
101104

105+
* *Resource Group*: Enter any custom name
102106
* *Tenant ID*: The `tenant` listed above in the service principal
103107
* *Application ID*: `appId` of the service principal
104108
* *Application Secret*: `password` of the service principal
105109
* *SSH Public Key*: Copy and paste here the output of step 1.4
106110
* *Username*: The output of step 1.2 (e.g. *johnsmith* instead of
107111
*johnsmith@domain.com*)
112+
* All other fields can be left unedited
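As a rough illustration of how the saved service principal output maps onto the form fields listed above, here is a minimal Python sketch. The sample JSON is a trimmed, hypothetical placeholder (the real output of `az ad sp create-for-rbac` contains your own values and may include additional keys), and the helper name `template_fields` is an assumption made for this example.

```python
import json

# Hypothetical placeholder output; real values come from your own
# `az ad sp create-for-rbac` run and should be kept secret.
SAMPLE_OUTPUT = """
{
  "appId": "00000000-0000-0000-0000-000000000000",
  "password": "example-secret",
  "tenant": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
"""


def template_fields(sp_json: str) -> dict:
    """Map service principal keys to the deployment form field names."""
    sp = json.loads(sp_json)
    return {
        "Tenant ID": sp["tenant"],
        "Application ID": sp["appId"],
        "Application Secret": sp["password"],
    }


if __name__ == "__main__":
    for label, value in template_fields(SAMPLE_OUTPUT).items():
        print(f"{label}: {value}")
```

Keeping the mapping in one place like this can help avoid pasting the `appId` into the secret field or vice versa.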
108113

114+
After you accept the Terms & Conditions, press the "Purchase" button to begin deployment.
109115
The deployment process runs an installation script as a custom script extension,
110116
which installs and sets up CycleCloud. This process takes between 5 and 8 mins.
111117

112118
### 1.7 Retrieve the Domain Name (FQDN) of the Azure CycleCloud VM
113-
* When the deployment is completed you can retrieve the fully qualified domain
114-
name of the Azure CycleCloud VM from the Azure portal or using the CLI in
115-
Cloud Shell: ![Deployment Output](images/deployment-output.png)
119+
When the deployment is complete, you can retrieve the fully qualified domain
120+
name of the Azure CycleCloud VM from the Outputs tab in the Azure portal: ![Deployment Output](images/deployment-output.png)
116121

122+
Alternatively, use Cloud Shell (replace `MyResourceGroup` with the value you used):
117123
```
118-
# replace ResourceGroupName with the one that you used
119-
ellen@Azure:~$ az group deployment list -g ${ResourceGroupName} --query "[0].properties.outputs.fqdn.value"
124+
ellen@Azure:~$ az group deployment list -g MyResourceGroup --query "[0].properties.outputs.fqdn.value"
120125
"cyclecloud43vgp4.eastus.cloudapp.azure.com"
126+
ellen@Azure:~$
121127
```
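The `--query` result above is printed as a JSON string, quotes included. If you want to script the next step, a minimal Python sketch such as the following could turn that output into the URL to open in the browser. The function name `cyclecloud_url` is hypothetical, and the sketch assumes the command printed exactly one quoted FQDN.

```python
import json


def cyclecloud_url(az_output: str) -> str:
    """Turn the quoted FQDN printed by the az query into an https URL."""
    fqdn = json.loads(az_output)  # strips the surrounding JSON quotes
    if not isinstance(fqdn, str) or not fqdn:
        raise ValueError("expected a JSON string containing the FQDN")
    return f"https://{fqdn}"


if __name__ == "__main__":
    # Sample value copied from the transcript above.
    print(cyclecloud_url('"cyclecloud43vgp4.eastus.cloudapp.azure.com"'))
```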
122128

123129
### 1.8 Logging into the Azure CycleCloud server for the first time
124-
* In your web browser go to https://fqdn (where fqdn is the addressed retrieved
125-
above).
126-
* You will be asked to enter a site name in the first screen, pick any name you
127-
like
128-
* Accepting the EULA on the second screen brings you to a page that asks you to
129-
create a admin user.
130-
- Use the same `username` used above in step 1.5. Remember that this is also
131-
the username of your Cloud Shell session.
132-
- Choose a password that meets the minimum requirements. ![First
133-
Login](images/cc-first-login.png) ![Create
134-
Account](images/cc-create-account.png)
135-
136-
## 2. Starting an auto-scaling HPC cluster
137-
In this section, you will start a cluster using PBS-Pro as a scheduler, with
138-
LAMMPS as a solver
130+
* In your web browser go to `https://{FQDN}` (where FQDN is the address retrieved
131+
in the previous step). The installation uses a self-signed SSL certificate,
132+
which may show up with a warning in your browser.
133+
* The first screen will prompt you to enter a site name; enter any name you
134+
like and then click "Next". ![First Login](images/cc-first-login.png)
135+
* Accept the End-User License Agreement and click "Next".
136+
* Create an admin user:
137+
- For the User ID, use the same `username` used above in step 1.5. Remember
138+
that this is also the username of your Cloud Shell session.
139+
- Enter a name for the user.
140+
- Enter and confirm a new password that meets the minimum requirements.
141+
142+
![Create Account](images/cc-create-account.png)
143+
144+
## 2. Starting an auto-scaling HPC cluster
145+
In this section, you will start a cluster using PBS Pro as a scheduler and
146+
LAMMPS as a solver.
139147

140148
### 2.1 Start a new LAMMPS cluster in Azure CycleCloud
141149

142-
If you do not have a cluster that is already running, the default start page of
143-
Azure CycleCloud will display a wall of applications and cluster types that are
144-
distributed with each install
145-
* Find the LAMMPS cluster icon and select it. ![CC Cluster
146-
Wall](images/cc-cluster-wall.png)
147-
* Provide a name for the new cluster and move on to the *Required Settings*
148-
section. ![CC New Cluster LAMMPS](images/cc-newcluster-laamps.png)
149-
* Select a VM type that you should like to use as a `Execute VM Type`, we
150-
recommend the H16r if you have quota for these.
151-
* In the networking subnet dropdown, select the subnet which has "-compute" as a
152-
suffix. This subnet was created as part of the ARM deployment. ![CC Cluster
153-
Required Settings](images/cc-cluster-required-settings.png)
154-
* The *Advanced Settings* section allows you to configure the cluster to use a
155-
different OS, set up different projects, as well as to attach a public IP
156-
address to the cluster nodes. There is no need to change any settings here for
157-
the purposes of this lab. ![CC Cluster Advanced
158-
Settings](images/cc-cluster-adv-settings.png)
159-
* Click the *Save* button on the bottom right-hand corner of the page to create
160-
and save this cluster.
161-
* Your cluster now appears greyed-out in the cluster page. Click *Start* button
162-
to provision the cluster resources in Azure. ![CC Cluster
163-
Prepared](images/cc-cluster-prepared.png)
164-
* Starting up the cluster for the first time takes about 10 mins for it to be
165-
ready. By default, only the master (or head) node of the cluster is started.
166-
Azure CycleCloud provisions all the necessary network and storage resources
167-
needed by the master node, and also sets up the scheduling environment in the
168-
cluster.
169-
* The master node status bar turns green when the cluster is ready to use ![CC
170-
Cluster Ready](images/cc-cluster-ready.png)
171-
172-
### 2.2 Connecting into the master node and submitting a LAMMPS job
173-
174-
As part of the ARM deployment process in section 1 above, the SSH public key you
175-
provided is stored in the Azure CycleCloud application server and pushed into
176-
each cluster that you create.
177-
178-
As a result of that, you are able to use your SSH private key to log into the
150+
* On the front page, find the LAMMPS cluster icon and select it.
151+
![CC Cluster Wall](images/cc-cluster-wall.png)
152+
* Enter a name for the new cluster (e.g., "LammpsLabs").
153+
![CC New Cluster LAMMPS](images/cc-newcluster-laamps.png)
154+
* Click "Next" to navigate to the **Required Settings** section.
155+
* For `Execute VM Type`, click on "Choose" and select a virtual machine type
156+
that you would like to use for execution nodes. We recommend "H16r" if
157+
you have quota for these.
158+
* In the networking subnet dropdown, select the subnet in your resource group
159+
which has "-compute" as a suffix. This subnet was created as part of the ARM deployment.
160+
![CC Cluster Required Settings](images/cc-cluster-required-settings.png)
161+
* Click "Next" to navigate to the **Advanced Settings** section. This section
162+
allows you to configure the cluster to use a different operating system, set
163+
up different projects, and attach a public IP address to the cluster nodes.
164+
For the purpose of this lab, there is no need to change any of these settings.
165+
![CC Cluster Advanced Settings](images/cc-cluster-adv-settings.png)
166+
* Click the "Save" button at the bottom of the page to create this cluster.
167+
* Click the "Start" button to provision the cluster resources in Azure.
168+
![CC Cluster Prepared](images/cc-cluster-prepared.png)
169+
- Starting up the cluster for the first time takes about 10 minutes. By
170+
default, only the master (or "head") node of the cluster is started. Azure
171+
CycleCloud provisions all the necessary network and storage resources needed
172+
by the master node, and also sets up the scheduling environment in the
173+
cluster.
174+
* The master node status bar turns green when the cluster is ready to use. Wait
175+
for the green status bar before proceeding to the next step.
176+
![CC Cluster Ready](images/cc-cluster-ready.png)
177+
178+
### 2.2 Connecting to the master node and submitting a LAMMPS job
179+
180+
The SSH public key you specified as part of the deployment is stored in the Azure CycleCloud application server and pushed into each cluster that you create. As a result, you can use your SSH private key to log into the
179181
master node.
180182

181-
* Retrieve the public IP address of the cluster headnode by selecting the master
182-
node in the cluster management pane, and then clicking on the connect button
183-
that appears below.
184-
185-
* The pop-up window shows the connection string you would use to connect to the
186-
cluster.
187-
188-
![Connect Popup](images/connect-popup.png)
189-
190-
191-
* Use your SSH client to connect to the master node. You could also use the one
192-
that is available in Cloud Shell:
193-
```
194-
ellen@Azure:~$ ssh ellen@40.114.123.148
195-
Last login: Thu Aug 2 20:55:34 2018 from 97-113-237-75.tukw.qwest.net
196-
197-
__ __ | ___ __ | __ __|
198-
(___ (__| (___ |_, (__/_ (___ |_, (__) (__(_ (__|
199-
|
200-
201-
Cluster: LammpsLabs
202-
Version: 7.5.0
203-
Run List: recipe[cyclecloud], role[pbspro_master_role], recipe[cluster_init]
204-
[ellen@ip-0A000404 ~]$
205-
```
183+
* Click on the "Connect" button in the menubar for the bottom pane to open the
184+
connection dialog.
185+
![Connect Popup](images/connect-popup.png)
186+
* Copy the highlighted SSH command.
187+
* Paste the command into your Cloud Shell session and press Enter:
188+
```
189+
ellen@Azure:~$ ssh ellen@40.114.123.148
190+
The authenticity of host '40.114.123.148 (40.114.123.148)' can't be established.
191+
ECDSA key fingerprint is SHA256:lM8akvIqai+YLYAfpygu5wKmMH1W0YXfy+BoXAJhIow.
192+
Are you sure you want to continue connecting (yes/no)? yes
193+
Warning: Permanently added '40.114.123.148' (ECDSA) to the list of known hosts.
194+
195+
__ __ | ___ __ | __ __|
196+
(___ (__| (___ |_, (__/_ (___ |_, (__) (__(_ (__|
197+
|
198+
199+
Cluster: LammpsLabs
200+
Version: 7.5.0
201+
Run List: recipe[cyclecloud], role[pbspro_master_role], recipe[cluster_init]
202+
[ellen@ip-0A000404 ~]$
203+
```
206204

207205
* You can verify that the job queue is empty by using the `qstat` command:
208-
```
209-
[ellen@ip-0A000404 ~]$ qstat -Q
210-
Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
211-
---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
212-
workq 0 0 yes yes 0 0 0 0 0 0 Exec
213-
[ellen@ip-0A000404 ~]$
214-
```
215-
216-
* Change to the demo directory, where you can find a sample LAMMPS job, and
217-
submit the job using existing `runpi.sh` script.
218-
```
219-
[ellen@ip-0A000404 ~]$ cd demo/
220-
[ellen@ip-0A000404 demo]$ ./runpi.sh
221-
0[].ip-0A000404
222-
```
223-
224-
* Note, if you're curious, you can view the contents of the `runpi.sh` script by
225-
running the `cat` command. This script prepares a sample job which contains
206+
```
207+
[ellen@ip-0A000404 ~]$ qstat -Q
208+
Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
209+
---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
210+
workq 0 0 yes yes 0 0 0 0 0 0 Exec
211+
[ellen@ip-0A000404 ~]$
212+
```
213+
214+
* Change to the demo directory and submit the LAMMPS job using the existing `runpi.sh` script.
215+
```
216+
[ellen@ip-0A000404 ~]$ cd demo/
217+
[ellen@ip-0A000404 demo]$ ./runpi.sh
218+
0[].ip-0A000404
219+
[ellen@ip-0A000404 demo]$
220+
```
221+
222+
* If you're curious, you can view the contents of the `runpi.sh` script by
223+
running the `cat` command. This script prepares a sample job that contains
226224
1000 individual tasks, and submits that job using the `qsub` command.
227-
```
228-
[ellen@ip-0A000404 demo]$ cat runpi.sh
229-
#!/bin/bash
230-
mkdir -p /shared/scratch/pi
231-
cp ~/demo/pi.py /shared/scratch/pi
232-
cp ~/demo/pi.sh /shared/scratch/pi
233-
cd /shared/scratch/pi
234-
qsub -J 1-1000 /shared/scratch/pi/pi.sh
235-
```
225+
```
226+
[ellen@ip-0A000404 demo]$ cat runpi.sh
227+
#!/bin/bash
228+
mkdir -p /shared/scratch/pi
229+
cp ~/demo/pi.py /shared/scratch/pi
230+
cp ~/demo/pi.sh /shared/scratch/pi
231+
cd /shared/scratch/pi
232+
qsub -J 1-1000 /shared/scratch/pi/pi.sh
233+
```
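To make the `-J 1-1000` array range in the script concrete, the following Python sketch expands a range specification into its subjob indices. It assumes the `start-end[:step]` form of the PBS Pro array syntax; the function name `subjob_indices` is hypothetical and is not part of PBS or CycleCloud.

```python
def subjob_indices(spec: str) -> range:
    """Expand a 'start-end' or 'start-end:step' array range into indices."""
    bounds, _, step = spec.partition(":")
    start, _, end = bounds.partition("-")
    # The end index is inclusive in the range specification.
    return range(int(start), int(end) + 1, int(step) if step else 1)


if __name__ == "__main__":
    indices = subjob_indices("1-1000")
    print(len(indices), indices[0], indices[-1])
```

Under this reading, the single submission in `runpi.sh` fans out into 1000 independent tasks, which is what drives the autoscaling request.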
236234

237235
* Verify that the job is now in the queue
238-
```
239-
[ellen@ip-0A000404 ~]$ qstat -Q
240-
Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
241-
---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
242-
workq 0 1 yes yes 1 0 0 0 0 0 Exec
243-
[ellen@ip-0A000404 ~]$
244-
```
245-
246-
* The autoscaling hook in the PBS scheduler picks up the job and submits a
236+
```
237+
[ellen@ip-0A000404 ~]$ qstat -Q
238+
Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
239+
---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
240+
workq 0 1 yes yes 1 0 0 0 0 0 Exec
241+
[ellen@ip-0A000404 ~]$
242+
```
243+
244+
* The autoscaling hook in the PBS scheduler detects the job and submits a
247245
resource request to the Azure CycleCloud server. You will see nodes being
248246
provisioned in the Azure CycleCloud UI within a minute. ![CC Allocating
249247
Nodes](images/cc-allocating-nodes.ong.png) Note that CycleCloud will not
@@ -258,14 +256,21 @@ workq 0 1 yes yes 1 0 0 0 0 0 Exec
258256
not start executing until every VM associated with the jobs is ready.
259257

260258
* Verify that the job is complete by running the `qstat -Q` command
261-
periodically. The jobs should finish quickly, in a minute or two.
262-
```
263-
[ellen@ip-0A000404 demo]$ qstat -Q
264-
Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
265-
---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
266-
workq 0 0 yes yes 0 0 0 0 0 0 Exec
267-
```
259+
periodically. The Queued column (`Que`) should be 0, indicating that no more
260+
jobs are awaiting execution. For the above submission, jobs typically finish
261+
in a minute or two.
262+
```
263+
[ellen@ip-0A000404 demo]$ qstat -Q
264+
Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
265+
---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
266+
workq 0 0 yes yes 0 0 0 0 0 0 Exec
267+
```
268268

269269
* With no more jobs in the queue, the execute nodes will start auto-stopping,
270270
and your cluster will return to just having the master node.
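The manual `qstat -Q` checks in this section can be approximated in code. The sketch below parses the text layout shown in the transcripts and reports whether a queue has drained. The function names are hypothetical, the sample strings are copied from the outputs above, and treating a queue as idle only when both `Que` and `Run` are 0 is an assumption of this example rather than something the tutorial prescribes.

```python
SAMPLE_BUSY = """\
Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
workq 0 1 yes yes 1 0 0 0 0 0 Exec
"""

SAMPLE_IDLE = """\
Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
workq 0 0 yes yes 0 0 0 0 0 0 Exec
"""


def parse_qstat_q(output: str) -> dict:
    """Map each queue name to its column values from `qstat -Q` text."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    header = lines[0].split()
    queues = {}
    for line in lines[1:]:
        if line.lstrip().startswith("-"):
            continue  # dashed separator row
        fields = line.split()
        if len(fields) == len(header):
            queues[fields[0]] = dict(zip(header[1:], fields[1:]))
    return queues


def queue_is_idle(queues: dict, name: str = "workq") -> bool:
    """Assumed idle condition: nothing queued and nothing running."""
    row = queues[name]
    return int(row["Que"]) == 0 and int(row["Run"]) == 0


if __name__ == "__main__":
    print(queue_is_idle(parse_qstat_q(SAMPLE_BUSY)))
    print(queue_is_idle(parse_qstat_q(SAMPLE_IDLE)))
```

In practice you would feed the function the captured output of `qstat -Q` on the master node and repeat the check every few seconds until it reports idle.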
271271

272+
**Congratulations! You have completed the Lab 1 tutorial.**
273+
* Continue to [Lab 2 - Customizing an HPC cluster template](/Lab2/Tutorial.md),
274+
or
275+
* Click on the "Terminate" button to terminate the cluster until you need it
276+
again.
