3
3
This lab focuses on helping you become familiar with Azure CycleCloud, a tool
4
4
for orchestrating HPC clusters in Azure.
5
5
6
- Please send questions or comments to the Azure CycleCloud PM team -
7
- <mailto : askcyclecloud @ microsoft.com>
6
+ Please send questions or comments to the Azure CycleCloud PM team:
7
+ askcyclecloud AT microsoft.com.
8
8
9
9
## Goals
10
10
* Create a fully functional, configured Azure CycleCloud instance that can be
@@ -22,11 +22,13 @@ Please send questions or comments to the Azure CycleCloud PM team -
22
22
There are several ways to install and set up an Azure CycleCloud server. For this
23
23
lab, you'll be deploying Azure CycleCloud onto a VM via an ARM template.
24
24
25
- As you follow the steps below, * please keep track of the following* : 1. The
26
- domain name (FQDN) of your Azure CycleCloud. 2. The ` username ` used in the
27
- ARM template. If you followed the QSG, the same ` username ` should be used in
28
- the Azure CycleCloud web UI. 3. The ` password ` created in the Azure
29
- CycleCloud web UI for your user.
25
+ As you follow the steps below, *please keep track of the following*:
26
+
27
+ 1. The domain name (FQDN) of your Azure CycleCloud.
28
+ 2. The `username` used in the ARM template. If you followed the QSG, the same
29
+ `username` should be used in the Azure CycleCloud web UI.
30
+ 3. The `password` created in the Azure
31
+ CycleCloud web UI for your user.
30
32
31
33
32
34
### 1.1 Log into https://shell.azure.com
@@ -78,6 +80,7 @@ ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCwIvmC4K/0BUwOBqCsPxw5Ht8qWyDkorrU+gc6cJbo
78
80
ellen@Azure:~$
79
81
```
80
82
### 1.5 Create a service principal
83
+ Substitute a custom value for `cyclecloudlabs` as the name of your service principal. The name must be unique across all of Azure.
81
84
```
82
85
ellen@Azure:~$ az ad sp create-for-rbac --name cyclecloudlabs
83
86
{
@@ -88,7 +91,7 @@ ellen@Azure:~$ az ad sp create-for-rbac --name cyclecloudlabs
88
91
"tenant": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
89
92
}
90
93
```
91
- _ Save this service principal somewhere. You will need it in the section below as well as in future tutorials in this lab._
94
+ _Save this output somewhere. You will need it in the section below as well as in future tutorials in this lab._
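
One possible way to keep these values handy, sketched under assumptions: if the JSON printed by `az ad sp create-for-rbac` is saved to a file (the name `sp.json` below is hypothetical), a small helper can pull out the `appId`, `password`, and `tenant` values needed in section 1.6. The `sp_field` function is illustrative and not part of the lab; it relies on each key sitting on its own line, as in the output above.

```shell
#!/bin/bash
# sp_field FILE KEY: print the string value of KEY from a saved
# service principal JSON file (hypothetical helper, not part of the lab).
# Assumes one "key": "value" pair per line, as in the az output above.
sp_field() {
  sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}

# Example (assumes the output was saved to sp.json beforehand):
#   sp_field sp.json tenant
```

Because the file contains the service principal secret, it is worth keeping it somewhere private and removing it once the lab is finished.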
92
95
93
96
### 1.6 Deploy Azure CycleCloud
94
97
[ ](https://portal.azure.com/#crea
99
102
100
103
Enter the required information:
101
104
105
+ * *Resource Group*: Enter any custom name
102
106
* *Tenant ID*: The `tenant` listed above in the service principal
103
107
* *Application ID*: `appId` of the service principal
104
108
* *Application Secret*: `password` of the service principal
105
109
* *SSH Public Key*: Copy and paste here the output of step 1.4
106
110
* *Username*: The output of step 1.2 (e.g. *johnsmith* instead of
107
111
*johnsmith@domain.com*)
112
+ * All other fields can be left unedited
108
113
114
+ After you accept the Terms & Conditions, press the "Purchase" button to begin deployment.
109
115
The deployment process runs an installation script as a custom script extension,
110
116
which installs and sets up CycleCloud. This process takes between 5 and 8 minutes.
111
117
112
118
### 1.7 Retrieve the Domain Name (FQDN) of the Azure CycleCloud VM
113
- * When the deployment is completed you can retrieve the fully qualified domain
114
- name of the Azure CycleCloud VM from the Azure portal or using the CLI in
115
- Cloud Shell: ![ Deployment Output] ( images/deployment-output.png )
119
+ When the deployment is completed you can retrieve the fully qualified domain
120
+ name of the Azure CycleCloud VM from the Outputs tab in the Azure portal: ![Deployment Output](images/deployment-output.png)
116
121
122
+ Or using Cloud Shell (replace `MyResourceGroup` with the value you used):
117
123
```
118
- # replace ResourceGroupName with the one that you used
119
- ellen@Azure:~$ az group deployment list -g ${ResourceGroupName} --query "[0].properties.outputs.fqdn.value"
124
+ ellen@Azure:~$ az group deployment list -g MyResourceGroup --query "[0].properties.outputs.fqdn.value"
120
125
"cyclecloud43vgp4.eastus.cloudapp.azure.com"
126
+ ellen@Azure:~$
121
127
```
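
As a small convenience sketch, the quoted JSON value printed above can be turned into the URL used in the next step. The helper name `fqdn_to_url` is hypothetical and not part of the lab; adding the global `-o tsv` output option to the `az` command should also print the value without the surrounding quotes.

```shell
#!/bin/bash
# fqdn_to_url VALUE: strip any double quotes from VALUE and print the
# HTTPS address of the CycleCloud server (hypothetical helper).
fqdn_to_url() {
  printf 'https://%s\n' "$(printf '%s' "$1" | tr -d '"')"
}

# Example with the sample value shown above:
#   fqdn_to_url '"cyclecloud43vgp4.eastus.cloudapp.azure.com"'
```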
122
128
123
129
### 1.8 Logging into the Azure CycleCloud server for the first time
124
- * In your web browser go to https://fqdn (where fqdn is the addressed retrieved
125
- above).
126
- * You will be asked to enter a site name in the first screen, pick any name you
127
- like
128
- * Accepting the EULA on the second screen brings you to a page that asks you to
129
- create a admin user.
130
- - Use the same ` username ` used above in step 1.5. Remember that this is also
131
- the username of your Cloud Shell session.
132
- - Choose a password that meets the minimum requirements. ![ First
133
- Login] ( images/cc-first-login.png ) ![ Create
134
- Account] ( images/cc-create-account.png )
135
-
136
- ## 2. Starting an auto-scaling HPC cluster
137
- In this section, you will start a cluster using PBS-Pro as a scheduler, with
138
- LAMMPS as a solver
130
+ * In your web browser go to `https://{FQDN}` (where FQDN is the address retrieved
131
+ in the previous step). The installation uses a self-signed SSL certificate,
132
+ which may show up with a warning in your browser.
133
+ * The first screen will prompt you to enter a site name; enter any name you
134
+ like and then click "Next". ![First Login](images/cc-first-login.png)
135
+ * Accept the End-User License Agreement and click "Next".
136
+ * Create an admin user:
137
+ - For the User ID, use the same `username` used above in step 1.5. Remember
138
+ that this is also the username of your Cloud Shell session.
139
+ - Enter a name for the user.
140
+ - Enter and confirm a new password that meets the minimum requirements.
141
+
142
+ ![Create Account](images/cc-create-account.png)
143
+
144
+ ## 2. Starting an auto-scaling HPC cluster
145
+ In this section, you will start a cluster using PBS Pro as a scheduler and
146
+ LAMMPS as a solver.
139
147
140
148
### 2.1 Start a new LAMMPS cluster in Azure CycleCloud
141
149
142
- If you do not have a cluster that is already running, the default start page of
143
- Azure CycleCloud will display a wall of applications and cluster types that are
144
- distributed with each install
145
- * Find the LAMMPS cluster icon and select it. ![ CC Cluster
146
- Wall] ( images/cc-cluster-wall.png )
147
- * Provide a name for the new cluster and move on to the * Required Settings*
148
- section. ![ CC New Cluster LAMMPS] ( images/cc-newcluster-laamps.png )
149
- * Select a VM type that you should like to use as a ` Execute VM Type ` , we
150
- recommend the H16r if you have quota for these.
151
- * In the networking subnet dropdown, select the subnet which has "-compute" as a
152
- suffix. This subnet was created as part of the ARM deployment. ![ CC Cluster
153
- Required Settings] ( images/cc-cluster-required-settings.png )
154
- * The * Advanced Settings* section allows you to configure the cluster to use a
155
- different OS, set up different projects, as well as to attach a public IP
156
- address to the cluster nodes. There is no need to change any settings here for
157
- the purposes of this lab. ![ CC Cluster Advanced
158
- Settings] ( images/cc-cluster-adv-settings.png )
159
- * Click the * Save* button on the bottom right-hand corner of the page to create
160
- and save this cluster.
161
- * Your cluster now appears greyed-out in the cluster page. Click * Start* button
162
- to provision the cluster resources in Azure. ![ CC Cluster
163
- Prepared] ( images/cc-cluster-prepared.png )
164
- * Starting up the cluster for the first time takes about 10 mins for it to be
165
- ready. By default, only the master (or head) node of the cluster is started.
166
- Azure CycleCloud provisions all the necessary network and storage resources
167
- needed by the master node, and also sets up the scheduling environment in the
168
- cluster.
169
- * The master node status bar turns green when the cluster is ready to use ![ CC
170
- Cluster Ready] ( images/cc-cluster-ready.png )
171
-
172
- ### 2.2 Connecting into the master node and submitting a LAMMPS job
173
-
174
- As part of the ARM deployment process in section 1 above, the SSH public key you
175
- provided is stored in the Azure CycleCloud application server and pushed into
176
- each cluster that you create.
177
-
178
- As a result of that, you are able to use your SSH private key to log into the
150
+ * On the front page, find the LAMMPS cluster icon and select it.
151
+ ![CC Cluster Wall](images/cc-cluster-wall.png)
152
+ * Enter a name for the new cluster (e.g., "LammpsLabs").
153
+ ![CC New Cluster LAMMPS](images/cc-newcluster-laamps.png)
154
+ * Click "Next" to navigate to the **Required Settings** section.
155
+ * For `Execute VM Type`, click on "Choose" and select a virtual machine type
156
+ that you would like to use for execution nodes. We recommend "H16r" if
157
+ you have quota for these.
158
+ * In the networking subnet dropdown, select the subnet in your resource group
159
+ which has "-compute" as a suffix. This subnet was created as part of the ARM deployment.
160
+ ![CC Cluster Required Settings](images/cc-cluster-required-settings.png)
161
+ * Click "Next" to navigate to the **Advanced Settings** section. This section
162
+ allows you to configure the cluster to use a different operating system, set
163
+ up different projects, and attach a public IP address to the cluster nodes.
164
+ For the purpose of this lab, there is no need to change any of these settings.
165
+ ![CC Cluster Advanced Settings](images/cc-cluster-adv-settings.png)
166
+ * Click the "Save" button on the bottom to create this cluster.
167
+ * Click the "Start" button to provision the cluster resources in Azure.
168
+ ![CC Cluster Prepared](images/cc-cluster-prepared.png)
169
+ - Starting up the cluster for the first time takes about 10 minutes. By
170
+ default, only the master (or "head") node of the cluster is started. Azure
171
+ CycleCloud provisions all the necessary network and storage resources needed
172
+ by the master node, and also sets up the scheduling environment in the
173
+ cluster.
174
+ * The master node status bar turns green when the cluster is ready to use. Wait
175
+ for the green status bar before proceeding to the next step.
176
+ ![CC Cluster Ready](images/cc-cluster-ready.png)
177
+
178
+ ### 2.2 Connecting to the master node and submitting a LAMMPS job
179
+
180
+ The SSH public key you specified as part of the deployment is stored in the Azure CycleCloud application server and pushed into each cluster that you create. As a result, you can use your SSH private key to log into the
179
181
master node.
180
182
181
- * Retrieve the public IP address of the cluster headnode by selecting the master
182
- node in the cluster management pane, and then clicking on the connect button
183
- that appears below.
184
-
185
- * The pop-up window shows the connection string you would use to connect to the
186
- cluster.
187
-
188
- ![ Connect Popup] ( images/connect-popup.png )
189
-
190
-
191
- * Use your SSH client to connect to the master node. You could also use the one
192
- that is available in Cloud Shell:
193
- ```
194
- ellen@Azure:~$ ssh ellen@40.114.123.148
195
- Last login: Thu Aug 2 20:55:34 2018 from 97-113-237-75.tukw.qwest.net
196
-
197
- __ __ | ___ __ | __ __|
198
- (___ (__| (___ |_, (__/_ (___ |_, (__) (__(_ (__|
199
- |
200
-
201
- Cluster: LammpsLabs
202
- Version: 7.5.0
203
- Run List: recipe[cyclecloud], role[pbspro_master_role], recipe[cluster_init]
204
- [ellen@ip-0A000404 ~]$
205
- ```
183
+ * Click on the "Connect" button in the menubar for the bottom pane to open the
184
+ connection dialog.
185
+ ![Connect Popup](images/connect-popup.png)
186
+ * Copy the highlighted SSH command.
187
+ * Paste the command into your Cloud Shell session and press Enter:
188
+ ```
189
+ ellen@Azure:~$ ssh ellen@40.114.123.148
190
+ The authenticity of host '40.114.123.148 (40.114.123.148)' can't be established.
191
+ ECDSA key fingerprint is SHA256:lM8akvIqai+YLYAfpygu5wKmMH1W0YXfy+BoXAJhIow.
192
+ Are you sure you want to continue connecting (yes/no)? yes
193
+ Warning: Permanently added '40.114.123.148' (ECDSA) to the list of known hosts.
194
+
195
+ __ __ | ___ __ | __ __|
196
+ (___ (__| (___ |_, (__/_ (___ |_, (__) (__(_ (__|
197
+ |
198
+
199
+ Cluster: LammpsLabs
200
+ Version: 7.5.0
201
+ Run List: recipe[cyclecloud], role[pbspro_master_role], recipe[cluster_init]
202
+ [ellen@ip-0A000404 ~]$
203
+ ```
206
204
207
205
* You can verify that the job queue is empty by using the `qstat` command:
208
- ```
209
- [ellen@ip-0A000404 ~]$ qstat -Q
210
- Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
211
- ---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
212
- workq 0 0 yes yes 0 0 0 0 0 0 Exec
213
- [ellen@ip-0A000404 ~]$
214
- ```
215
-
216
- * Change to the demo directory, where you can find a sample LAMMPS job, and
217
- submit the job using existing ` runpi.sh ` script.
218
- ```
219
- [ellen@ip-0A000404 ~ ]$ cd demo/
220
- [ellen@ ip-0A000404 demo]$ ./runpi.sh
221
- 0[]. ip-0A000404
222
- ```
223
-
224
- * Note, if you're curious, you can view the contents of the ` runpi.sh ` script by
225
- running the ` cat ` command. This script prepares a sample job which contains
206
+ ```
207
+ [ellen@ip-0A000404 ~]$ qstat -Q
208
+ Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
209
+ ---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
210
+ workq 0 0 yes yes 0 0 0 0 0 0 Exec
211
+ [ellen@ip-0A000404 ~]$
212
+ ```
213
+
214
+ * Change to the demo directory and submit the LAMMPS job using the existing `runpi.sh` script.
215
+ ```
216
+ [ellen@ip-0A000404 ~]$ cd demo/
217
+ [ellen@ip-0A000404 demo]$ ./runpi.sh
218
+ 0[]. ip-0A000404
219
+ [ellen@ip-0A000404 demo]$
220
+ ```
221
+
222
+ * If you're curious, you can view the contents of the `runpi.sh` script by
223
+ running the `cat` command. This script prepares a sample job that contains
226
224
1000 individual tasks, and submits that job using the `qsub` command.
227
- ```
228
- [ellen@ip-0A000404 demo]$ cat runpi.sh
229
- #!/bin/bash
230
- mkdir -p /shared/scratch/pi
231
- cp ~/demo/pi.py /shared/scratch/pi
232
- cp ~/demo/pi.sh /shared/scratch/pi
233
- cd /shared/scratch/pi
234
- qsub -J 1-1000 /shared/scratch/pi/pi.sh
235
- ```
225
+ ```
226
+ [ellen@ip-0A000404 demo]$ cat runpi.sh
227
+ #!/bin/bash
228
+ mkdir -p /shared/scratch/pi
229
+ cp ~/demo/pi.py /shared/scratch/pi
230
+ cp ~/demo/pi.sh /shared/scratch/pi
231
+ cd /shared/scratch/pi
232
+ qsub -J 1-1000 /shared/scratch/pi/pi.sh
233
+ ```
236
234
237
235
* Verify that the job is now in the queue:
238
- ```
239
- [ellen@ip-0A000404 ~]$ qstat -Q
240
- Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
241
- ---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
242
- workq 0 1 yes yes 1 0 0 0 0 0 Exec
243
- [ellen@ip-0A000404 ~]$
244
- ```
245
-
246
- * The autoscaling hook in the PBS scheduler picks up the job and submits a
236
+ ```
237
+ [ellen@ip-0A000404 ~]$ qstat -Q
238
+ Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
239
+ ---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
240
+ workq 0 1 yes yes 1 0 0 0 0 0 Exec
241
+ [ellen@ip-0A000404 ~]$
242
+ ```
243
+
244
+ * The autoscaling hook in the PBS scheduler detects the job and submits a
247
245
resource request to the Azure CycleCloud server. You will see nodes being
248
246
provisioned in the Azure CycleCloud UI within a minute. ![CC Allocating
249
247
Nodes](images/cc-allocating-nodes.ong.png) Note that CycleCloud will not
@@ -258,14 +256,21 @@ workq 0 1 yes yes 1 0 0 0 0 0 Exec
258
256
not start executing until every VM associated with the jobs is ready.
259
257
260
258
* Verify that the job is complete by running the `qstat -Q` command
261
- periodically. The jobs should finish quickly, in a minute or two.
262
- ```
263
- [ellen@ip-0A000404 demo]$ qstat -Q
264
- Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
265
- ---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
266
- workq 0 0 yes yes 0 0 0 0 0 0 Exec
267
- ```
259
+ periodically. The Queued column (`Que`) should be 0, indicating that no more
260
+ jobs are awaiting execution. For the above submission, jobs typically finish
261
+ in a minute or two.
262
+ ```
263
+ [ellen@ip-0A000404 demo]$ qstat -Q
264
+ Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
265
+ ---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
266
+ workq 0 0 yes yes 0 0 0 0 0 0 Exec
267
+ ```
268
268
269
269
* With no more jobs in the queue, the execute nodes will start auto-stopping,
270
270
and your cluster will return to just having the master node.
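
The periodic check described above could be scripted along these lines. This is a sketch and not part of the lab: `queue_idle` is a hypothetical helper that reads `qstat -Q` output on standard input and succeeds only when the named queue shows 0 in both the `Que` and `Run` columns (the sixth and seventh columns in the listings above).

```shell
#!/bin/bash
# queue_idle QUEUE: read `qstat -Q` output on stdin; exit 0 only if QUEUE
# is listed with Que (column 6) and Run (column 7) both equal to 0.
queue_idle() {
  awk -v q="$1" '
    $1 == q { found = 1; if ($6 != 0 || $7 != 0) busy = 1 }
    END     { exit (found && !busy) ? 0 : 1 }
  '
}

# Example polling loop (assumes it runs on the master node):
#   until qstat -Q | queue_idle workq; do sleep 30; done
```

Checking `Run` as well as `Que` avoids reporting completion while tasks have left the queued state but are still executing.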
271
271
272
+ **Congratulations! You have completed the Lab 1 tutorial.**
273
+ * Continue to [Lab 2 - Customizing an HPC cluster template](/Lab2/Tutorial.md),
274
+ or
275
+ * Click on the "Terminate" button to terminate the cluster until you need it
276
+ again.