|
19 | 19 | },
|
20 | 20 | {
|
21 | 21 | "cell_type": "code",
|
22 |
| - "execution_count": 2, |
| 22 | + "execution_count": null, |
23 | 23 | "metadata": {},
|
24 | 24 | "outputs": [],
|
25 | 25 | "source": [
|
|
30 | 30 | "\n",
|
31 | 31 | "sagemaker_session = sagemaker.Session()\n",
|
32 | 32 | "role = sagemaker.get_execution_role()\n",
|
33 |
| - "#bucket_name = 'tfworld2019-<your_bucket_name>'\n", |
34 |
| - "bucket_name = 'tfworld2019'" |
| 33 | + "\n", |
| 34 | + "bucket_name = '<your_bucket_name>'" |
35 | 35 | ]
|
36 | 36 | },
|
37 | 37 | {
|
|
41 | 41 | "**Step 2:** Specify hyperparameters, instance type and number of instances to distribute training to. The `hvd_processes_per_host` corresponds to the number of GPUs per instance. \n",
|
42 | 42 | "For example, if you choose:\n",
|
43 | 43 | "```\n",
|
44 |
| - "hvd_instance_type = 'ml.p3.8large'\n", |
| 44 | + "hvd_instance_type = 'ml.p3.8xlarge'\n", |
45 | 45 | "hvd_instance_count = 2\n",
|
46 | 46 | "hvd_processes_per_host = 4\n",
|
47 | 47 | "```\n",
|
|
138 | 138 | " job_name=job_name, wait=False)"
|
139 | 139 | ]
|
140 | 140 | },
|
| 141 | + { |
| 142 | + "cell_type": "markdown", |
| 143 | + "metadata": {}, |
| 144 | + "source": [ |
| 145 | + "**Note**: in the `estimator_hvd.fit()` call above, set `wait=True` if you want to see the training output in the Jupyter notebook.\n", |
| 146 | + "The advantage of setting `wait=False` is that you can continue to run cells. \n", |
| 147 | + "Since `wait=False` leaves the notebook unblocked, we can now launch TensorBoard in the notebook and monitor progress." |
| 148 | + ] |
| 149 | + }, |
141 | 150 | {
|
142 | 151 | "cell_type": "markdown",
|
143 | 152 | "metadata": {},
|
|
147 | 156 | },
|
148 | 157 | {
|
149 | 158 | "cell_type": "code",
|
150 |
| - "execution_count": 3, |
| 159 | + "execution_count": null, |
151 | 160 | "metadata": {},
|
152 |
| - "outputs": [ |
153 |
| - { |
154 |
| - "name": "stdout", |
155 |
| - "output_type": "stream", |
156 |
| - "text": [ |
157 |
| - "TensorBoard 1.14.0 at http://ip-172-16-89-111:6006/ (Press CTRL+C to quit)\n", |
158 |
| - "W1028 20:55:37.536751 140564607526656 core_plugin.py:172] Unable to get first event timestamp for run sm-dist-1x1-gpu-instances2019-10-24-10-08-55-297: No event timestamp could be found\n", |
159 |
| - "W1028 20:55:37.777247 140564607526656 core_plugin.py:172] Unable to get first event timestamp for run sm-dist-1x8-gpu-instances2019-10-24-07-43-40-297: No event timestamp could be found\n", |
160 |
| - "W1028 20:55:37.984411 140564607526656 core_plugin.py:172] Unable to get first event timestamp for run sm-dist-2x1-gpu-instances2019-10-28-10-24-06-301: No event timestamp could be found\n", |
161 |
| - "W1028 20:55:38.320934 140564607526656 core_plugin.py:172] Unable to get first event timestamp for run sm-dist-2x1-workers2019-10-28-20-28-23-301: No event timestamp could be found\n", |
162 |
| - "^C\n" |
163 |
| - ] |
164 |
| - } |
165 |
| - ], |
| 161 | + "outputs": [], |
166 | 162 | "source": [
|
167 | 163 | "!S3_REGION=us-west-2 tensorboard --logdir s3://{bucket_name}/tensorboard_logs/"
|
168 | 164 | ]
|
|
171 | 167 | "cell_type": "markdown",
|
172 | 168 | "metadata": {},
|
173 | 169 | "source": [
|
174 |
| - "Open a new browser tan and navigate to the folloiwng link to access TensorBoard:\n", |
175 |
| - "<br> https://tfworld2019.notebook.us-west-2.sagemaker.aws/proxy/6006/\n", |
176 |
| - "<br> Make sure that the name of the notebook instance is correct in the link above.\n", |
| 170 | + "Open a new browser tab and navigate to the following link to access TensorBoard:\n", |
| 171 | + "<br> https://***your_notebook_name***.notebook.us-west-2.sagemaker.aws/proxy/6006/\n", |
| 172 | + "<br> <br> \n", |
| 173 | + "**Note:** Make sure to replace `your_notebook_name` with the name of your notebook instance. You can find the name in the browser URL.\n", |
177 | 174 | "<br> Don't forget the slash at the end of the URL 6006/"
|
178 | 175 | ]
|
| 176 | + }, |
| 177 | + { |
| 178 | + "cell_type": "code", |
| 179 | + "execution_count": null, |
| 180 | + "metadata": {}, |
| 181 | + "outputs": [], |
| 182 | + "source": [] |
179 | 183 | }
|
180 | 184 | ],
|
181 | 185 | "metadata": {
|
|