Working on questions and assignments for Session 3

maximedelpit · Aug 19, 2023 · 3bf2f1a
1 parent 2acf57b

Showing 2 changed files with 48 additions and 6 deletions.
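Question 3.4, which this commit adds to `penguins-cohort.ipynb`, asks how `max_jobs` and `max_parallel_jobs` interact in a Hyperparameter Tuning Job. The toy below is only a rough mental model. It uses a standard-library thread pool and treats `max_jobs` as a budget on the total number of Training Jobs and `max_parallel_jobs` as a cap on how many of them run at once. The `run_tuning` helper and its `job_seconds` parameter are hypothetical. This is not how SageMaker actually schedules jobs, and each "training job" just sleeps.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_tuning(max_jobs, max_parallel_jobs, job_seconds=0.05):
    """Toy model (not SageMaker): start `max_jobs` jobs in total, never
    running more than `max_parallel_jobs` of them at the same time.

    Returns (jobs_completed, peak_concurrency).
    """
    lock = threading.Lock()
    running = 0
    peak = 0
    completed = 0

    def training_job(job_id):
        nonlocal running, peak, completed
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(job_seconds)  # stand-in for actual training work
        with lock:
            running -= 1
            completed += 1
        return job_id

    # The pool size plays the role of the concurrency cap; the number of
    # submitted tasks plays the role of the total job budget.
    with ThreadPoolExecutor(max_workers=max_parallel_jobs) as pool:
        list(pool.map(training_job, range(max_jobs)))
    return completed, peak


if __name__ == "__main__":
    print(run_tuning(max_jobs=10, max_parallel_jobs=3))
```

Calling `run_tuning(max_jobs=10, max_parallel_jobs=3)` completes all ten jobs while the observed peak concurrency never exceeds three.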
diff --git a/penguins/penguins-cohort.ipynb b/penguins/penguins-cohort.ipynb
@@ -10707,7 +10707,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 35,
+   "execution_count": null,
    "id": "90fe82ae-6a2c-4461-bc83-bb52d8871e3b",
    "metadata": {
     "tags": []
@@ -10996,13 +10996,57 @@
     "\n",
     "<div style=\"margin: 30px 0 10px 0;\"><span style=\"font-size: 1.1em; padding:4px; background-color: #b8bf9f; color: #000;\"><strong>Question 3.1</strong></span></div>\n",
     "\n",
-    "TBD\n",
+    "When a Training Job finishes, SageMaker automatically uploads the model to S3. Which of the following statements about this process is correct?\n",
+    "\n",
+    "1. SageMaker automatically creates a `model.tar.gz` file with the entire content of the `/opt/ml/model` directory.\n",
+    "2. SageMaker automatically creates a `model.tar.gz` file with any files inside the `/opt/ml/model` directory as long as those files belong to the model we trained.\n",
+    "3. SageMaker automatically creates a `model.tar.gz` file with any new files created inside the container by the training script.\n",
+    "4. SageMaker automatically creates a `model.tar.gz` file with the content of the output folder configured in the training script.\n",
+    "\n",
+    "\n",
+    "<div style=\"margin: 30px 0 10px 0;\"><span style=\"font-size: 1.1em; padding:4px; background-color: #b8bf9f; color: #000;\"><strong>Question 3.2</strong></span></div>\n",
+    "\n",
+    "Our pipeline uses \"file mode\" to give the Training Job access to the dataset. When using file mode, SageMaker downloads the training data from S3 to a local directory in the training container. Imagine we have a large dataset and don't want to wait for SageMaker to download it every time we train a model. How can we solve this problem?\n",
+    "\n",
+    "1. We can train our model with a smaller portion of the dataset.\n",
+    "2. We can increase the number of instances and train many models in parallel.\n",
+    "3. We can use \"fast file mode\" to get file system access to S3.\n",
+    "4. We can use \"pipe mode\" to stream data directly from S3 into the training container.\n",
+    "\n",
+    "\n",
+    "<div style=\"margin: 30px 0 10px 0;\"><span style=\"font-size: 1.1em; padding:4px; background-color: #b8bf9f; color: #000;\"><strong>Question 3.3</strong></span></div>\n",
+    "\n",
+    "When tuning the model, we used an `IntegerParameter` to define the range we wanted to explore for the number of epochs. Which of the following classes can we also use to define the ranges of other types of hyperparameters?\n",
+    "\n",
+    "1. `FloatParameter`\n",
+    "2. `ContinuousParameter`\n",
+    "3. `CategoricalParameter`\n",
+    "4. `DateTimeParameter`\n",
+    "\n",
+    "\n",
+    "<div style=\"margin: 30px 0 10px 0;\"><span style=\"font-size: 1.1em; padding:4px; background-color: #b8bf9f; color: #000;\"><strong>Question 3.4</strong></span></div>\n",
+    "\n",
+    "Which of the following statements are true about the usage of `max_jobs` and `max_parallel_jobs` when running a Hyperparameter Tuning Job?\n",
+    "\n",
+    "1. `max_jobs` represents the maximum total number of Training Jobs that the Hyperparameter Tuning Job will start.\n",
+    "2. `max_parallel_jobs` represents the maximum number of Training Jobs that can run in parallel at any given time.\n",
+    "3. `max_parallel_jobs` can never be larger than `max_jobs`.\n",
+    "4. `max_jobs` can never be larger than `max_parallel_jobs`.\n",
+    "\n",
     "\n",
     "\n",
     "## Assignments\n",
     "\n",
-    "* <span style=\"padding:4px; background-color: #f2a68a; color: #000;\"><strong>Assignment 3.1</strong></span> We currently define the number of epochs to train the model as a constant that we pass to the Estimator using the list of hyperparameters. Replace this constant with a new Pipeline Parameter named `training_epochs`. You'll need to specify this new parameter when creating the Pipeline.\n",
-    "\n"
+    "\n",
+    "* <span style=\"padding:4px; background-color: #f2a68a; color: #000;\"><strong>Assignment 3.1</strong></span> The training script uses a hard-coded learning rate to train the model. Modify the code so the script accepts the learning rate as a parameter we can control from outside the script.\n",
+    "\n",
+    "* <span style=\"padding:4px; background-color: #f2a68a; color: #000;\"><strong>Assignment 3.2</strong></span> We currently define the number of epochs to train the model as a constant that we pass to the Estimator using the list of hyperparameters. Replace this constant with a new Pipeline Parameter named `training_epochs`. You'll need to specify this new parameter when creating the Pipeline.\n",
+    "\n",
+    "* <span style=\"padding:4px; background-color: #f2a68a; color: #000;\"><strong>Assignment 3.3</strong></span> Our pipeline uses \"file mode\" to give the Training Job access to the dataset. When using file mode, SageMaker downloads the training data from S3 to a local directory in the training container. For this assignment, modify the code to stream the data into the training container instead of copying it.\n",
+    "\n",
+    "* <span style=\"padding:4px; background-color: #f2a68a; color: #000;\"><strong>Assignment 3.4</strong></span> TBD.\n",
+    "\n",
+    "* <span style=\"padding:4px; background-color: #f2a68a; color: #000;\"><strong>Assignment 3.5</strong></span> Modify the pipeline you created for the \"Pipeline of Digits\" project and add a Training Step. This Training Step should receive the train and validation splits from the Preprocessing step.\n"
    ]
   },
   {
@@ -13627,7 +13671,6 @@
     "vcpuNum": 96
    }
   ],
-  "instance_type": "ml.t3.medium",
   "kernelspec": {
    "display_name": "Python 3 (TensorFlow 2.6 Python 3.8 CPU Optimized)",
    "language": "python",

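Assignment 3.1 moves the learning rate out of the training script. The following is a minimal sketch, assuming the script runs in SageMaker script mode. In that mode, the entries of the Estimator's `hyperparameters` dictionary reach the entry point as `--name value` command-line arguments. The script parses them with `argparse` and falls back to a default when a value is not supplied. The `--epochs` and `--learning_rate` names, their defaults, and the `train` placeholder are hypothetical and are not taken from the course's actual training script.

```python
import argparse


def parse_args(argv=None):
    """Parse hyperparameters passed to the training script.

    Assumption: hyperparameters arrive as `--name value` command-line
    arguments. The argument names and defaults here are hypothetical.
    """
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--epochs", type=int, default=50)
    # The learning rate is no longer hard-coded; this default only applies
    # when the caller does not pass --learning_rate.
    parser.add_argument("--learning_rate", type=float, default=0.01)
    # parse_known_args ignores any extra arguments instead of failing.
    args, _ = parser.parse_known_args(argv)
    return args


def train(epochs, learning_rate):
    # Placeholder for the real training code; an optimizer would be built
    # here using `learning_rate` instead of a constant.
    return {"epochs": epochs, "learning_rate": learning_rate}


if __name__ == "__main__":
    args = parse_args()
    print(train(args.epochs, args.learning_rate))
```

With this in place, passing something like `hyperparameters={"epochs": 50, "learning_rate": 0.001}` to the Estimator sets both values without editing the script.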
diff --git a/penguins/penguins-setup.ipynb b/penguins/penguins-setup.ipynb
@@ -1394,7 +1394,6 @@
     "vcpuNum": 96
    }
   ],
-  "instance_type": "ml.t3.medium",
   "kernelspec": {
    "display_name": "Python 3 (TensorFlow 2.6 Python 3.8 CPU Optimized)",
    "language": "python",
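Assignment 3.3 in the `penguins-cohort.ipynb` diff asks for the dataset to be streamed instead of copied. On the SageMaker side, you select the mode through the `input_mode` setting, for example `input_mode="Pipe"` on the Estimator or on a `TrainingInput`. The training code then has to consume records incrementally instead of reading files from a fully populated directory. The standard-library sketch below shows only that consumption pattern. It is a generator that reads CSV rows from any file-like object and yields small batches, so only one batch is held in memory at a time. `iter_batches` is hypothetical, and so is the headerless numeric CSV format. `io.StringIO` stands in for the stream the container would actually receive.

```python
import csv
import io
from itertools import islice


def iter_batches(stream, batch_size=32):
    """Yield batches of parsed CSV rows, reading `stream` incrementally.

    Assumes a headerless CSV where every value is numeric (hypothetical
    format). Only the current batch is kept in memory.
    """
    reader = csv.reader(stream)
    while True:
        # Pull at most `batch_size` rows from the reader, converting each value to float.
        batch = [[float(value) for value in row] for row in islice(reader, batch_size)]
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # io.StringIO stands in for the stream SageMaker would provide.
    data = io.StringIO("1,2\n3,4\n5,6\n")
    for batch in iter_batches(data, batch_size=2):
        print(batch)
```

The same generator works unchanged whether the rows come from a local file opened in file mode or from a stream, which keeps the training loop independent of the input mode.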