From 47da76ed1f975e31af42da06beeb1277358dc923 Mon Sep 17 00:00:00 2001 From: "Santiago L. Valdarrama" Date: Mon, 23 Oct 2023 14:05:14 -0400 Subject: [PATCH] ... --- README.md | 10 + program/cohort.ipynb | 3688 +++++++++++++----------------------------- program/index.qmd | 31 +- program/setup.qmd | 2 +- 4 files changed, 1179 insertions(+), 2552 deletions(-) diff --git a/README.md b/README.md index 7519637..a1978ca 100644 --- a/README.md +++ b/README.md @@ -4,6 +4,16 @@ This repository contains the source code of the [Machine Learning School](https: If you find any problems with the code or have any ideas on improving it, please open an issue and share your recommendations. +## Running the code + +Before running the project, follow the [Setup instructions](https://program.ml.school/setup.html). After that, you can test the code by running the following command: + +``` +$ nbdev_test --path program/cohort.ipynb +``` + +This will run the notebook and make sure everything runs. If you have any problems, it's likely there's a configuration issue in your setup. + ## Resources * [Serving a TensorFlow model from a Flask application](penguins/serving/flask/README.md): A simple Flask application that serves a multi-class classification TensorFlow model to determine the species of a penguin. diff --git a/program/cohort.ipynb b/program/cohort.ipynb index 7ee252e..4ae1dba 100644 --- a/program/cohort.ipynb +++ b/program/cohort.ipynb @@ -7,7 +7,7 @@ "tags": [] }, "source": [ - "# Building Production Machine Learning Systems" + "# Building Production Machine Learning Systems\n" ] }, { @@ -16,14 +16,14 @@ "metadata": {}, "source": [ "This notebook creates a [SageMaker Pipeline](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html) to build an end-to-end Machine Learning system to solve the problem of classifying penguin species. 
With a SageMaker Pipeline, you can create, automate, and manage end-to-end Machine Learning workflows at scale.\n", - " \n", + "\n", "You can find more information about Amazon SageMaker in the [Amazon SageMaker Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html). The [AWS Machine Learning Blog](https://aws.amazon.com/blogs/machine-learning/) is an excellent source to stay up-to-date with SageMaker.\n", "\n", - "This example uses the [Penguins dataset](https://www.kaggle.com/parulpandey/palmer-archipelago-antarctica-penguin-data), the [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html) library, and the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/). \n", + "This example uses the [Penguins dataset](https://www.kaggle.com/parulpandey/palmer-archipelago-antarctica-penguin-data), the [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html) library, and the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/).\n", "\n", "Penguins\n", "\n", - "This notebook is part of the [Machine Learning School](https://www.ml.school) program." + "This notebook is part of the [Machine Learning School](https://www.ml.school) program.\n" ] }, { @@ -37,16 +37,26 @@ "Before running this notebook, follow the [setup instructions](https://program.ml.school/setup.html) for the program.\n", ":::\n", "\n", - "\n", - "Let's start by setting up the environment and preparing to run the notebook." + "Let's start by setting up the environment and preparing to run the notebook.\n" ] }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 80, "id": "4b2265b0", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The autoreload extension is already loaded. To reload it, use:\n", + " %reload_ext autoreload\n", + "The dotenv extension is already loaded. 
To reload it, use:\n", + " %reload_ext dotenv\n" + ] + } + ], "source": [ "#| hide\n", "\n", @@ -85,12 +95,12 @@ "id": "588d34c9", "metadata": {}, "source": [ - "We can run this notebook is [Local Mode](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-local-mode.html) to test the pipeline in your local environment before using SageMaker. You can run the code in Local Mode by setting the `LOCAL_MODE` constant to `True`. " + "We can run this notebook in [Local Mode](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-local-mode.html) to test the pipeline in your local environment before using SageMaker. You can run the code in Local Mode by setting the `LOCAL_MODE` constant to `True`.\n" ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 81, "id": "32c4d764", "metadata": {}, "outputs": [], @@ -103,12 +113,12 @@ "id": "d6be4f8d", "metadata": {}, "source": [ - "Let's load the S3 bucket name and the AWS Role from the environment variables:" + "Let's load the S3 bucket name and the AWS Role from the environment variables:\n" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 82, "id": "3164a3af", "metadata": {}, "outputs": [], @@ -126,12 +136,12 @@ "id": "daa700f4", "metadata": {}, "source": [ - "If you are running the pipeline in Local Mode on an ARM64 machine, you will need to use a custom Docker image to train and evaluate the model. 
This is because SageMaker doesn't provide a TensorFlow image that supports Apple's M chips.\n" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 83, "id": "7bc40d28", "metadata": {}, "outputs": [], @@ -145,12 +155,12 @@ "id": "7d906ada", "metadata": {}, "source": [ - "Let's create a configuration dictionary with different settings depending on whether we are running the pipeline in Local Mode or not:" + "Let's create a configuration dictionary with different settings depending on whether we are running the pipeline in Local Mode or not:\n" ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 84, "id": "3b3f17e5", "metadata": {}, "outputs": [], @@ -166,9 +176,7 @@ " \"instance_type\": \"local\",\n", " # We need to use a custom Docker image when we run the pipeline\n", " # in Local Model on an ARM64 machine.\n", - " \"image\": \"sagemaker-tensorflow-toolkit-local\"\n", - " if IS_APPLE_M_CHIP\n", - " else None,\n", + " \"image\": \"sagemaker-tensorflow-toolkit-local\" if IS_APPLE_M_CHIP else None,\n", " \"framework_version\": None if IS_APPLE_M_CHIP else \"2.11\",\n", " \"py_version\": None if IS_APPLE_M_CHIP else \"py39\",\n", " }\n", @@ -187,12 +195,12 @@ "id": "9089696b", "metadata": {}, "source": [ - "Let's now initialize a few variables that we'll need throughout the notebook:" + "Let's now initialize a few variables that we'll need throughout the notebook:\n" ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 85, "id": "942a01b5", "metadata": {}, "outputs": [], @@ -212,7 +220,7 @@ "source": [ "## Session 1 - Production Machine Learning is Different\n", "\n", - "In this session we'll run Exploratory Data Analysis on the [Penguins dataset](https://www.kaggle.com/parulpandey/palmer-archipelago-antarctica-penguin-data) and we'll build a simple [SageMaker Pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-sdk.html) with one step to split and transform the data. 
We'll use a [Scikit-Learn Pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) for the transformations, and a [Processing Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing) with a [SKLearnProcessor](https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html#scikit-learn-processor) to execute a preprocessing script. Check the [SageMaker Pipelines Overview](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-sdk.html) for an introduction to the fundamental components of a SageMaker Pipeline." + "In this session we'll run Exploratory Data Analysis on the [Penguins dataset](https://www.kaggle.com/parulpandey/palmer-archipelago-antarctica-penguin-data) and we'll build a simple [SageMaker Pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-sdk.html) with one step to split and transform the data. We'll use a [Scikit-Learn Pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) for the transformations, and a [Processing Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing) with a [SKLearnProcessor](https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html#scikit-learn-processor) to execute a preprocessing script. Check the [SageMaker Pipelines Overview](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-sdk.html) for an introduction to the fundamental components of a SageMaker Pipeline.\n" ] }, { @@ -224,12 +232,12 @@ "\n", "Let's run Exploratory Data Analysis on the dataset. 
The goal of this section is to understand the data and the problem we are trying to solve.\n", "\n", - "Let's load the Penguins dataset:" + "Let's load the Penguins dataset:\n" ] }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 86, "id": "f1cd2f0e-446d-48a9-a008-b4f1cc593bfc", "metadata": { "tags": [] @@ -336,7 +344,7 @@ "4 3450.0 FEMALE " ] }, - "execution_count": 11, + "execution_count": 86, "metadata": {}, "output_type": "execute_result" } @@ -368,12 +376,12 @@ "\n", "Culmen\n", "\n", - "Now, let's get the summary statistics for the features in our dataset." + "Now, let's get the summary statistics for the features in our dataset.\n" ] }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 87, "id": "f2107c25-e730-4e22-a1b8-5bda53e61124", "metadata": { "tags": [] @@ -552,13 +560,13 @@ "max 6300.000000 NaN " ] }, - "execution_count": 12, + "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "penguins.describe(include='all')" + "penguins.describe(include=\"all\")" ] }, { @@ -566,12 +574,12 @@ "id": "b2e19af7-9f0f-45fe-b7d3-f19721c02a2b", "metadata": {}, "source": [ - "Let's now display the distribution of values for the three categorical columns in our data:" + "Let's now display the distribution of values for the three categorical columns in our data:\n" ] }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 88, "id": "1242122a-726e-4c37-a718-dd8e873d1612", "metadata": { "tags": [] @@ -602,9 +610,9 @@ } ], "source": [ - "species_distribution = penguins['species'].value_counts()\n", - "island_distribution = penguins['island'].value_counts()\n", - "sex_distribution = penguins['sex'].value_counts()\n", + "species_distribution = penguins[\"species\"].value_counts()\n", + "island_distribution = penguins[\"island\"].value_counts()\n", + "sex_distribution = penguins[\"sex\"].value_counts()\n", "\n", "print(species_distribution)\n", "print()\n", @@ -620,16 +628,16 @@ 
"source": [ "The distribution of the categories in our data are:\n", "\n", - "- `species`: There are 3 species of penguins in the dataset: Adelie (152), Gentoo (124), and Chinstrap (68).\n", - "- `island`: Penguins are from 3 islands: Biscoe (168), Dream (124), and Torgersen (52).\n", - "- `sex`: We have 168 male penguins, 165 female penguins, and 1 penguin with an ambiguous gender ('.').\n", + "- `species`: There are 3 species of penguins in the dataset: Adelie (152), Gentoo (124), and Chinstrap (68).\n", + "- `island`: Penguins are from 3 islands: Biscoe (168), Dream (124), and Torgersen (52).\n", + "- `sex`: We have 168 male penguins, 165 female penguins, and 1 penguin with an ambiguous gender ('.').\n", "\n", - "Let's replace the ambiguous value in the `sex` column with a null value:" + "Let's replace the ambiguous value in the `sex` column with a null value:\n" ] }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 89, "id": "cf1cf582-8831-4f83-bb17-2175afb193e8", "metadata": { "tags": [] @@ -644,7 +652,7 @@ "Name: count, dtype: int64" ] }, - "execution_count": 14, + "execution_count": 89, "metadata": {}, "output_type": "execute_result" } @@ -659,12 +667,12 @@ "id": "6e8425ce-ce4e-43e6-9ed8-0398b780cc66", "metadata": {}, "source": [ - "Next, let's check for any missing values in the dataset." + "Next, let's check for any missing values in the dataset.\n" ] }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 90, "id": "cc42cb08-275c-4b05-9d2b-77052da2f336", "metadata": { "tags": [] @@ -683,7 +691,7 @@ "dtype: int64" ] }, - "execution_count": 15, + "execution_count": 90, "metadata": {}, "output_type": "execute_result" } @@ -697,12 +705,12 @@ "id": "1b65207c-3e66-453a-87a1-751636c979ee", "metadata": {}, "source": [ - "Let's get rid of the missing values. For now, we are going to replace the missing values with the most frequent value in the column. Later, we'll use a different strategy to replace missing numeric values." 
+ "Let's get rid of the missing values. For now, we are going to replace the missing values with the most frequent value in the column. Later, we'll use a different strategy to replace missing numeric values.\n" ] }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 91, "id": "3c57d55d-afd6-467a-a7a8-ff04132770ed", "metadata": { "tags": [] @@ -721,7 +729,7 @@ "dtype: int64" ] }, - "execution_count": 16, + "execution_count": 91, "metadata": {}, "output_type": "execute_result" } @@ -730,7 +738,7 @@ "from sklearn.impute import SimpleImputer\n", "\n", "imputer = SimpleImputer(strategy=\"most_frequent\")\n", - "penguins.iloc[:,:] = imputer.fit_transform(penguins)\n", + "penguins.iloc[:, :] = imputer.fit_transform(penguins)\n", "penguins.isnull().sum()" ] }, @@ -739,12 +747,12 @@ "id": "5758214f-a4ab-4980-8892-91ec8d218ef3", "metadata": {}, "source": [ - "Let's visualize the distribution of categorical features." + "Let's visualize the distribution of categorical features.\n" ] }, { "cell_type": "code", - "execution_count": 59, + "execution_count": 92, "id": "2852c740", "metadata": {}, "outputs": [ @@ -785,12 +793,12 @@ "id": "b04c8fae-35b4-4d8e-8fff-decee050af3a", "metadata": {}, "source": [ - "Let's visualize the distribution of numerical columns." + "Let's visualize the distribution of numerical columns.\n" ] }, { "cell_type": "code", - "execution_count": 38, + "execution_count": 93, "id": "707cc972", "metadata": {}, "outputs": [ @@ -833,12 +841,12 @@ "id": "ef241df0-3acd-4401-a2c6-b70723d7595b", "metadata": {}, "source": [ - "Let's display the covariance matrix of the dataset. The \"covariance\" measures how changes in one variable are associated with changes in a second variable. In other words, the covariance measures the degree to which two variables are linearly associated." + "Let's display the covariance matrix of the dataset. The \"covariance\" measures how changes in one variable are associated with changes in a second variable. 
In other words, the covariance measures the degree to which two variables are linearly associated.\n" ] }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 94, "id": "3daf3ba1-d218-4ad4-b862-af679b91273f", "metadata": { "tags": [] @@ -918,7 +926,7 @@ "body_mass_g 640316.716388 " ] }, - "execution_count": 20, + "execution_count": 94, "metadata": {}, "output_type": "execute_result" } @@ -938,12 +946,12 @@ "2. The more a penguin weights, the shallower its culmen tends to be.\n", "3. There's a small variance between the culmen depth of penguins.\n", "\n", - "Let's now display the correlation matrix. \"Correlation\" measures both the strength and direction of the linear relationship between two variables." + "Let's now display the correlation matrix. \"Correlation\" measures both the strength and direction of the linear relationship between two variables.\n" ] }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 95, "id": "1d793e09-2cb9-47ff-a0e6-199a0f4fc1b3", "metadata": { "tags": [] @@ -1023,7 +1031,7 @@ "body_mass_g 1.000000 " ] }, - "execution_count": 21, + "execution_count": 95, "metadata": {}, "output_type": "execute_result" } @@ -1043,12 +1051,12 @@ "2. Penguins with a shallower culmen tend to have larger flippers.\n", "3. The length and depth of the culmen have a slight negative correlation.\n", "\n", - "Let's display the distribution of species by island." + "Let's display the distribution of species by island.\n" ] }, { "cell_type": "code", - "execution_count": 58, + "execution_count": 96, "id": "1258c99d", "metadata": {}, "outputs": [ @@ -1083,12 +1091,12 @@ "id": "d74ae740-3590-4dce-ac5a-6205975c83da", "metadata": {}, "source": [ - "Let's display the distribution of species by sex." 
+ "Let's display the distribution of species by sex.\n" ] }, { "cell_type": "code", - "execution_count": 54, + "execution_count": 97, "id": "45b0a87f-028d-477f-9b65-199728c0b7ee", "metadata": { "tags": [] @@ -1129,7 +1137,7 @@ "source": [ "### Step 2 - Creating the Preprocessing Script\n", "\n", - "The first step we need in the pipeline is a [Processing Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing) to run a script that will split and transform the data. This Processing Step will create a SageMaker Processing Job in the background, run the script, and upload the output to S3. You can use Processing Jobs to perform data preprocessing, post-processing, feature engineering, data validation, and model evaluation. Check the [ProcessingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.ProcessingStep) SageMaker's SDK documentation for more information." + "The first step we need in the pipeline is a [Processing Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing) to run a script that will split and transform the data. This Processing Step will create a SageMaker Processing Job in the background, run the script, and upload the output to S3. You can use Processing Jobs to perform data preprocessing, post-processing, feature engineering, data validation, and model evaluation. Check the [ProcessingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.ProcessingStep) SageMaker's SDK documentation for more information.\n" ] }, { @@ -1137,12 +1145,12 @@ "id": "7d656af1", "metadata": {}, "source": [ - "The first step is to create the script that will split and transform the input data." 
+ "The first step is to create the script that will split and transform the input data.\n" ] }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 98, "id": "fb6ba7c0-1bd6-4fe5-8b7f-f6cbdfd3846c", "metadata": { "tags": [] @@ -1158,7 +1166,6 @@ ], "source": [ "%%writefile {CODE_FOLDER}/preprocessor.py\n", - "\n", "#| label: preprocessing-script\n", "#| echo: true\n", "#| output: false\n", @@ -1324,12 +1331,12 @@ "id": "39301f9f", "metadata": {}, "source": [ - "Let's test the script to ensure everything is working as expected:" + "Let's test the script to ensure everything is working as expected:\n" ] }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 99, "id": "d1f122a4-acff-4687-91b9-bfef13567d88", "metadata": { "tags": [] @@ -1340,7 +1347,7 @@ "output_type": "stream", "text": [ "\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\n", - "\u001b[32m\u001b[32m\u001b[1m5 passed\u001b[0m\u001b[32m in 0.10s\u001b[0m\u001b[0m\n" + "\u001b[32m\u001b[32m\u001b[1m5 passed\u001b[0m\u001b[32m in 0.09s\u001b[0m\u001b[0m\n" ] } ], @@ -1433,7 +1440,7 @@ "source": [ "### Step 3 - Setting up the Processing Step\n", "\n", - "Let's now define the [ProcessingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.ProcessingStep) that we'll use in the pipeline to run the script that will split and transform the data." + "Let's now define the [ProcessingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.ProcessingStep) that we'll use in the pipeline to run the script that will split and transform the data.\n" ] }, { @@ -1441,22 +1448,19 @@ "id": "ff061663", "metadata": {}, "source": [ - "Several SageMaker Pipeline steps support caching. 
When a step runs, and dependending on the configured caching policy, SageMaker will try to reuse the result of a previous successful run of the same step. You can find more information about this topic in [Caching Pipeline Steps](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html). Let's define a caching policy that we'll reuse on every step:" + "Several SageMaker Pipeline steps support caching. When a step runs, and depending on the configured caching policy, SageMaker will try to reuse the result of a previous successful run of the same step. You can find more information about this topic in [Caching Pipeline Steps](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html). Let's define a caching policy that we'll reuse on every step:\n" ] }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 100, "id": "d88e9ccf", "metadata": {}, "outputs": [], "source": [ "from sagemaker.workflow.steps import CacheConfig\n", "\n", - "cache_config = CacheConfig(\n", - " enable_caching=True, \n", - " expire_after=\"15d\"\n", - ")" + "cache_config = CacheConfig(enable_caching=True, expire_after=\"15d\")" ] }, { @@ -1464,12 +1468,12 @@ "id": "f3b1d96a", "metadata": {}, "source": [ - "We can parameterize a SageMaker Pipeline to make it more flexible. In this case, we'll use a paramater to pass the location of the dataset we want to process. We can execute the pipeline with different datasets by changing the value of this parameter. To read more about these parameters, check [Pipeline Parameters](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-parameters.html). " + "We can parameterize a SageMaker Pipeline to make it more flexible. In this case, we'll use a parameter to pass the location of the dataset we want to process. We can execute the pipeline with different datasets by changing the value of this parameter. 
To read more about these parameters, check [Pipeline Parameters](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-parameters.html).\n" ] }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 101, "id": "331fe373", "metadata": {}, "outputs": [], @@ -1487,30 +1491,36 @@ "id": "cfb9a589", "metadata": {}, "source": [ - "A processor gives the Processing Step information about the hardware and software that SageMaker should use to launch the Processing Job. To run the script we created, we need access to Scikit-Learn, so we can use the [SKLearnProcessor](https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html#scikit-learn-processor) processor that comes out-of-the-box with the SageMaker's Python SDK. The [Data Processing with Framework Processors](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job-frameworks.html) page discusses other built-in processors you can use. The [Docker Registry Paths and Example Code](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html) page contains information about the available framework versions for each region." + "A processor gives the Processing Step information about the hardware and software that SageMaker should use to launch the Processing Job. To run the script we created, we need access to Scikit-Learn, so we can use the [SKLearnProcessor](https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html#scikit-learn-processor) processor that comes out-of-the-box with the SageMaker's Python SDK. The [Data Processing with Framework Processors](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job-frameworks.html) page discusses other built-in processors you can use. 
The [Docker Registry Paths and Example Code](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html) page contains information about the available framework versions for each region.\n" ] }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 102, "id": "3aa4471a", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:sagemaker.image_uris:Defaulting to only available Python version: py3\n" + ] + } + ], "source": [ "from sagemaker.sklearn.processing import SKLearnProcessor\n", "\n", "processor = SKLearnProcessor(\n", " base_job_name=\"split-and-transform-data\",\n", " framework_version=\"1.2-1\",\n", - " \n", " # By default, a new account doesn't have access to `ml.m5.xlarge` instances.\n", " # If you haven't requested a quota increase yet, you can use an\n", " # `ml.t3.medium` instance type instead. This will work out of the box, but\n", " # the Processing Job will take significantly longer than it should have.\n", - " # To get access to `ml.m5.xlarge` instances, you can request a quota \n", + " # To get access to `ml.m5.xlarge` instances, you can request a quota\n", " # increase under the Service Quotas section in your AWS account.\n", " instance_type=config[\"instance_type\"],\n", - "\n", " instance_count=1,\n", " role=role,\n", " sagemaker_session=config[\"session\"],\n", @@ -1522,12 +1532,12 @@ "id": "6cf2cc58", "metadata": {}, "source": [ - "Let's now define the Processing Step that we'll use in the pipeline. This step requires a list of inputs that we need on the preprocessing script. In this case, the input is the dataset we stored in S3. We also have a few outputs that we want SageMaker to capture when the Processing Job finishes." + "Let's now define the Processing Step that we'll use in the pipeline. This step requires a list of inputs that we need on the preprocessing script. In this case, the input is the dataset we stored in S3. 
We also have a few outputs that we want SageMaker to capture when the Processing Job finishes.\n" ] }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 103, "id": "cdbd9303", "metadata": { "tags": [] @@ -1537,10 +1547,8 @@ "name": "stderr", "output_type": "stream", "text": [ - "/Users/svpino/dev/ml.school/.venv/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py:297: UserWarning:\n", - "\n", - "Running within a PipelineSession, there will be No Wait, No Logs, and No Job being started.\n", - "\n" + "/Users/svpino/dev/ml.school/.venv/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py:297: UserWarning: Running within a PipelineSession, there will be No Wait, No Logs, and No Job being started.\n", + " warnings.warn(\n" ] } ], @@ -1602,12 +1610,12 @@ "source": [ "### Step 4 - Creating the Pipeline\n", "\n", - "We can now create the SageMaker Pipeline and submit its definition to the SageMaker Pipelines service to create the pipeline if it doesn't exist or update it if it does." 
+ "We can now create the SageMaker Pipeline and submit its definition to the SageMaker Pipelines service to create the pipeline if it doesn't exist or update it if it does.\n" ] }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 104, "id": "e140642a", "metadata": { "tags": [] @@ -1617,23 +1625,23 @@ "data": { "text/plain": [ "{'PipelineArn': 'arn:aws:sagemaker:us-east-1:325223348818:pipeline/session1-pipeline',\n", - " 'ResponseMetadata': {'RequestId': '58424cbf-32de-43dc-a521-714deaa45233',\n", + " 'ResponseMetadata': {'RequestId': '885fb534-099d-4af5-b3f4-d511a68373c2',\n", " 'HTTPStatusCode': 200,\n", - " 'HTTPHeaders': {'x-amzn-requestid': '58424cbf-32de-43dc-a521-714deaa45233',\n", + " 'HTTPHeaders': {'x-amzn-requestid': '885fb534-099d-4af5-b3f4-d511a68373c2',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'content-length': '85',\n", - " 'date': 'Sat, 21 Oct 2023 21:14:13 GMT'},\n", + " 'date': 'Mon, 23 Oct 2023 15:47:13 GMT'},\n", " 'RetryAttempts': 0}}" ] }, - "execution_count": 30, + "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "#| code: true\n", - "#| output: false\n", + "# | code: true\n", + "# | output: false\n", "\n", "from sagemaker.workflow.pipeline import Pipeline\n", "from sagemaker.workflow.pipeline_definition_config import PipelineDefinitionConfig\n", @@ -1642,9 +1650,7 @@ "\n", "session1_pipeline = Pipeline(\n", " name=\"session1-pipeline\",\n", - " parameters=[\n", - " dataset_location\n", - " ],\n", + " parameters=[dataset_location],\n", " steps=[\n", " split_and_transform_data_step,\n", " ],\n", @@ -1660,7 +1666,7 @@ "id": "ff8f99c1", "metadata": {}, "source": [ - "We can now start the pipeline:" + "We can now start the pipeline:\n" ] }, { @@ -1672,12 +1678,12 @@ "\n", "
Note: \n", " The %%script cell magic is a convenient way to prevent the notebook from executing a specific cell. If you want to run the cell, comment out the line containing the %%script cell magic.\n", - "
" + "\n" ] }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 105, "id": "59d1e634", "metadata": {}, "outputs": [], @@ -1698,15 +1704,15 @@ "source": [ "### Assignments\n", "\n", - "* Assignment 1.1 The SageMaker Pipeline we built supports running a few steps in Local Mode. The goal of this assignment is to run the pipeline on your local environment using Local Mode.\n", + "- Assignment 1.1 The SageMaker Pipeline we built supports running a few steps in Local Mode. The goal of this assignment is to run the pipeline on your local environment using Local Mode.\n", "\n", - "* Assignment 1.2 For this assignment, we want to run the end-to-end pipeline in SageMaker Studio. Ensure you turn off Local Mode before doing so.\n", + "- Assignment 1.2 For this assignment, we want to run the end-to-end pipeline in SageMaker Studio. Ensure you turn off Local Mode before doing so.\n", "\n", - "* Assignment 1.3 The pipeline uses Random Sampling to split the dataset. Modify the code to use Stratified Sampling instead.\n", + "- Assignment 1.3 The pipeline uses Random Sampling to split the dataset. Modify the code to use Stratified Sampling instead.\n", "\n", - "* Assignment 1.4 For this assignment, we want to run a distributed Processing Job across multiple instances to capitalize the `island` column of the dataset. Your dataset will consist of 10 different files stored in S3. Set up a Processing Job using two instances. When specifying the input to the Processing Job, you must set the `ProcessingInput.s3_data_distribution_type` attribute to `ShardedByS3Key`. By doing this, SageMaker will run a cluster with two instances simultaneously, each with access to half the files.\n", + "- Assignment 1.4 For this assignment, we want to run a distributed Processing Job across multiple instances to capitalize the `island` column of the dataset. Your dataset will consist of 10 different files stored in S3. Set up a Processing Job using two instances. 
When specifying the input to the Processing Job, you must set the `ProcessingInput.s3_data_distribution_type` attribute to `ShardedByS3Key`. By doing this, SageMaker will run a cluster with two instances simultaneously, each with access to half the files.\n", "\n", - "* Assignment 1.5 Pipeline steps can encounter exceptions. In some cases, retrying can resolve these issues. For this assignment, configure the Processing Step so it automatically retries the step a maximum of 5 times if it encounters an `InternalServerError`. Check the [Retry Policy for Pipeline Steps](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-retry-policy.html) documentation for more information." + "- Assignment 1.5 Pipeline steps can encounter exceptions. In some cases, retrying can resolve these issues. For this assignment, configure the Processing Step so it automatically retries the step a maximum of 5 times if it encounters an `InternalServerError`. Check the [Retry Policy for Pipeline Steps](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-retry-policy.html) documentation for more information.\n" ] }, { @@ -1716,9 +1722,9 @@ "source": [ "## Session 2 - Building Models And The Training Pipeline\n", "\n", - "This session extends the [SageMaker Pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-sdk.html) we built in the previous session with a step to train a model. We'll explore the [Training Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-training) and the [Tuning Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-tuning). \n", + "This session extends the [SageMaker Pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-sdk.html) we built in the previous session with a step to train a model. 
We'll explore the [Training Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-training) and the [Tuning Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-tuning).\n", "\n", - "We'll introduce [Amazon SageMaker Experiments](https://docs.aws.amazon.com/sagemaker/latest/dg/experiments.html) and use them during training. For more information about this topic, check the [SageMaker Experiments' SDK documentation](https://sagemaker.readthedocs.io/en/v2.174.0/experiments/sagemaker.experiments.html)." + "We'll introduce [Amazon SageMaker Experiments](https://docs.aws.amazon.com/sagemaker/latest/dg/experiments.html) and use them during training. For more information about this topic, check the [SageMaker Experiments' SDK documentation](https://sagemaker.readthedocs.io/en/v2.174.0/experiments/sagemaker.experiments.html).\n" ] }, { @@ -1728,34 +1734,33 @@ "source": [ "### Step 1 - Creating the Training Script\n", "\n", - "This following script is responsible for training a neural network using the train data, validating the model, and saving it so we can later use it:" + "The following script is responsible for training a neural network using the train data, validating the model, and saving it so we can later use it:\n" ] }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 106, "id": "d92b121d-dcb9-43e8-9ee3-3ececb583e7e", "metadata": { "tags": [] }, "outputs": [ { - "name": "stderr", + "name": "stdout", "output_type": "stream", "text": [ - "UsageError: Line magic function `%%writefile` not found.\n" + "Overwriting code/train.py\n" ] } ], "source": [ + "%%writefile {CODE_FOLDER}/train.py\n", "#| label: training-script\n", "#| echo: true\n", "#| output: false\n", "#| filename: train.py\n", "#| code-line-numbers: true\n", "\n", - "%%writefile {CODE_FOLDER}/train.py\n", - "\n", "import os\n", "import argparse\n", "\n", @@ -1838,12 +1843,12 @@ "id": 
"50f0a4fa-ce70-4882-b9f5-8253df03d890", "metadata": {}, "source": [ - "Let's test the script to ensure everything is working as expected:" + "Let's test the script to ensure everything is working as expected:\n" ] }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 107, "id": "14ea27ce-c453-4cb0-b309-dbecd732957e", "metadata": { "tags": [] @@ -1860,24 +1865,37 @@ "name": "stdout", "output_type": "stream", "text": [ - "8/8 - 0s - loss: 0.9632 - accuracy: 0.4708 - val_loss: 0.8554 - val_accuracy: 0.5962 - 253ms/epoch - 32ms/step\n", - "2/2 [==============================] - 0s 2ms/step\n", - "Validation accuracy: 0.5961538461538461\n" + "8/8 - 0s - loss: 1.2884 - accuracy: 0.5230 - val_loss: 1.1704 - val_accuracy: 0.5490 - 218ms/epoch - 27ms/step\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:6 out of the last 11 calls to .predict_function at 0x31af753a0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. 
For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2/2 [==============================] - 0s 1ms/step\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ - "INFO:tensorflow:Assets written to: /var/folders/4c/v1q3hy1x4mb5w0wpc72zl3_w0000gp/T/tmpzksbdnp9/model/001/assets\n" + "INFO:tensorflow:Assets written to: /var/folders/4c/v1q3hy1x4mb5w0wpc72zl3_w0000gp/T/tmp5ckp425a/model/001/assets\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ + "Validation accuracy: 0.5490196078431373\n", "\u001b[32m.\u001b[0m\n", - "\u001b[32m\u001b[32m\u001b[1m1 passed\u001b[0m\u001b[32m in 0.80s\u001b[0m\u001b[0m\n" + "\u001b[32m\u001b[32m\u001b[1m1 passed\u001b[0m\u001b[32m in 0.49s\u001b[0m\u001b[0m\n" ] } ], @@ -1935,18 +1953,18 @@ "source": [ "### Step 2 - Setting up the Training Step\n", "\n", - "We can now create a [Training Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-training) that we can add to the pipeline. This Training Step will create a SageMaker Training Job in the background, run the training script, and upload the output to S3. Check the [TrainingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TrainingStep) SageMaker's SDK documentation for more information. \n", + "We can now create a [Training Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-training) that we can add to the pipeline. This Training Step will create a SageMaker Training Job in the background, run the training script, and upload the output to S3. 
Check the [TrainingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TrainingStep) SageMaker's SDK documentation for more information.\n", "\n", "SageMaker uses the concept of an [Estimator](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) to handle end-to-end training and deployment tasks. For this example, we will use the built-in [TensorFlow Estimator](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/sagemaker.tensorflow.html#tensorflow-estimator) to run the training script we wrote before. The [Docker Registry Paths and Example Code](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html) page contains information about the available framework versions for each region. Here, you can also check the available SageMaker [Deep Learning Container images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md).\n", "\n", "Notice the list of hyperparameters defined below. SageMaker will pass these hyperparameters as arguments to the entry point of the training script.\n", "\n", - "We are going to use [SageMaker Experiments](https://sagemaker.readthedocs.io/en/v2.174.0/experiments/sagemaker.experiments.html) to log information from the Training Job. For more information, check [Manage Machine Learning with Amazon SageMaker Experiments](https://docs.aws.amazon.com/sagemaker/latest/dg/experiments.html). The list of metric definitions will tell SageMaker which metrics to track and how to parse them from the Training Job logs." + "We are going to use [SageMaker Experiments](https://sagemaker.readthedocs.io/en/v2.174.0/experiments/sagemaker.experiments.html) to log information from the Training Job. For more information, check [Manage Machine Learning with Amazon SageMaker Experiments](https://docs.aws.amazon.com/sagemaker/latest/dg/experiments.html). 
The list of metric definitions will tell SageMaker which metrics to track and how to parse them from the Training Job logs.\n" ] }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 108, "id": "90fe82ae-6a2c-4461-bc83-bb52d8871e3b", "metadata": { "tags": [] @@ -1990,19 +2008,19 @@ "id": "545d2b43-3bb5-4fe9-b3e4-cb8eb55c8a21", "metadata": {}, "source": [ - "We can now create a [Training Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-training). This Training Step will create a SageMaker Training Job in the background, run the training script, and upload the output to S3. Check the [TrainingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TrainingStep) SageMaker's SDK documentation for more information. \n", + "We can now create a [Training Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-training). This Training Step will create a SageMaker Training Job in the background, run the training script, and upload the output to S3. Check the [TrainingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TrainingStep) SageMaker's SDK documentation for more information.\n", "\n", "This step will receive the train and validation split from the previous step as inputs.\n", "\n", "Here, we are using two input channels, `train` and `validation`. 
SageMaker will automatically create an environment variable corresponding to each of these channels following the format `SM_CHANNEL_[channel_name]`:\n", "\n", - "* `SM_CHANNEL_TRAIN`: This environment variable will contain the path to the data in the `train` channel\n", - "* `SM_CHANNEL_VALIDATION`: This environment variable will contain the path to the data in the `validation` channel" + "- `SM_CHANNEL_TRAIN`: This environment variable will contain the path to the data in the `train` channel\n", + "- `SM_CHANNEL_VALIDATION`: This environment variable will contain the path to the data in the `validation` channel\n" ] }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 109, "id": "99e4850c-83d6-4f4e-a813-d5a3f4bb7486", "metadata": { "tags": [] @@ -2012,16 +2030,14 @@ "name": "stderr", "output_type": "stream", "text": [ - "/Users/svpino/dev/ml.school/.venv/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py:297: UserWarning:\n", - "\n", - "Running within a PipelineSession, there will be No Wait, No Logs, and No Job being started.\n", - "\n" + "/Users/svpino/dev/ml.school/.venv/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py:297: UserWarning: Running within a PipelineSession, there will be No Wait, No Logs, and No Job being started.\n", + " warnings.warn(\n" ] } ], "source": [ - "#| code: true\n", - "#| output: false\n", + "# | code: true\n", + "# | output: false\n", "\n", "from sagemaker.workflow.steps import TrainingStep\n", "from sagemaker.inputs import TrainingInput\n", @@ -2035,17 +2051,17 @@ " s3_data=split_and_transform_data_step.properties.ProcessingOutputConfig.Outputs[\n", " \"train\"\n", " ].S3Output.S3Uri,\n", - " content_type=\"text/csv\"\n", + " content_type=\"text/csv\",\n", " ),\n", " \"validation\": TrainingInput(\n", " s3_data=split_and_transform_data_step.properties.ProcessingOutputConfig.Outputs[\n", " \"validation\"\n", " ].S3Output.S3Uri,\n", - " content_type=\"text/csv\"\n", - " )\n", + " 
content_type=\"text/csv\",\n", + " ),\n", " }\n", " ),\n", - " cache_config=cache_config\n", + " cache_config=cache_config,\n", ")" ] }, @@ -2056,7 +2072,7 @@ "source": [ "### Step 3 - Setting up a Tuning Step\n", "\n", - "Let's now create a [Tuning Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-tuning). This Tuning Step will create a SageMaker Hyperparameter Tuning Job in the background and use the training script to train different model variants and choose the best one. Check the [TuningStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TuningStep) SageMaker's SDK documentation for more information." + "Let's now create a [Tuning Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-tuning). This Tuning Step will create a SageMaker Hyperparameter Tuning Job in the background and use the training script to train different model variants and choose the best one. Check the [TuningStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TuningStep) SageMaker's SDK documentation for more information.\n" ] }, { @@ -2064,12 +2080,12 @@ "id": "90eb5075", "metadata": {}, "source": [ - "Since we could use the Training of the Tuning Step to create the model, we'll define this constant to indicate which approach we want to run." 
+ "Since we could use the Training or the Tuning Step to create the model, we'll define this constant to indicate which approach we want to run.\n" ] }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 110, "id": "f367d0e3", "metadata": {}, "outputs": [], @@ -2094,13 +2110,13 @@ "\n", "Finally, you can control the number of jobs and how many of them will run in parallel using the following two arguments:\n", "\n", - "* `max_jobs`: Defines the maximum total number of training jobs to start for the hyperparameter tuning job.\n", - "* `max_parallel_jobs`: Defines the maximum number of parallel training jobs to start." + "- `max_jobs`: Defines the maximum total number of training jobs to start for the hyperparameter tuning job.\n", + "- `max_parallel_jobs`: Defines the maximum number of parallel training jobs to start.\n" ] }, { "cell_type": "code", - "execution_count": 35, + "execution_count": 111, "id": "c8c82750", "metadata": {}, "outputs": [], @@ -2110,14 +2126,12 @@ "\n", "tuner = HyperparameterTuner(\n", " estimator,\n", - " objective_metric_name = \"val_accuracy\",\n", + " objective_metric_name=\"val_accuracy\",\n", " objective_type=\"Maximize\",\n", - " hyperparameter_ranges = {\n", + " hyperparameter_ranges={\n", " \"epochs\": IntegerParameter(10, 50),\n", " },\n", - " metric_definitions = [\n", - " {\"Name\": \"val_accuracy\", \"Regex\": \"val_accuracy: ([0-9\\\\.]+)\"}\n", - " ],\n", + " metric_definitions=[{\"Name\": \"val_accuracy\", \"Regex\": \"val_accuracy: ([0-9\\\\.]+)\"}],\n", " max_jobs=3,\n", " max_parallel_jobs=3,\n", ")" @@ -2128,12 +2142,12 @@ "id": "28c2abc2", "metadata": {}, "source": [ - "We can now create the Tuning Step using the tuner we configured before:" + "We can now create the Tuning Step using the tuner we configured before:\n" ] }, { "cell_type": "code", - "execution_count": 36, + "execution_count": 112, "id": "038ff2e5-ed28-445b-bc03-4e996ec2286f", "metadata": { "tags": [] @@ -2143,24 +2157,24 @@ "from 
sagemaker.workflow.steps import TuningStep\n", "\n", "tune_model_step = TuningStep(\n", - " name = \"tune-model\",\n", + " name=\"tune-model\",\n", " step_args=tuner.fit(\n", " inputs={\n", " \"train\": TrainingInput(\n", " s3_data=split_and_transform_data_step.properties.ProcessingOutputConfig.Outputs[\n", " \"train\"\n", " ].S3Output.S3Uri,\n", - " content_type=\"text/csv\"\n", + " content_type=\"text/csv\",\n", " ),\n", " \"validation\": TrainingInput(\n", " s3_data=split_and_transform_data_step.properties.ProcessingOutputConfig.Outputs[\n", " \"validation\"\n", " ].S3Output.S3Uri,\n", - " content_type=\"text/csv\"\n", - " )\n", + " content_type=\"text/csv\",\n", + " ),\n", " },\n", " ),\n", - " cache_config=cache_config\n", + " cache_config=cache_config,\n", ")" ] }, @@ -2171,12 +2185,12 @@ "source": [ "### Step 4 - Creating the Pipeline\n", "\n", - "Let's define the SageMaker Pipeline and submit its definition to the SageMaker Pipelines service to create the pipeline if it doesn't exist or update it if it does." 
+ "Let's define the SageMaker Pipeline and submit its definition to the SageMaker Pipelines service to create the pipeline if it doesn't exist or update it if it does.\n" ] }, { "cell_type": "code", - "execution_count": 37, + "execution_count": 113, "id": "9799ab39-fcae-41f4-a68b-85ab71b3ba9a", "metadata": { "tags": [] @@ -2214,29 +2228,27 @@ "data": { "text/plain": [ "{'PipelineArn': 'arn:aws:sagemaker:us-east-1:325223348818:pipeline/session2-pipeline',\n", - " 'ResponseMetadata': {'RequestId': 'd641c63b-507f-4954-907e-80485a3cfbe6',\n", + " 'ResponseMetadata': {'RequestId': '217c11ad-0e92-4e32-a1b3-e1c8cc6e8f82',\n", " 'HTTPStatusCode': 200,\n", - " 'HTTPHeaders': {'x-amzn-requestid': 'd641c63b-507f-4954-907e-80485a3cfbe6',\n", + " 'HTTPHeaders': {'x-amzn-requestid': '217c11ad-0e92-4e32-a1b3-e1c8cc6e8f82',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'content-length': '85',\n", - " 'date': 'Sat, 21 Oct 2023 16:01:50 GMT'},\n", + " 'date': 'Mon, 23 Oct 2023 15:47:15 GMT'},\n", " 'RetryAttempts': 0}}" ] }, - "execution_count": 37, + "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "#| code: true\n", - "#| output: false\n", + "# | code: true\n", + "# | output: false\n", "\n", "session2_pipeline = Pipeline(\n", " name=\"session2-pipeline\",\n", - " parameters=[\n", - " dataset_location\n", - " ],\n", + " parameters=[dataset_location],\n", " steps=[\n", " split_and_transform_data_step,\n", " tune_model_step if USE_TUNING_STEP else train_model_step,\n", @@ -2253,7 +2265,7 @@ "id": "50810a3e", "metadata": {}, "source": [ - "We can now start the pipeline:" + "We can now start the pipeline:\n" ] }, { @@ -2265,12 +2277,12 @@ "\n", "
Note: \n", " The %%script cell magic is a convenient way to prevent the notebook from executing a specific cell. If you want to run the cell, comment out the line containing the %%script cell magic.\n", - "
" + "\n" ] }, { "cell_type": "code", - "execution_count": 38, + "execution_count": 114, "id": "274a9b1e", "metadata": {}, "outputs": [], @@ -2291,15 +2303,15 @@ "source": [ "### Assignments\n", "\n", - "* Assignment 2.1 The training script trains the model using a hard-coded learning rate value. Modify the code to accept the learning rate as a parameter we can control from outside the script.\n", + "- Assignment 2.1 The training script trains the model using a hard-coded learning rate value. Modify the code to accept the learning rate as a parameter we can control from outside the script.\n", "\n", - "* Assignment 2.2 We currently define the number of epochs to train the model as a constant that we pass to the Estimator using the list of hyperparameters. Replace this constant with a new Pipeline Parameter named `training_epochs`. You'll need to specify this new parameter when creating the Pipeline.\n", + "- Assignment 2.2 We currently define the number of epochs to train the model as a constant that we pass to the Estimator using the list of hyperparameters. Replace this constant with a new Pipeline Parameter named `training_epochs`. You'll need to specify this new parameter when creating the Pipeline.\n", "\n", - "* Assignment 2.3 The current tuning process aims to find the model with the highest validation accuracy. Modify the code to focus on the model with the lowest training loss.\n", + "- Assignment 2.3 The current tuning process aims to find the model with the highest validation accuracy. Modify the code to focus on the model with the lowest training loss.\n", "\n", - "* Assignment 2.4 We used an instance of [`SKLearnProcessor`](https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html#scikit-learn-processor) to run the script that transforms and splits the data, but there's no way to add additional dependencies to the processing container. 
Modify the code to use an instance of [`FrameworkProcessor`](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.FrameworkProcessor) instead. This class will allow you to specify a directory containing a `requirements.txt` file containing a list of dependencies. SageMaker will install these libraries in the processing container before triggering the processing job.\n", + "- Assignment 2.4 We used an instance of [`SKLearnProcessor`](https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html#scikit-learn-processor) to run the script that transforms and splits the data, but there's no way to add additional dependencies to the processing container. Modify the code to use an instance of [`FrameworkProcessor`](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.FrameworkProcessor) instead. This class will allow you to specify a directory containing a `requirements.txt` file containing a list of dependencies. SageMaker will install these libraries in the processing container before triggering the processing job.\n", "\n", - "* Assignment 2.5 We want to execute the pipeline whenever the dataset changes. We can accomplish this by using [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-what-is.html). Configure an event to automatically start the pipeline when a new file is added to the S3 bucket where we store our dataset. Check [Amazon EventBridge Integration](https://docs.aws.amazon.com/sagemaker/latest/dg/pipeline-eventbridge.html) for an implementation tutorial." + "- Assignment 2.5 We want to execute the pipeline whenever the dataset changes. We can accomplish this by using [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-what-is.html). Configure an event to automatically start the pipeline when a new file is added to the S3 bucket where we store our dataset. 
Check [Amazon EventBridge Integration](https://docs.aws.amazon.com/sagemaker/latest/dg/pipeline-eventbridge.html) for an implementation tutorial.\n" ] }, { @@ -2309,9 +2321,9 @@ "source": [ "## Session 3 - Evaluating and Versioning Models\n", "\n", - "This session extends the [SageMaker Pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-sdk.html) with a step to evaluate the model and register it if it reaches a predefined accuracy threshold. \n", + "This session extends the [SageMaker Pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-sdk.html) with a step to evaluate the model and register it if it reaches a predefined accuracy threshold.\n", "\n", - "We'll use a [Processing Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing) to execute an evaluation script. We'll use a [Condition Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-condition) to determine whether the model's accuracy is above a threshold, and a [Model Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-model) to register the model in the [SageMaker Model Registry](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry.html)." + "We'll use a [Processing Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing) to execute an evaluation script. 
We'll use a [Condition Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-condition) to determine whether the model's accuracy is above a threshold, and a [Model Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-model) to register the model in the [SageMaker Model Registry](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry.html).\n" ] }, { @@ -2323,12 +2335,12 @@ "source": [ "### Step 1 - Creating the Evaluation Script\n", "\n", - "Let's create the evaluation script. The Processing Step will spin up a Processing Job and run this script inside a container. This script is responsible for loading the model we created and evaluating it on the test set. Before finishing, this script will generate an evaluation report of the model." + "Let's create the evaluation script. The Processing Step will spin up a Processing Job and run this script inside a container. This script is responsible for loading the model we created and evaluating it on the test set. 
Before finishing, this script will generate an evaluation report of the model.\n" ] }, { "cell_type": "code", - "execution_count": 39, + "execution_count": 115, "id": "3ee3ab26-afa5-4ceb-9f7a-005d5fdea646", "metadata": { "tags": [] @@ -2344,7 +2356,6 @@ ], "source": [ "%%writefile {CODE_FOLDER}/evaluation.py\n", - "\n", "#| label: evaluation-script\n", "#| echo: true\n", "#| output: false\n", @@ -2410,12 +2421,12 @@ "id": "9dcc79a0-adfd-4ce9-8580-5cd228c3c2d9", "metadata": {}, "source": [ - "Let's test the script to ensure everything is working as expected:" + "Let's test the script to ensure everything is working as expected:\n" ] }, { "cell_type": "code", - "execution_count": 40, + "execution_count": 116, "id": "9a2540d8-278a-4953-bc54-0469d154427d", "metadata": { "tags": [] @@ -2425,23 +2436,24 @@ "name": "stderr", "output_type": "stream", "text": [ - "WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.SGD` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.SGD`.\n" + "WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.SGD` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.SGD`.\n", + "WARNING:tensorflow:5 out of the last 9 calls to .test_function at 0x316c00790> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. 
For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ - "8/8 - 0s - loss: 1.2052 - accuracy: 0.1333 - val_loss: 1.2150 - val_accuracy: 0.0577 - 236ms/epoch - 30ms/step\n", - "2/2 [==============================] - 0s 1ms/step\n", - "Validation accuracy: 0.057692307692307696\n" + "8/8 - 0s - loss: 1.0665 - accuracy: 0.4979 - val_loss: 1.0408 - val_accuracy: 0.5490 - 201ms/epoch - 25ms/step\n", + "2/2 [==============================] - 0s 2ms/step\n", + "Validation accuracy: 0.5490196078431373\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ - "INFO:tensorflow:Assets written to: /var/folders/4c/v1q3hy1x4mb5w0wpc72zl3_w0000gp/T/tmp78cm9r5t/model/001/assets\n", + "INFO:tensorflow:Assets written to: /var/folders/4c/v1q3hy1x4mb5w0wpc72zl3_w0000gp/T/tmpwon_erd4/model/001/assets\n", "WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.RestoredOptimizer` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.RestoredOptimizer`.\n", "WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.SGD` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.SGD`.\n" ] @@ -2450,8 +2462,8 @@ "name": "stdout", "output_type": "stream", "text": [ - "2/2 [==============================] - 0s 1ms/step\n", - "Test accuracy: 0.09803921568627451\n", + "2/2 [==============================] - 0s 2ms/step\n", + "Test accuracy: 0.5098039215686274\n", "\u001b[32m.\u001b[0m" ] }, @@ -2459,26 +2471,26 @@ "name": "stderr", "output_type": "stream", "text": [ - "WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.SGD` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.SGD`.\n" + "WARNING:absl:At 
this time, the v2.11+ optimizer `tf.keras.optimizers.SGD` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.SGD`.\n", + "WARNING:tensorflow:6 out of the last 11 calls to .test_function at 0x317daa1f0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ - "8/8 - 0s - loss: 1.1592 - accuracy: 0.2875 - val_loss: 1.1438 - val_accuracy: 0.2500 - 234ms/epoch - 29ms/step\n", - "2/2 [==============================] - 0s 2ms/step\n", - "Validation accuracy: 0.25\n" + "8/8 - 0s - loss: 1.0439 - accuracy: 0.3556 - val_loss: 1.0574 - val_accuracy: 0.3137 - 224ms/epoch - 28ms/step\n", + "2/2 [==============================] - 0s 1ms/step\n", + "Validation accuracy: 0.3137254901960784\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ - "INFO:tensorflow:Assets written to: /var/folders/4c/v1q3hy1x4mb5w0wpc72zl3_w0000gp/T/tmpkbpnosb3/model/001/assets\n", + "INFO:tensorflow:Assets written to: /var/folders/4c/v1q3hy1x4mb5w0wpc72zl3_w0000gp/T/tmpv1zttgh2/model/001/assets\n", "WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.RestoredOptimizer` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.RestoredOptimizer`.\n", - "WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.SGD` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, 
located at `tf.keras.optimizers.legacy.SGD`.\n", - "WARNING:tensorflow:5 out of the last 9 calls to .predict_function at 0x2e01b2e50> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n" + "WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.SGD` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.SGD`.\n" ] }, { @@ -2486,9 +2498,9 @@ "output_type": "stream", "text": [ "2/2 [==============================] - 0s 1ms/step\n", - "Test accuracy: 0.29411764705882354\n", + "Test accuracy: 0.3137254901960784\n", "\u001b[32m.\u001b[0m\n", - "\u001b[32m\u001b[32m\u001b[1m2 passed\u001b[0m\u001b[32m in 1.32s\u001b[0m\u001b[0m\n" + "\u001b[32m\u001b[32m\u001b[1m2 passed\u001b[0m\u001b[32m in 1.25s\u001b[0m\u001b[0m\n" ] } ], @@ -2567,12 +2579,12 @@ "source": [ "### Step 2 - Setting up the Evaluation Step\n", "\n", - "To run the evaluation script, we will use a [Processing Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing) configured with [TensorFlowProcessor](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job-frameworks-tensorflow.html) because the script needs access to TensorFlow. 
" + "To run the evaluation script, we will use a [Processing Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing) configured with [TensorFlowProcessor](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job-frameworks-tensorflow.html) because the script needs access to TensorFlow.\n" ] }, { "cell_type": "code", - "execution_count": 41, + "execution_count": 117, "id": "2fdff07f", "metadata": {}, "outputs": [ @@ -2585,8 +2597,8 @@ } ], "source": [ - "#| code: true\n", - "#| output: false\n", + "# | code: true\n", + "# | output: false\n", "\n", "from sagemaker.tensorflow import TensorFlowProcessor\n", "\n", @@ -2607,12 +2619,12 @@ "id": "419e354a", "metadata": {}, "source": [ - "One of the inputs to the Evaluation Step will be the model assets. We can use the `USE_TUNING_STEP` flag to determine whether we created the model using a Training Step or a Tuning Step. In case we are using the Tuning Step, we can use the [TuningStep.get_top_model_s3_uri()](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TuningStep.get_top_model_s3_uri) function to get the model assets from the top performing training job of the Hyperparameter Tuning Job." + "One of the inputs to the Evaluation Step will be the model assets. We can use the `USE_TUNING_STEP` flag to determine whether we created the model using a Training Step or a Tuning Step. 
In case we are using the Tuning Step, we can use the [TuningStep.get_top_model_s3_uri()](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TuningStep.get_top_model_s3_uri) function to get the model assets from the top performing training job of the Hyperparameter Tuning Job.\n" ] }, { "cell_type": "code", - "execution_count": 42, + "execution_count": 118, "id": "4f19e15b", "metadata": {}, "outputs": [], @@ -2628,12 +2640,12 @@ "id": "08dae772", "metadata": {}, "source": [ - "SageMaker supports mapping outputs to property files. This is useful when accessing a specific property from the pipeline. In our case, we want to access the accuracy of the model in the Condition Step, so we'll map the evaluation report to a property file. Check [How to Build and Manage Property Files](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-propertyfile.html) for more information." + "SageMaker supports mapping outputs to property files. This is useful when accessing a specific property from the pipeline. In our case, we want to access the accuracy of the model in the Condition Step, so we'll map the evaluation report to a property file. 
Check [How to Build and Manage Property Files](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-propertyfile.html) for more information.\n" ] }, { "cell_type": "code", - "execution_count": 43, + "execution_count": 119, "id": "1f27b2ef", "metadata": {}, "outputs": [], @@ -2650,12 +2662,12 @@ "id": "4a4dbc0e", "metadata": {}, "source": [ - "We are now ready to define the [ProcessingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.ProcessingStep) that will run the evaluation script:" + "We are now ready to define the [ProcessingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.ProcessingStep) that will run the evaluation script:\n" ] }, { "cell_type": "code", - "execution_count": 44, + "execution_count": 120, "id": "48139a07-5c8e-4bc6-b666-bf9531f7f520", "metadata": { "tags": [] @@ -2665,10 +2677,8 @@ "name": "stderr", "output_type": "stream", "text": [ - "/Users/svpino/dev/ml.school/.venv/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py:297: UserWarning:\n", - "\n", - "Running within a PipelineSession, there will be No Wait, No Logs, and No Job being started.\n", - "\n" + "/Users/svpino/dev/ml.school/.venv/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py:297: UserWarning: Running within a PipelineSession, there will be No Wait, No Logs, and No Job being started.\n", + " warnings.warn(\n" ] } ], @@ -2719,12 +2729,12 @@ "\n", "Let's now create a new version of the model and register it in the Model Registry. 
Check [Register a Model Version](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-version.html) for more information about model registration.\n", "\n", - "First, let's define the name of the group where we'll register the model:" + "First, let's define the name of the group where we'll register the model:\n" ] }, { "cell_type": "code", - "execution_count": 45, + "execution_count": 121, "id": "bb70f907", "metadata": {}, "outputs": [], @@ -2737,12 +2747,12 @@ "id": "40bcad3b", "metadata": {}, "source": [ - "Let's now create the model that we'll register in the Model Registry. The model we trained uses TensorFlow, so we can use the built-in [TensorFlowModel](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/sagemaker.tensorflow.html#tensorflow-serving-model) class to create an instance of the model:" + "Let's now create the model that we'll register in the Model Registry. The model we trained uses TensorFlow, so we can use the built-in [TensorFlowModel](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/sagemaker.tensorflow.html#tensorflow-serving-model) class to create an instance of the model:\n" ] }, { "cell_type": "code", - "execution_count": 46, + "execution_count": 122, "id": "4ca4cb61", "metadata": {}, "outputs": [], @@ -2763,12 +2773,12 @@ "id": "99d6fd00", "metadata": {}, "source": [ - "When we register a model in the Model Registry, we can attach relevant metadata to it. We'll use the evaluation report we generated during the Evaluation Step to populate the [metrics](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_metrics.ModelMetrics) of this model:" + "When we register a model in the Model Registry, we can attach relevant metadata to it. 
We'll use the evaluation report we generated during the Evaluation Step to populate the [metrics](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_metrics.ModelMetrics) of this model:\n" ] }, { "cell_type": "code", - "execution_count": 47, + "execution_count": 123, "id": "8c05a7e1", "metadata": {}, "outputs": [], @@ -2797,13 +2807,12 @@ "id": "6a51e61d", "metadata": {}, "source": [ - "\n", - "We can use a [Model Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-model) to register the model. Check the [ModelStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.model_step.ModelStep) SageMaker's SDK documentation for more information." + "We can use a [Model Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-model) to register the model. Check the [ModelStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.model_step.ModelStep) SageMaker's SDK documentation for more information.\n" ] }, { "cell_type": "code", - "execution_count": 48, + "execution_count": 124, "id": "c9773a4a", "metadata": { "tags": [] @@ -2831,7 +2840,6 @@ " model_metrics=model_metrics,\n", " content_types=[\"text/csv\"],\n", " response_types=[\"text/csv\"],\n", - " \n", " # This is the suggested inference instance types when\n", " # deploying the model or using it as part of a batch\n", " # transform job.\n", @@ -2852,7 +2860,7 @@ "source": [ "### Step 4 - Setting up a Condition Step\n", "\n", - "We only want to register a new model if its accuracy exceeds a predefined threshold. We can use a [Condition Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-condition) together with the evaluation report we generated to accomplish this. 
" + "We only want to register a new model if its accuracy exceeds a predefined threshold. We can use a [Condition Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-condition) together with the evaluation report we generated to accomplish this.\n" ] }, { @@ -2860,12 +2868,12 @@ "id": "b5a51f95", "metadata": {}, "source": [ - "Let's define a new [Pipeline Parameter](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-parameters.html) to specify the minimum accuracy that the model should reach for it to be registered." + "Let's define a new [Pipeline Parameter](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-parameters.html) to specify the minimum accuracy that the model should reach for it to be registered.\n" ] }, { "cell_type": "code", - "execution_count": 49, + "execution_count": 125, "id": "745486b5", "metadata": {}, "outputs": [], @@ -2880,12 +2888,12 @@ "id": "2c959c94", "metadata": {}, "source": [ - "If the model's accuracy is not greater than or equal our threshold, we will send the pipeline to a [Fail Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-fail) with the appropriate error message. Check the [FailStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.fail_step.FailStep) SageMaker's SDK documentation for more information." + "If the model's accuracy is not greater than or equal our threshold, we will send the pipeline to a [Fail Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-fail) with the appropriate error message. 
Check the [FailStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.fail_step.FailStep) SageMaker's SDK documentation for more information.\n" ] }, { "cell_type": "code", - "execution_count": 50, + "execution_count": 126, "id": "c4431bbf", "metadata": {}, "outputs": [], @@ -2909,12 +2917,12 @@ "id": "b47764f9", "metadata": {}, "source": [ - "We can use a [ConditionGreaterThanOrEqualTo](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.conditions.ConditionGreaterThanOrEqualTo) condition to compare the model's accuracy with the threshold. Look at the [Conditions](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#conditions) section in the documentation for more information about the types of supported conditions." + "We can use a [ConditionGreaterThanOrEqualTo](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.conditions.ConditionGreaterThanOrEqualTo) condition to compare the model's accuracy with the threshold. 
Look at the [Conditions](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#conditions) section in the documentation for more information about the types of supported conditions.\n" ] }, { "cell_type": "code", - "execution_count": 51, + "execution_count": 127, "id": "bebeecab", "metadata": {}, "outputs": [], @@ -2937,12 +2945,12 @@ "id": "1b0ce4b1", "metadata": {}, "source": [ - "Let's now define the Condition Step:" + "Let's now define the Condition Step:\n" ] }, { "cell_type": "code", - "execution_count": 52, + "execution_count": 128, "id": "36e2a2b1-6711-4266-95d8-d2aebd52e199", "metadata": { "tags": [] @@ -2966,12 +2974,12 @@ "source": [ "### Step 5 - Creating the Pipeline\n", "\n", - "We can now define the SageMaker Pipeline and submit its definition to the SageMaker Pipelines service to create the pipeline if it doesn't exist or update it if it does." + "We can now define the SageMaker Pipeline and submit its definition to the SageMaker Pipelines service to create the pipeline if it doesn't exist or update it if it does.\n" ] }, { "cell_type": "code", - "execution_count": 53, + "execution_count": 129, "id": "f70bcd33-b499-4e2b-953e-94d1ed96c10a", "metadata": { "tags": [] @@ -2988,7 +2996,6 @@ "name": "stdout", "output_type": "stream", "text": [ - "Using provided s3_resource\n", "Using provided s3_resource\n" ] }, @@ -2996,7 +3003,20 @@ "name": "stderr", "output_type": "stream", "text": [ - "INFO:sagemaker.processing:Uploaded None to s3://mlschool/session3-pipeline/code/023576aa7e5c5a7eb833b29794f54112/sourcedir.tar.gz\n", + "INFO:sagemaker.processing:Uploaded None to s3://mlschool/session3-pipeline/code/09fea667a5ab7c37a068f22c00762d0b/sourcedir.tar.gz\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Using provided s3_resource\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ "INFO:sagemaker.processing:runproc.sh uploaded to 
s3://mlschool/session3-pipeline/code/2c207c809cb0e0e9a1d77e5247f961f9/runproc.sh\n", "WARNING:sagemaker.workflow._utils:Popping out 'CertifyForMarketplace' from the pipeline definition since it will be overridden in pipeline execution time.\n" ] @@ -3020,7 +3040,7 @@ "name": "stderr", "output_type": "stream", "text": [ - "INFO:sagemaker.processing:Uploaded None to s3://mlschool/session3-pipeline/code/023576aa7e5c5a7eb833b29794f54112/sourcedir.tar.gz\n", + "INFO:sagemaker.processing:Uploaded None to s3://mlschool/session3-pipeline/code/09fea667a5ab7c37a068f22c00762d0b/sourcedir.tar.gz\n", "INFO:sagemaker.processing:runproc.sh uploaded to s3://mlschool/session3-pipeline/code/2c207c809cb0e0e9a1d77e5247f961f9/runproc.sh\n" ] }, @@ -3028,35 +3048,32 @@ "data": { "text/plain": [ "{'PipelineArn': 'arn:aws:sagemaker:us-east-1:325223348818:pipeline/session3-pipeline',\n", - " 'ResponseMetadata': {'RequestId': '17ef4376-c298-4cc3-84bb-8163f572c46f',\n", + " 'ResponseMetadata': {'RequestId': '2c887422-afbc-4c61-893a-88a069720aab',\n", " 'HTTPStatusCode': 200,\n", - " 'HTTPHeaders': {'x-amzn-requestid': '17ef4376-c298-4cc3-84bb-8163f572c46f',\n", + " 'HTTPHeaders': {'x-amzn-requestid': '2c887422-afbc-4c61-893a-88a069720aab',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'content-length': '85',\n", - " 'date': 'Sat, 21 Oct 2023 16:01:53 GMT'},\n", + " 'date': 'Mon, 23 Oct 2023 15:47:19 GMT'},\n", " 'RetryAttempts': 0}}" ] }, - "execution_count": 53, + "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "#| code: true\n", - "#| output: false\n", + "# | code: true\n", + "# | output: false\n", "\n", "session3_pipeline = Pipeline(\n", " name=\"session3-pipeline\",\n", - " parameters=[\n", - " dataset_location,\n", - " accuracy_threshold\n", - " ],\n", + " parameters=[dataset_location, accuracy_threshold],\n", " steps=[\n", " split_and_transform_data_step,\n", " tune_model_step if USE_TUNING_STEP else train_model_step,\n", " 
evaluate_model_step,\n", - " condition_step\n", + " condition_step,\n", " ],\n", " pipeline_definition_config=pipeline_definition_config,\n", " sagemaker_session=config[\"session\"],\n", @@ -3070,7 +3087,7 @@ "id": "1b1f656e", "metadata": {}, "source": [ - "We can now start the pipeline:" + "We can now start the pipeline:\n" ] }, { @@ -3082,12 +3099,12 @@ "\n", "
Note: \n", " The %%script cell magic is a convenient way to prevent the notebook from executing a specific cell. If you want to run the cell, comment out the line containing the %%script cell magic.\n", - "
" + "\n" ] }, { "cell_type": "code", - "execution_count": 54, + "execution_count": 130, "id": "f3b4126e", "metadata": {}, "outputs": [], @@ -3108,15 +3125,15 @@ "source": [ "### Assignments\n", "\n", - "* Assignment 3.1 The evaluation script computes the accuracy of the model and exports it as part of the evaluation report. Extend the evaluation report by adding the precision and the recall of the model on each one of the classes.\n", + "- Assignment 3.1 The evaluation script computes the accuracy of the model and exports it as part of the evaluation report. Extend the evaluation report by adding the precision and the recall of the model on each one of the classes.\n", "\n", - "* Assignment 3.2 Extend the evaluation script to test the model on each island separately. The evaluation report should contain the accuracy of the model on each island and the overall accuracy.\n", + "- Assignment 3.2 Extend the evaluation script to test the model on each island separately. The evaluation report should contain the accuracy of the model on each island and the overall accuracy.\n", "\n", - "* Assignment 3.3 The Condition Step uses a hard-coded threshold value to determine if the model's accuracy is good enough to proceed. Modify the code so the pipeline uses the accuracy of the latest registered model version as the threshold. We want to register a new model version only if its performance is better than the previous version we registered.\n", + "- Assignment 3.3 The Condition Step uses a hard-coded threshold value to determine if the model's accuracy is good enough to proceed. Modify the code so the pipeline uses the accuracy of the latest registered model version as the threshold. We want to register a new model version only if its performance is better than the previous version we registered.\n", "\n", - "* Assignment 3.4 The current pipeline uses either a Training Step or a Tuning Step to build a model. Modify the pipeline to use both steps at the same time. 
The evaluation script should evaluate the model coming from the Training Step and the best model coming from the Tuning Step and output the accuracy and location in S3 of the best model. You should modify the code to register the model assets specified in the evaluation report.\n", + "- Assignment 3.4 The current pipeline uses either a Training Step or a Tuning Step to build a model. Modify the pipeline to use both steps at the same time. The evaluation script should evaluate the model coming from the Training Step and the best model coming from the Tuning Step and output the accuracy and location in S3 of the best model. You should modify the code to register the model assets specified in the evaluation report.\n", "\n", - "* Assignment 3.5 Instead of running the entire pipeline from start to finish, sometimes you may only need to iterate over particular steps. SageMaker Pipelines supports [Selective Execution for Pipeline Steps](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-selective-ex.html). In this assignment you will use Selective Execution to only run one specific step of the pipeline. [Unlocking efficiency: Harnessing the power of Selective Execution in Amazon SageMaker Pipelines](https://aws.amazon.com/blogs/machine-learning/unlocking-efficiency-harnessing-the-power-of-selective-execution-in-amazon-sagemaker-pipelines/) is a great article that explains this feature." + "- Assignment 3.5 Instead of running the entire pipeline from start to finish, sometimes you may only need to iterate over particular steps. SageMaker Pipelines supports [Selective Execution for Pipeline Steps](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-selective-ex.html). In this assignment you will use Selective Execution to only run one specific step of the pipeline. 
[Unlocking efficiency: Harnessing the power of Selective Execution in Amazon SageMaker Pipelines](https://aws.amazon.com/blogs/machine-learning/unlocking-efficiency-harnessing-the-power-of-selective-execution-in-amazon-sagemaker-pipelines/) is a great article that explains this feature.\n" ] }, { @@ -3128,7 +3145,7 @@ "source": [ "## Session 4 - Deploying Models and Serving Predictions\n", "\n", - "In this session we'll explore how to deploy a model to a SageMaker Endpoint and how to use a SageMaker Inference Pipeline to control the data that goes in and comes out of the endpoint." + "In this session we'll explore how to deploy a model to a SageMaker Endpoint and how to use a SageMaker Inference Pipeline to control the data that goes in and comes out of the endpoint.\n" ] }, { @@ -3138,14 +3155,14 @@ "source": [ "### Step 1 - Deploying Model From Registry\n", "\n", - "Let's manually deploy the latest model from the Model Registry to an endpoint. \n", + "Let's manually deploy the latest model from the Model Registry to an endpoint.\n", "\n", - "Let's start by defining the name of the endpoint where we'll deploy the model:" + "Let's start by defining the name of the endpoint where we'll deploy the model:\n" ] }, { "cell_type": "code", - "execution_count": 76, + "execution_count": 131, "id": "2a116f93", "metadata": {}, "outputs": [], @@ -3160,12 +3177,12 @@ "id": "ae95f1d6", "metadata": {}, "source": [ - "We want to query the list of approved models from the Model Registry and get the last one:" + "We want to query the list of approved models from the Model Registry and get the last one:\n" ] }, { "cell_type": "code", - "execution_count": 56, + "execution_count": 132, "id": "87437a26-e9ea-4866-9dc3-630444c0fb46", "metadata": { "tags": [] @@ -3182,7 +3199,7 @@ " 'ModelApprovalStatus': 'Approved'}" ] }, - "execution_count": 56, + "execution_count": 132, "metadata": {}, "output_type": "execute_result" } @@ -3195,7 +3212,11 @@ " MaxResults=1,\n", ")\n", "\n", - "package = 
response[\"ModelPackageSummaryList\"][0] if response[\"ModelPackageSummaryList\"] else None\n", + "package = (\n", + " response[\"ModelPackageSummaryList\"][0]\n", + " if response[\"ModelPackageSummaryList\"]\n", + " else None\n", + ")\n", "package" ] }, @@ -3204,12 +3225,12 @@ "id": "af752269", "metadata": {}, "source": [ - "We can now create a [Model Package](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.ModelPackage) using the ARN of the model from the Model Registry:" + "We can now create a [Model Package](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.ModelPackage) using the ARN of the model from the Model Registry:\n" ] }, { "cell_type": "code", - "execution_count": 57, + "execution_count": 133, "id": "dee516e9", "metadata": {}, "outputs": [], @@ -3228,7 +3249,7 @@ "id": "b3119b48-2ddf-40b5-9ac0-680073a53d06", "metadata": {}, "source": [ - "Let's now deploy the model to an endpoint:" + "Let's now deploy the model to an endpoint:\n" ] }, { @@ -3240,12 +3261,12 @@ "\n", "
Note: \n", " The %%script cell magic is a convenient way to prevent the notebook from executing a specific cell. If you want to run the cell, comment out the line containing the %%script cell magic.\n", - "
" + "\n" ] }, { "cell_type": "code", - "execution_count": 58, + "execution_count": 134, "id": "7c8852d5-818a-406c-944d-30bf6de90288", "metadata": { "tags": [] @@ -3267,16 +3288,16 @@ "id": "3dd7a725", "metadata": {}, "source": [ - "After deploying the model, we can test the endpoint to make sure it works. \n", + "After deploying the model, we can test the endpoint to make sure it works.\n", "\n", "Each line of the payload we'll send to the endpoint contains the information of a penguin. Notice the model expects data that's already transformed. We can't provide the original data from our dataset because the model we registered will not work with it.\n", "\n", - "The endpoint will return the predictions for each of these lines." + "The endpoint will return the predictions for each of these lines.\n" ] }, { "cell_type": "code", - "execution_count": 59, + "execution_count": 135, "id": "ba7da291", "metadata": {}, "outputs": [], @@ -3293,7 +3314,7 @@ "id": "30bcfffa-0ba6-4ad8-8b4f-1ea19b35a22f", "metadata": {}, "source": [ - "Let's send the payload to the endpoint and print its response:" + "Let's send the payload to the endpoint and print its response:\n" ] }, { @@ -3305,12 +3326,12 @@ "\n", "
Note: \n", " The %%script cell magic is a convenient way to prevent the notebook from executing a specific cell. If you want to run the cell, comment out the line containing the %%script cell magic.\n", - "
" + "\n" ] }, { "cell_type": "code", - "execution_count": 60, + "execution_count": 136, "id": "0817a25e-8224-4911-830b-d659e7458b4a", "metadata": { "tags": [] @@ -3333,7 +3354,7 @@ "id": "28f5d383-fcd7-454c-bbd6-ce4ce7b2104a", "metadata": {}, "source": [ - "After testing the endpoint, we need to ensure we delete it:" + "After testing the endpoint, we need to ensure we delete it:\n" ] }, { @@ -3345,12 +3366,12 @@ "\n", "
Note: \n", " The %%script cell magic is a convenient way to prevent the notebook from executing a specific cell. If you want to run the cell, comment out the line containing the %%script cell magic.\n", - "
" + "\n" ] }, { "cell_type": "code", - "execution_count": 61, + "execution_count": 137, "id": "6b32c3a4-312e-473c-a217-33606f77d1e9", "metadata": { "tags": [] @@ -3372,7 +3393,7 @@ "\n", "Our inference pipeline will have three components:\n", "\n", - "1. A preprocessing transformer that will transform the input data into the format the model expects. \n", + "1. A preprocessing transformer that will transform the input data into the format the model expects.\n", "2. The TensorFlow model we trained.\n", "3. A postprocessing transformer that will transform the output of the model into a human-readable format.\n", "\n", @@ -3392,10 +3413,10 @@ "\n", "```{json}\n", "{\n", - " \"prediction\": \"Adelie\", \n", + " \"prediction\": \"Adelie\",\n", " \"confidence\": 0.802672\n", "}\n", - "```" + "```\n" ] }, { @@ -3405,12 +3426,12 @@ "source": [ "### Step 2 - Creating the Preprocessing Script\n", "\n", - "The first component of our inference pipeline will transform the input data into the format the model expects. We'll use the Scikit-Learn transformer we saved when we split and transformed the data. To deploy this component as part of an inference pipeline, we need to write a script that loads the transformer, uses it to modify the input data, and returns the output in the format the TensorFlow model expects." + "The first component of our inference pipeline will transform the input data into the format the model expects. We'll use the Scikit-Learn transformer we saved when we split and transformed the data. 
To deploy this component as part of an inference pipeline, we need to write a script that loads the transformer, uses it to modify the input data, and returns the output in the format the TensorFlow model expects.\n" ] }, { "cell_type": "code", - "execution_count": 62, + "execution_count": 138, "id": "e2d61d5c", "metadata": { "tags": [] @@ -3538,12 +3559,12 @@ "id": "037982c1", "metadata": {}, "source": [ - "Let's test the script to ensure everything is working as expected:" + "Let's test the script to ensure everything is working as expected:\n" ] }, { "cell_type": "code", - "execution_count": 63, + "execution_count": 139, "id": "33893ef2", "metadata": { "tags": [] @@ -3554,7 +3575,7 @@ "output_type": "stream", "text": [ "\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m.\u001b[0m\u001b[32m [100%]\u001b[0m\n", - "\u001b[32m\u001b[32m\u001b[1m10 passed\u001b[0m\u001b[32m in 0.07s\u001b[0m\u001b[0m\n" + "\u001b[32m\u001b[32m\u001b[1m10 passed\u001b[0m\u001b[32m in 0.06s\u001b[0m\u001b[0m\n" ] } ], @@ -3702,12 +3723,12 @@ "source": [ "### Step 3 - Creating the Postprocessing Script\n", "\n", - "The final component of our inference pipeline will transform the output from the model into a human-readable format. We'll use the Scikit-Learn target transformer we saved when we split and transformed the data. To deploy this component as part of an inference pipeline, we need to write a script that loads the transformer, uses it to modify the output from the model, and returns a human-readable format." + "The final component of our inference pipeline will transform the output from the model into a human-readable format. We'll use the Scikit-Learn target transformer we saved when we split and transformed the data. 
To deploy this component as part of an inference pipeline, we need to write a script that loads the transformer, uses it to modify the output from the model, and returns a human-readable format.\n" ] }, { "cell_type": "code", - "execution_count": 64, + "execution_count": 140, "id": "48c69002", "metadata": { "tags": [] @@ -3806,12 +3827,12 @@ "id": "86c421c7", "metadata": {}, "source": [ - "Let's test the script to ensure everything is working as expected:" + "Let's test the script to ensure everything is working as expected:\n" ] }, { "cell_type": "code", - "execution_count": 65, + "execution_count": 141, "id": "741b8402", "metadata": { "tags": [] @@ -3862,7 +3883,7 @@ "source": [ "### Step 4 - Setting up the Inference Pipeline\n", "\n", - "We can now create a [PipelineModel](https://sagemaker.readthedocs.io/en/stable/api/inference/pipeline.html#sagemaker.pipeline.PipelineModel) to define our inference pipeline." + "We can now create a [PipelineModel](https://sagemaker.readthedocs.io/en/stable/api/inference/pipeline.html#sagemaker.pipeline.PipelineModel) to define our inference pipeline.\n" ] }, { @@ -3870,12 +3891,12 @@ "id": "2baf91d8", "metadata": {}, "source": [ - "We'll use the model we generated from the first step of the pipeline as the input to the first and last components of the inference pipeline. This `model.tar.gz` file contains the two transformers we need to preprocess and postprocess the data. Let's create a variable with the URI to this file:" + "We'll use the model we generated from the first step of the pipeline as the input to the first and last components of the inference pipeline. This `model.tar.gz` file contains the two transformers we need to preprocess and postprocess the data. 
Let's create a variable with the URI to this file:\n" ] }, { "cell_type": "code", - "execution_count": 66, + "execution_count": 142, "id": "53ea0ccf", "metadata": {}, "outputs": [], @@ -3896,12 +3917,12 @@ "id": "1b7119a4", "metadata": {}, "source": [ - "Here is the first component of the inference pipeline. It will preprocess the data before sending it to the TensorFlow model:" + "Here is the first component of the inference pipeline. It will preprocess the data before sending it to the TensorFlow model:\n" ] }, { "cell_type": "code", - "execution_count": 67, + "execution_count": 143, "id": "11a0effd", "metadata": {}, "outputs": [], @@ -3923,12 +3944,12 @@ "id": "26a18bfb", "metadata": {}, "source": [ - "Here is the last component of the inference pipeline. It will postprocess the output from the TensorFlow model before sending it back to the user:" + "Here is the last component of the inference pipeline. It will postprocess the output from the TensorFlow model before sending it back to the user:\n" ] }, { "cell_type": "code", - "execution_count": 68, + "execution_count": 144, "id": "5d7a5926", "metadata": {}, "outputs": [], @@ -3948,12 +3969,12 @@ "id": "2918f505", "metadata": {}, "source": [ - "We can now create the inference pipeline using the three models:" + "We can now create the inference pipeline using the three models:\n" ] }, { "cell_type": "code", - "execution_count": 69, + "execution_count": 145, "id": "157b8858", "metadata": { "tags": [] @@ -3977,12 +3998,12 @@ "source": [ "### Step 5 - Registering the Model\n", "\n", - "We'll modify the pipeline to register the Pipeline Model in the Model Registry. We'll use a different group name to keep Pipeline Models separate." + "We'll modify the pipeline to register the Pipeline Model in the Model Registry. 
We'll use a different group name to keep Pipeline Models separate.\n" ] }, { "cell_type": "code", - "execution_count": 70, + "execution_count": 146, "id": "aefe580a", "metadata": {}, "outputs": [], @@ -3995,12 +4016,12 @@ "id": "77b2b06e", "metadata": {}, "source": [ - "Let's now register the model. Notice that we will register the model with \"PendingManualApproval\" status. This means that we'll need to manually approve the model before it can be deployed to an endpoint. Check [Register a Model Version](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-version.html) for more information about model registration." + "Let's now register the model. Notice that we will register the model with \"PendingManualApproval\" status. This means that we'll need to manually approve the model before it can be deployed to an endpoint. Check [Register a Model Version](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-version.html) for more information about model registration.\n" ] }, { "cell_type": "code", - "execution_count": 71, + "execution_count": 147, "id": "f84d2cd5", "metadata": { "tags": [] @@ -4010,37 +4031,31 @@ "name": "stderr", "output_type": "stream", "text": [ - "/Users/svpino/dev/ml.school/.venv/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py:297: UserWarning:\n", - "\n", - "Running within a PipelineSession, there will be No Wait, No Logs, and No Job being started.\n", - "\n" + "/Users/svpino/dev/ml.school/.venv/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py:297: UserWarning: Running within a PipelineSession, there will be No Wait, No Logs, and No Job being started.\n", + " warnings.warn(\n" ] } ], "source": [ - "#| code: true\n", - "#| output: false\n", + "# | code: true\n", + "# | output: false\n", "\n", "register_model_step = ModelStep(\n", " name=\"register\",\n", " display_name=\"register-model\",\n", " step_args=pipeline_model.register(\n", " 
model_package_group_name=PIPELINE_MODEL_PACKAGE_GROUP,\n", - "\n", " model_metrics=model_metrics,\n", " approval_status=\"PendingManualApproval\",\n", - "\n", " # Our inference pipeline model supports two content\n", " # types: text/csv and application/json.\n", " content_types=[\"text/csv\", \"application/json\"],\n", " response_types=[\"text/csv\", \"application/json\"],\n", - " \n", " # This is the suggested inference instance types when\n", " # deploying the model or using it as part of a batch\n", " # transform job.\n", " inference_instances=[\"ml.m5.xlarge\"],\n", " transform_instances=[\"ml.m5.xlarge\"],\n", - " \n", " domain=\"MACHINE_LEARNING\",\n", " task=\"CLASSIFICATION\",\n", " framework=\"TENSORFLOW\",\n", @@ -4056,12 +4071,12 @@ "source": [ "### Step 6 - Modifying the Condition Step\n", "\n", - "Since we modify the registration step, we also need to modify the Condition Step to use the new registration:" + "Since we modified the registration step, we also need to modify the Condition Step to use the new registration:\n" ] }, { "cell_type": "code", - "execution_count": 72, + "execution_count": 148, "id": "b9712905-9fe3-4148-ae6d-05b0a48e742e", "metadata": { "tags": [] @@ -4083,12 +4098,12 @@ "source": [ "### Step 7 - Creating the Pipeline\n", "\n", - "We can now define the SageMaker Pipeline and submit its definition to the SageMaker Pipelines service to create the pipeline if it doesn't exist or update it if it does." 
+ "We can now define the SageMaker Pipeline and submit its definition to the SageMaker Pipelines service to create the pipeline if it doesn't exist or update it if it does.\n" ] }, { "cell_type": "code", - "execution_count": 73, + "execution_count": 149, "id": "bad9f51d", "metadata": { "tags": [] @@ -4112,9 +4127,7 @@ "name": "stderr", "output_type": "stream", "text": [ - "INFO:sagemaker.processing:Uploaded None to s3://mlschool/session4-pipeline/code/023576aa7e5c5a7eb833b29794f54112/sourcedir.tar.gz\n", - "INFO:sagemaker.processing:runproc.sh uploaded to s3://mlschool/session4-pipeline/code/2c207c809cb0e0e9a1d77e5247f961f9/runproc.sh\n", - "WARNING:sagemaker.workflow._utils:Popping out 'CertifyForMarketplace' from the pipeline definition since it will be overridden in pipeline execution time.\n" + "INFO:sagemaker.processing:Uploaded None to s3://mlschool/session4-pipeline/code/09fea667a5ab7c37a068f22c00762d0b/sourcedir.tar.gz\n" ] }, { @@ -4128,6 +4141,8 @@ "name": "stderr", "output_type": "stream", "text": [ + "INFO:sagemaker.processing:runproc.sh uploaded to s3://mlschool/session4-pipeline/code/2c207c809cb0e0e9a1d77e5247f961f9/runproc.sh\n", + "WARNING:sagemaker.workflow._utils:Popping out 'CertifyForMarketplace' from the pipeline definition since it will be overridden in pipeline execution time.\n", "INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.\n" ] }, @@ -4143,7 +4158,7 @@ "name": "stderr", "output_type": "stream", "text": [ - "INFO:sagemaker.processing:Uploaded None to s3://mlschool/session4-pipeline/code/023576aa7e5c5a7eb833b29794f54112/sourcedir.tar.gz\n", + "INFO:sagemaker.processing:Uploaded None to s3://mlschool/session4-pipeline/code/09fea667a5ab7c37a068f22c00762d0b/sourcedir.tar.gz\n", "INFO:sagemaker.processing:runproc.sh uploaded to s3://mlschool/session4-pipeline/code/2c207c809cb0e0e9a1d77e5247f961f9/runproc.sh\n" ] }, @@ -4151,23 +4166,23 @@ "data": { "text/plain": [ 
"{'PipelineArn': 'arn:aws:sagemaker:us-east-1:325223348818:pipeline/session4-pipeline',\n", - " 'ResponseMetadata': {'RequestId': '7aa113a6-6eae-4999-b275-0b5d9a18a828',\n", + " 'ResponseMetadata': {'RequestId': '14c33b8d-0697-4b6c-a839-aa3c73acb35f',\n", " 'HTTPStatusCode': 200,\n", - " 'HTTPHeaders': {'x-amzn-requestid': '7aa113a6-6eae-4999-b275-0b5d9a18a828',\n", + " 'HTTPHeaders': {'x-amzn-requestid': '14c33b8d-0697-4b6c-a839-aa3c73acb35f',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'content-length': '85',\n", - " 'date': 'Sat, 21 Oct 2023 16:02:00 GMT'},\n", + " 'date': 'Mon, 23 Oct 2023 15:47:23 GMT'},\n", " 'RetryAttempts': 0}}" ] }, - "execution_count": 73, + "execution_count": 149, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "#| code: true\n", - "#| output: false\n", + "# | code: true\n", + "# | output: false\n", "\n", "session4_pipeline = Pipeline(\n", " name=\"session4-pipeline\",\n", @@ -4190,7 +4205,7 @@ "id": "20c71f91", "metadata": {}, "source": [ - "We can now start the pipeline:" + "We can now start the pipeline:\n" ] }, { @@ -4202,26 +4217,15 @@ "\n", "
Note: \n", " The %%script cell magic is a convenient way to prevent the notebook from executing a specific cell. If you want to run the cell, comment out the line containing the %%script cell magic.\n", - "
" + "\n" ] }, { "cell_type": "code", - "execution_count": 74, + "execution_count": 150, "id": "20dfbd97", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "_PipelineExecution(arn='arn:aws:sagemaker:us-east-1:325223348818:pipeline/session4-pipeline/execution/zftbfrjomwss', sagemaker_session=)" - ] - }, - "execution_count": 74, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "%%script false --no-raise-error\n", "\n", @@ -4242,13 +4246,14 @@ "We will use [Amazon EventBridge](https://aws.amazon.com/pm/eventbridge/) to trigger a Lambda function that will deploy the model whenever its status changes from \"PendingManualApproval\" to \"Approved.\" Let's start by writing the Lambda function to take the model information and create a new endpoint.\n", "\n", "We'll enable [Data Capture](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-capture.html) as part of the endpoint configuration. With Data Capture we can record the inputs and outputs of the endpoint to use them later for monitoring the model:\n", - "* `InitialSamplingPercentage` represents the percentage of traffic that we want to capture. 
\n", - "* `DestinationS3Uri` specifies the S3 location where we want to store the captured data.\n" + "\n", + "- `InitialSamplingPercentage` represents the percentage of traffic that we want to capture.\n", + "- `DestinationS3Uri` specifies the S3 location where we want to store the captured data.\n" ] }, { "cell_type": "code", - "execution_count": 476, + "execution_count": 151, "id": "998314a3", "metadata": {}, "outputs": [ @@ -4370,12 +4375,12 @@ "id": "b3374868", "metadata": {}, "source": [ - "Let's create a constant pointing to the location where we'll store the data that the endpoint will capture:" + "Let's create a constant pointing to the location where we'll store the data that the endpoint will capture:\n" ] }, { "cell_type": "code", - "execution_count": 568, + "execution_count": 152, "id": "c51f421f", "metadata": {}, "outputs": [], @@ -4388,12 +4393,12 @@ "id": "5b582ace", "metadata": {}, "source": [ - "We need to ensure our Lambda function has permission to interact with SageMaker, so let's create a new role and then create the lambda function." 
+ "We need to ensure our Lambda function has permission to interact with SageMaker, so let's create a new role and then create the Lambda function.\n" ] }, { "cell_type": "code", - "execution_count": 569, + "execution_count": 153, "id": "4ad4f1f2", "metadata": { "tags": [] @@ -4413,37 +4418,36 @@ "\n", "try:\n", " response = iam_client.create_role(\n", - " RoleName = lambda_role_name,\n", - " AssumeRolePolicyDocument = json.dumps({\n", - " \"Version\": \"2012-10-17\",\n", - " \"Statement\": [\n", - " {\n", - " \"Effect\": \"Allow\",\n", - " \"Principal\": {\n", - " \"Service\": [\n", - " \"lambda.amazonaws.com\",\n", - " \"events.amazonaws.com\"\n", - " ]\n", - " },\n", - " \"Action\": \"sts:AssumeRole\",\n", - " }\n", - " ]\n", - " }),\n", - " Description=\"Lambda Endpoint Deployment\"\n", + " RoleName=lambda_role_name,\n", + " AssumeRolePolicyDocument=json.dumps(\n", + " {\n", + " \"Version\": \"2012-10-17\",\n", + " \"Statement\": [\n", + " {\n", + " \"Effect\": \"Allow\",\n", + " \"Principal\": {\n", + " \"Service\": [\"lambda.amazonaws.com\", \"events.amazonaws.com\"]\n", + " },\n", + " \"Action\": \"sts:AssumeRole\",\n", + " }\n", + " ],\n", + " }\n", + " ),\n", + " Description=\"Lambda Endpoint Deployment\",\n", " )\n", "\n", " lambda_role_arn = response[\"Role\"][\"Arn\"]\n", - " \n", + "\n", " iam_client.attach_role_policy(\n", " RoleName=lambda_role_name,\n", - " PolicyArn=\"arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole\"\n", + " PolicyArn=\"arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole\",\n", " )\n", - " \n", + "\n", " iam_client.attach_role_policy(\n", " RoleName=lambda_role_name,\n", - " PolicyArn=\"arn:aws:iam::aws:policy/AmazonSageMakerFullAccess\"\n", + " PolicyArn=\"arn:aws:iam::aws:policy/AmazonSageMakerFullAccess\",\n", " )\n", - " \n", + "\n", " print(f'Role \"{lambda_role_name}\" created with ARN \"{lambda_role_arn}\".')\n", "except iam_client.exceptions.EntityAlreadyExistsException:\n", " 
print(f\"Role {lambda_role_name} already exists.\")\n", @@ -4456,12 +4460,12 @@ "id": "acef9d48", "metadata": 
{}, "source": [ - "We can now create the Lambda function:" + "We can now create the Lambda function:\n" ] }, { "cell_type": "code", - "execution_count": 570, + "execution_count": 154, "id": "ad8c8019", "metadata": { "tags": [] @@ -4470,13 +4474,13 @@ { "data": { "text/plain": [ - "{'ResponseMetadata': {'RequestId': '9c838deb-9ac7-4b82-85a1-6b7040bbe3fe',\n", + "{'ResponseMetadata': {'RequestId': '21713bbd-59e2-478d-995f-129a5503b310',\n", " 'HTTPStatusCode': 200,\n", - " 'HTTPHeaders': {'date': 'Sat, 21 Oct 2023 14:42:48 GMT',\n", + " 'HTTPHeaders': {'date': 'Mon, 23 Oct 2023 15:47:24 GMT',\n", " 'content-type': 'application/json',\n", " 'content-length': '1428',\n", " 'connection': 'keep-alive',\n", - " 'x-amzn-requestid': '9c838deb-9ac7-4b82-85a1-6b7040bbe3fe'},\n", + " 'x-amzn-requestid': '21713bbd-59e2-478d-995f-129a5503b310'},\n", " 'RetryAttempts': 0},\n", " 'FunctionName': 'deploy_fn',\n", " 'FunctionArn': 'arn:aws:lambda:us-east-1:325223348818:function:deploy_fn',\n", @@ -4487,14 +4491,14 @@ " 'Description': '',\n", " 'Timeout': 600,\n", " 'MemorySize': 128,\n", - " 'LastModified': '2023-10-21T14:42:48.000+0000',\n", - " 'CodeSha256': 'v++6KUKkyiwMAdPTf80/R0U2m2LlqDj5T3oISxTE8Ug=',\n", + " 'LastModified': '2023-10-23T15:47:24.000+0000',\n", + " 'CodeSha256': 'ueBRU1NVemrHDSO1q2iAskrlzV83Ha77uojYSsRFZVQ=',\n", " 'Version': '$LATEST',\n", " 'Environment': {'Variables': {'ROLE': 'arn:aws:iam::325223348818:role/service-role/AmazonSageMaker-ExecutionRole-20230312T160501',\n", " 'DATA_CAPTURE_DESTINATION': 's3://mlschool/penguins/monitoring/data-capture',\n", " 'ENDPOINT': 'penguins-endpoint'}},\n", " 'TracingConfig': {'Mode': 'PassThrough'},\n", - " 'RevisionId': 'df1b3fc9-c1d1-4049-ab00-e7bcac53de2e',\n", + " 'RevisionId': '46956778-abb2-4e7f-b332-84cdd1bb9772',\n", " 'Layers': [],\n", " 'State': 'Active',\n", " 'LastUpdateStatus': 'InProgress',\n", @@ -4507,15 +4511,14 @@ " 'RuntimeVersionConfig': {'RuntimeVersionArn': 
'arn:aws:lambda:us-east-1::runtime:6cf63f1a78b5c5e19617d6b4b111370fdbda415ea91bdfdc5aacef9fee76b64a'}}" ] }, - "execution_count": 570, + "execution_count": 154, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "#| eval: false\n", - "#| code: true\n", - "#| output: false\n", + "# | code: true\n", + "# | output: false\n", "\n", "from sagemaker.lambda_helper import Lambda\n", "\n", @@ -4532,9 +4535,9 @@ " \"Variables\": {\n", " \"ENDPOINT\": ENDPOINT,\n", " \"DATA_CAPTURE_DESTINATION\": DATA_CAPTURE_DESTINATION,\n", - " \"ROLE\": role\n", + " \"ROLE\": role,\n", " }\n", - " }\n", + " },\n", ")\n", "\n", "lambda_response = None\n", @@ -4551,12 +4554,12 @@ "source": [ "### Step 9 - Setting Up EventBridge\n", "\n", - "We can now create an EventBridge rule that triggers the deployment process whenever a model approval status becomes \"Approved\". To do this, let's define the event pattern that will trigger the deployment process:" + "We can now create an EventBridge rule that triggers the deployment process whenever a model approval status becomes \"Approved\". To do this, let's define the event pattern that will trigger the deployment process:\n" ] }, { "cell_type": "code", - "execution_count": 571, + "execution_count": 155, "id": "27ce7cc5", "metadata": {}, "outputs": [], @@ -4578,12 +4581,12 @@ "id": "d1b23587", "metadata": {}, "source": [ - "Let's now create the EventBridge rule:" + "Let's now create the EventBridge rule:\n" ] }, { "cell_type": "code", - "execution_count": 646, + "execution_count": 156, "id": "2a878179", "metadata": {}, "outputs": [], @@ -4602,12 +4605,12 @@ "id": "0b3ba782", "metadata": {}, "source": [ - "Now, we need to define the target of the rule. The target will trigger whenever the rule matches an event. In this case, we want to trigger the Lambda function we created before:" + "Now, we need to define the target of the rule. The target will trigger whenever the rule matches an event. 
In this case, we want to trigger the Lambda function we created before:\n" ] }, { "cell_type": "code", - "execution_count": 648, + "execution_count": 157, "id": "dc714a97", "metadata": { "tags": [] @@ -4630,12 +4633,12 @@ "id": "400585a1", "metadata": {}, "source": [ - "Finally, we need to give the Lambda function permission to be triggered by the EventBridge rule:" + "Finally, we need to give the Lambda function permission to be triggered by the EventBridge rule:\n" ] }, { "cell_type": "code", - "execution_count": 649, + "execution_count": 158, "id": "d74be86b", "metadata": {}, "outputs": [ @@ -4668,7 +4671,7 @@ "source": [ "### Step 10 - Testing the Endpoint\n", "\n", - "Let's now test the endpoint we deployed automatically with the pipeline. We will use the function to create a predictor with a JSON encoder and decoder. " + "Let's now test the endpoint we deployed automatically with the pipeline. We will use the function to create a predictor with a JSON encoder and decoder.\n" ] }, { @@ -4680,12 +4683,12 @@ "\n", "
Note: \n", " The %%script cell magic is a convenient way to prevent the notebook from executing a specific cell. If you want to run the cell, comment out the line containing the %%script cell magic.\n", - "
" + "\n" ] }, { "cell_type": "code", - "execution_count": 79, + "execution_count": 159, "id": "3cc966fb-b611-417f-a8b8-0c5d2f95252c", "metadata": { "tags": [] @@ -4730,26 +4733,17 @@ "\n", "
Note: \n", " The %%script cell magic is a convenient way to prevent the notebook from executing a specific cell. If you want to run the cell, comment out the line containing the %%script cell magic.\n", - "
" + "\n" ] }, { "cell_type": "code", - "execution_count": 78, + "execution_count": 160, "id": "8c3e851a-2416-4a0b-b8a1-c483cde3d776", "metadata": { "tags": [] }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO:sagemaker:Deleting endpoint configuration with name: penguins-endpoint-config-1021161139\n", - "INFO:sagemaker:Deleting endpoint with name: penguins-endpoint\n" - ] - } - ], + "outputs": [], "source": [ "%%script false --no-raise-error\n", "#| eval: false\n", @@ -4764,17 +4758,15 @@ "source": [ "### Assignments\n", "\n", - "* Assignment 4.1 Every Endpoint has an invocation URL you can use to generate predictions with the model from outside AWS. As part of this assignment, write a simple Python script that will run on your local computer and run a few samples through the Endpoint. You will need your AWS access key and secret to connect to the Endpoint.\n", - "\n", + "- Assignment 4.1 Every Endpoint has an invocation URL you can use to generate predictions with the model from outside AWS. As part of this assignment, write a simple Python script that will run on your local computer and run a few samples through the Endpoint. You will need your AWS access key and secret to connect to the Endpoint.\n", "\n", - "* Assignment 4.2 We can use model variants to perform A/B testing between a new model and an old model. Create a function that given the ARN of two models in the Model Registry deploys them to an Endpoint as separate variants. Each variant should receive 50% of the traffic. Write another function that invokes the endpoint by default, but allows the caller to invoke a specific variant if they want to.\n", + "- Assignment 4.2 We can use model variants to perform A/B testing between a new model and an old model. Create a function that given the ARN of two models in the Model Registry deploys them to an Endpoint as separate variants. Each variant should receive 50% of the traffic. 
Write another function that invokes the endpoint by default, but allows the caller to invoke a specific variant if they want to.\n", "\n", + "- Assignment 4.3 We can use SageMaker Model Shadow Deployments to create shadow variants to validate a new model version before promoting it to production. Write a function that given the ARN of a model in the Model Registry, updates an Endpoint and deploys the model as a shadow variant. Check [Shadow variants](https://docs.aws.amazon.com/sagemaker/latest/dg/model-shadow-deployment.html) for more information about this topic. Send some traffic to the Endpoint and compare the results from the main model with its shadow variant.\n", "\n", - "* Assignment 4.3 We can use SageMaker Model Shadow Deployments to create shadow variants to validate a new model version before promoting it to production. Write a function that given the ARN of a model in the Model Registry, updates an Endpoint and deploys the model as a shadow variant. Check [Shadow variants](https://docs.aws.amazon.com/sagemaker/latest/dg/model-shadow-deployment.html) for more information about this topic. Send some traffic to the Endpoint and compare the results from the main model with its shadow variant.\n", + "- Assignment 4.4 SageMaker supports auto scaling your models. Auto scaling dynamically adjusts the number of instances provisioned for a model in response to changes in the workload. For this assignment, define a target-tracking scaling policy for a variant of your Endpoint and use the `SageMakerVariantInvocationsPerInstance` metric. `SageMakerVariantInvocationsPerInstance` is the average number of times per minute that the variant is invoked. Check [Automatically Scale Amazon SageMaker Models](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html) for more information about auto scaling models.\n", "\n", - "* Assignment 4.4 SageMaker supports auto scaling your models. 
Auto scaling dynamically adjusts the number of instances provisioned for a model in response to changes in the workload. For this assignment, define a target-tracking scaling policy for a variant of your Endpoint and use the `SageMakerVariantInvocationsPerInstance` metric. `SageMakerVariantInvocationsPerInstance` is the average number of times per minute that the variant is invoked. Check [Automatically Scale Amazon SageMaker Models](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html) for more information about auto scaling models.\n", - "\n", - "* Assignment 4.5 Modify the SageMaker Pipeline you created for the \"Pipeline of Digits\" project and add a Lambda Step to deploy the model automatically. Create a custom inference script so the endpoint receives a JSON containing the URL of an image, and returns a single value representing the predicted digit." + "- Assignment 4.5 Modify the SageMaker Pipeline you created for the \"Pipeline of Digits\" project and add a Lambda Step to deploy the model automatically. Create a custom inference script so the endpoint receives a JSON containing the URL of an image, and returns a single value representing the predicted digit.\n" ] }, { @@ -4782,9 +4774,7 @@ "id": "e544ae36-00b3-4bde-b133-c3a59bb7f1d8", "metadata": {}, "source": [ - "#| hide\n", - "\n", - "## Session 5 - Monitoring\n", + "## Session 5 - Data Distribution Shifts And Model Monitoring\n", "\n", "In this session we'll set up a monitoring process to analyze the quality of the data our endpoint receives and the endpoint predictions. For this, we need to check the data received by the endpoint, generate ground truth labels, and compare them with a baseline performance.\n", "\n", @@ -4793,46 +4783,24 @@ "1. Create baselines we can use to compare against real-time traffic.\n", "2. 
Set up a schedule to continuously evaluate and compare against the baselines.\n", "\n", - "Notice that we use the baseline datasets we generated during the Processing Step. These baseline datasets are the same unprocessed data in JSON format. We do this because we need raw data to compare against the endpoint input.\n", - "\n", - "Check [Amazon SageMaker Model Monitor](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_monitoring.html) for a brief explanation of how to use SageMaker's Model Monitoring functionality. [Monitor models for data and model quality, bias, and explainability](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor.html) is a much more extensive guide to monitoring in Amazon SageMaker." + "Check [Amazon SageMaker Model Monitor](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_monitoring.html) for a brief explanation of how to use SageMaker's Model Monitoring functionality. [Monitor models for data and model quality, bias, and explainability](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor.html) is a much more extensive guide to monitoring in Amazon SageMaker.\n" + ] + }, + { + "cell_type": "markdown", + "id": "0ef0ad20", + "metadata": {}, + "source": [ + "Let's start by defining three variables we'll use throughout the session:\n" ] }, { "cell_type": "code", - "execution_count": 462, - "id": "2c7e2f9d-cc75-46bc-8700-f7123292fac5", - "metadata": { - "tags": [] - }, + "execution_count": 174, + "id": "2bb846d0", + "metadata": {}, "outputs": [], "source": [ - "#| hide\n", - "#| eval: false\n", - "\n", - "import random\n", - "\n", - "from datetime import datetime\n", - "from IPython.display import JSON\n", - "\n", - "from time import sleep\n", - "from threading import Thread, Event\n", - "from sagemaker.workflow.check_job_config import CheckJobConfig\n", - "from sagemaker.workflow.quality_check_step import DataQualityCheckConfig, QualityCheckStep, ModelQualityCheckConfig\n", - "from 
sagemaker.workflow.execution_variables import ExecutionVariables\n", - "from sagemaker.drift_check_baselines import DriftCheckBaselines\n", - "from sagemaker.workflow.parameters import ParameterBoolean\n", - "from sagemaker.model import Model\n", - "from sagemaker.model_monitor.dataset_format import DatasetFormat\n", - "from sagemaker.s3 import S3Uploader\n", - "from sagemaker.inputs import CreateModelInput, TransformInput\n", - "from sagemaker.transformer import Transformer\n", - "from sagemaker.workflow.steps import CreateModelStep, TransformStep\n", - "from sagemaker.model_monitor import (\n", - " CronExpressionGenerator, DefaultModelMonitor, MonitoringExecution,\n", - " ModelQualityMonitor, EndpointInput\n", - ")\n", - "\n", "GROUND_TRUTH_LOCATION = f\"{S3_LOCATION}/monitoring/groundtruth\"\n", "DATA_QUALITY_LOCATION = f\"{S3_LOCATION}/monitoring/data-quality\"\n", "MODEL_QUALITY_LOCATION = f\"{S3_LOCATION}/monitoring/model-quality\"" @@ -4840,245 +4808,171 @@ }, { "cell_type": "markdown", - "id": "02a1e7af-933e-492d-948e-aa16cc67c3db", - "metadata": { - "tags": [] - }, + "id": "24c26ac4-5d30-41e9-8952-e4deb39de819", + "metadata": {}, "source": [ - "#| hide\n", - "### Step 1 - Checking Captured Data\n", + "### Step 1 - Generating the Data Baseline\n", "\n", - "Let's check the S3 location where the endpoint stores the requests and responses that it receives.\n", + "Let's start by configuring a [Quality Check Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-quality-check) to compute the general statistics of the data we used to build our model.\n", "\n", - "Notice that it make take a few minutes for the first few files to show up in S3. Keep running the following line until you get some." 
+ "We can configure the instance that will run the quality check using the [CheckJobConfig](https://sagemaker.readthedocs.io/en/v2.73.0/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.check_job_config.CheckJobConfig) class, and we can use the `DataQualityCheckConfig` class to configure the job.\n" ] }, { "cell_type": "code", - "execution_count": 40, - "id": "3f35e8db-24d7-4d4b-9264-78ee5070cf27", + "execution_count": 184, + "id": "0b80bcab-d2c5-437c-a1c8-8eea208c0e29", "metadata": { "tags": [] }, "outputs": [ { - "ename": "NameError", - "evalue": "name 'DATA_CAPTURE_DESTINATION' is not defined", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", - "\u001b[1;32m/Users/svpino/dev/ml.school/program/index.ipynb Cell 131\u001b[0m line \u001b[0;36m3\n\u001b[1;32m 1\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39msagemaker\u001b[39;00m\u001b[39m.\u001b[39;00m\u001b[39ms3\u001b[39;00m \u001b[39mimport\u001b[39;00m S3Downloader\n\u001b[0;32m----> 3\u001b[0m files \u001b[39m=\u001b[39m S3Downloader\u001b[39m.\u001b[39mlist(DATA_CAPTURE_DESTINATION)[:\u001b[39m3\u001b[39m]\n\u001b[1;32m 4\u001b[0m files\n", - "\u001b[0;31mNameError\u001b[0m: name 'DATA_CAPTURE_DESTINATION' is not defined" + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: .\n", + "INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.\n" ] } ], "source": [ - "#| hide\n", - "#| eval: false\n", + "# | code: true\n", + "# | output: false\n", "\n", - "from sagemaker.s3 import S3Downloader\n", + "from sagemaker.workflow.quality_check_step import (\n", + " QualityCheckStep,\n", + " DataQualityCheckConfig,\n", + ")\n", + "from sagemaker.workflow.check_job_config import CheckJobConfig\n", + "from 
sagemaker.model_monitor.dataset_format import DatasetFormat\n", "\n", - "files = S3Downloader.list(DATA_CAPTURE_DESTINATION)[:3]\n", - "files" + "data_quality_baseline_step = QualityCheckStep(\n", + " name=\"generate-data-quality-baseline\",\n", + " check_job_config=CheckJobConfig(\n", + " instance_type=\"ml.c5.xlarge\",\n", + " instance_count=1,\n", + " volume_size_in_gb=20,\n", + " sagemaker_session=pipeline_session,\n", + " role=role,\n", + " ),\n", + " quality_check_config=DataQualityCheckConfig(\n", + " baseline_dataset=f\"{S3_LOCATION}/data\",\n", + " dataset_format=DatasetFormat.csv(header=True, output_columns_position=\"END\"),\n", + " output_s3_uri=DATA_QUALITY_LOCATION,\n", + " ),\n", + " skip_check=True,\n", + " register_new_baseline=True,\n", + " cache_config=cache_config,\n", + ")" ] }, { "cell_type": "markdown", - "id": "e3bc31d3-a277-446a-afd1-8bf7aab6173e", - "metadata": {}, + "id": "81430dfd-2524-43e4-bfe9-c6545316005d", + "metadata": { + "tags": [] + }, "source": [ - "#| hide\n", + "### Step 2 - Creating Test Predictions\n", "\n", - "These files contain the data captured by the endpoint in a SageMaker-specific JSON-line format. Each inference request is captured in a single line in the `jsonl` file. The line contains both the input and output merged together.\n", + "To create a baseline to compare the model performance, we must create predictions for the test set and compare the model's metrics with the model performance on production data. We can do this by running a [Batch Transform Job](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) to predict every sample from the test set. We can use a [Transform Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-transform) as part of the pipeline to run this job. 
This Batch Transform Job will run every sample from the test set through the model so we can compute the baseline metrics.\n", "\n", - "Let's read the first line from the first file:" + "The Transform Step requires a model to generate predictions, so we need a Model Step that creates a model:\n" ] }, { "cell_type": "code", - "execution_count": 464, - "id": "3dee0107-c9ca-4f75-873d-d47512c56797", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\n", - " \"captureData\": {\n", - " \"endpointInput\": {\n", - " \"observedContentType\": \"application/json\",\n", - " \"mode\": \"INPUT\",\n", - " \"data\": \"{\\\"island\\\": \\\"Dream\\\", \\\"culmen_length_mm\\\": 46.4, \\\"culmen_depth_mm\\\": 18.6, \\\"flipper_length_mm\\\": 190.0, \\\"body_mass_g\\\": 3450.0}\",\n", - " \"encoding\": \"JSON\"\n", - " },\n", - " \"endpointOutput\": {\n", - " \"observedContentType\": \"application/json\",\n", - " \"mode\": \"OUTPUT\",\n", - " \"data\": \"{\\\"prediction\\\": \\\"Adelie\\\", \\\"confidence\\\": 0.531686723}\",\n", - " \"encoding\": \"JSON\"\n", - " }\n", - " },\n", - " \"eventMetadata\": {\n", - " \"eventId\": \"0c745f20-c492-43b0-887f-968f2443d651\",\n", - " \"inferenceTime\": \"2023-07-31T13:02:51Z\"\n", - " },\n", - " \"eventVersion\": \"0\"\n", - "}\n" - ] - } - ], + "execution_count": 194, + "id": "8194b462", + "metadata": {}, + "outputs": [], "source": [ - "#| hide\n", - "#| eval: false\n", + "# | code: true\n", + "# | output: false\n", "\n", - "if len(files):\n", - " lines = S3Downloader.read_file(files[0])\n", - " print(json.dumps(json.loads(lines.split(\"\\n\")[0]), indent=2))" + "from sagemaker.workflow.model_step import ModelStep\n", "\n", + "create_model_step = ModelStep(\n", + " name=\"create\",\n", + " display_name=\"create-model\",\n", + " step_args=pipeline_model.create(instance_type=\"ml.m5.xlarge\"),\n", + ")" ] }, { "cell_type": "markdown", - "id": 
"24c26ac4-5d30-41e9-8952-e4deb39de819", + "id": "eddb6ac7", "metadata": {}, "source": [ - "#| hide\n", - "\n", - "### Step 2 - Generating a Data Drift Baseline\n", - "\n", - "Let's now configure the [Quality Check Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-quality-check) and feed it the train set we generated in the preprocessing step.\n", - "\n", - "We can configure the instance that will run the quality check using the [CheckJobConfig](https://sagemaker.readthedocs.io/en/v2.73.0/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.check_job_config.CheckJobConfig) class, and we can use the `DataQualityCheckConfig` class to configure the job." + "Let's configure the Batch Transform Job using an instance of the [Transformer](https://sagemaker.readthedocs.io/en/stable/api/inference/transformer.html) class:\n" ] }, { "cell_type": "code", - "execution_count": 465, - "id": "0b80bcab-d2c5-437c-a1c8-8eea208c0e29", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: .\n", - "INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.\n" - ] - } - ], + "execution_count": 195, + "id": "bf6aa4f0", + "metadata": {}, + "outputs": [], "source": [ - "#| hide\n", - "#| eval: false\n", - "\n", - "data_quality_baseline_step = QualityCheckStep(\n", - " name=\"generate-data-quality-baseline\",\n", - " \n", - " check_job_config = CheckJobConfig(\n", - " instance_type=\"ml.t3.xlarge\",\n", - " instance_count=1,\n", - " volume_size_in_gb=20,\n", - " sagemaker_session=pipeline_session,\n", - " role=role,\n", - " ),\n", - " \n", - " quality_check_config = DataQualityCheckConfig(\n", - " # We will use the train dataset we generated during the preprocessing \n", - " # step to generate the data quality baseline.\n", - " 
baseline_dataset=split_and_transform_data_step.properties.ProcessingOutputConfig.Outputs[\"train-baseline\"].S3Output.S3Uri,\n", + "from sagemaker.transformer import Transformer\n", "\n", - " dataset_format=DatasetFormat.json(lines=True),\n", - " output_s3_uri=DATA_QUALITY_LOCATION\n", - " ),\n", - " \n", - " skip_check=True,\n", - " register_new_baseline=True,\n", - " model_package_group_name=MODEL_PACKAGE_GROUP,\n", - " cache_config=cache_config\n", + "transformer = Transformer(\n", + " model_name=create_model_step.properties.ModelName,\n", + " instance_type=config[\"instance_type\"],\n", + " instance_count=1,\n", + " strategy=\"MultiRecord\",\n", + " accept=\"text/csv\",\n", + " assemble_with=\"Line\",\n", + " output_path=f\"{S3_LOCATION}/transform\",\n", + " sagemaker_session=sagemaker_session,\n", ")" ] }, { "cell_type": "markdown", - "id": "81430dfd-2524-43e4-bfe9-c6545316005d", - "metadata": { - "tags": [] - }, + "id": "a7f01fb9", + "metadata": {}, "source": [ - "#| hide\n", - "\n", - "### Step 3 - Creating Test Predictions\n", + "We can now set up the [Transform Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-transform) using the transformer we configured before.\n", "\n", - "To create a baseline to compare the model performance, we must create predictions for the test set and compare them with the predictions from the model. We can do this by running a Batch Transform Job to predict every sample from the test dataset. We can use a [Transform Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-transform) as part of the pipeline to run this job. 
You can check [Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) for more information about Batch Transform Jobs.\n", + "Notice the following:\n", "\n", - "The Transform Step requires a model to generate predictions, so we need a Model Step that creates a model.\n", - "\n", - "We also need to configure the [Batch Transform Job](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) using a [Transform Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-transform). This Batch Transform Job will run every sample from the training dataset through the model so we can compute the baseline metrics. We can use an instance of the [Transformer](https://sagemaker.readthedocs.io/en/stable/api/inference/transformer.html) class to configure the job." + "- We'll generate predictions for the baseline output that we generated when we split and transformed the data. This baseline is the same data we used to test the model, but we saved it in its original format before transforming it.\n", + "- The output of this Batch Transform Job will have two fields. 
The first one will be the ground truth label, and the second one will be the prediction of the model.\n" ] }, { "cell_type": "code", - "execution_count": 466, + "execution_count": 197, "id": "1987a788-de7a-4f60-ac8d-819d9ffcdf8e", "metadata": { "tags": [] }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.8/site-packages/sagemaker/workflow/pipeline_context.py:297: UserWarning:\n", - "\n", - "Running within a PipelineSession, there will be No Wait, No Logs, and No Job being started.\n", - "\n" - ] - } - ], + "outputs": [], "source": [ - "#| hide\n", - "#| eval: false\n", - "\n", - "create_model_step = ModelStep(\n", - " name=\"create\",\n", - " display_name=\"create-model\",\n", - " step_args=tensorflow_model.create(\n", - " instance_type=\"ml.m5.large\"\n", - " ),\n", - ")\n", - "\n", - "transformer = Transformer(\n", - " model_name=create_model_step.properties.ModelName,\n", - " base_transform_job_name=\"transform\",\n", + "# | code: true\n", + "# | output: false\n", "\n", - " instance_type=\"ml.c5.xlarge\",\n", - " instance_count=1,\n", - " \n", - " accept=\"application/json\",\n", - " strategy=\"SingleRecord\",\n", - " assemble_with=\"Line\",\n", - " \n", - " output_path=f\"{S3_LOCATION}/transform\",\n", - " sagemaker_session=pipeline_session\n", - ")\n", + "from sagemaker.workflow.steps import TransformStep\n", "\n", "generate_test_predictions_step = TransformStep(\n", " name=\"generate-test-predictions\",\n", " step_args=transformer.transform(\n", - " # We will use the test dataset we generated during the preprocessing \n", - " # step to run it through the model and generate predictions.\n", - " data=split_and_transform_data_step.properties.ProcessingOutputConfig.Outputs[\"test-baseline\"].S3Output.S3Uri,\n", - "\n", + " # We will use the baseline set we generated when we split the data.\n", + " # This set corresponds to the test split before the transformation step.\n", + " 
data=split_and_transform_data_step.properties.ProcessingOutputConfig.Outputs[\n", + " \"baseline\"\n", + " ].S3Output.S3Uri,\n", " join_source=\"Input\",\n", - " content_type=\"application/json\",\n", " split_type=\"Line\",\n", - " output_filter=\"$.SageMakerOutput['prediction','groundtruth']\",\n", + " content_type=\"text/csv\",\n", + " input_filter=\"$\",\n", + " # We want to output the first and the last field from the joined set.\n", + " # The first field corresponds to the ground truth, and the last field\n", + " # corresponds to the prediction.\n", + " output_filter=\"$[0,-1]\",\n", " ),\n", - " cache_config=cache_config\n", + " cache_config=cache_config,\n", ")" ] }, @@ -5087,16 +4981,14 @@ "id": "2fafc7c4-6fef-4832-8b99-8c45d078fdd2", "metadata": {}, "source": [ - "#| hide\n", - "\n", - "### Step 4 - Generating a Model Drift Baseline\n", + "### Step 3 - Generating a Model Drift Baseline\n", "\n", - "Let's now configure the [Quality Check Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-quality-check) and feed it the data we generated in the Transform Step." + "Let's now configure the [Quality Check Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-quality-check) and feed it the data we generated in the Transform Step. 
This step will automatically compute the performance metrics of the model on the test set:\n" ] }, { "cell_type": "code", - "execution_count": 467, + "execution_count": 199, "id": "9aa3a284-8763-4000-a263-70314b530652", "metadata": { "tags": [] @@ -5106,48 +4998,49 @@ "name": "stderr", "output_type": "stream", "text": [ - "INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: .\n", + "INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: .\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ "INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.\n" ] } ], "source": [ - "#| hide\n", - "#| eval: false\n", + "# | code: true\n", + "# | output: false\n", "\n", - "model_quality_location = f\"{S3_LOCATION}/monitoring/model-quality\"\n", + "from sagemaker.workflow.quality_check_step import ModelQualityCheckConfig\n", "\n", "model_quality_baseline_step = QualityCheckStep(\n", " name=\"generate-model-quality-baseline\",\n", - " \n", - " check_job_config = CheckJobConfig(\n", - " instance_type=\"ml.t3.xlarge\",\n", + " check_job_config=CheckJobConfig(\n", + " instance_type=\"ml.c5.xlarge\",\n", " instance_count=1,\n", " volume_size_in_gb=20,\n", " sagemaker_session=pipeline_session,\n", " role=role,\n", " ),\n", - " \n", - " quality_check_config = ModelQualityCheckConfig(\n", + " quality_check_config=ModelQualityCheckConfig(\n", " # We are going to use the output of the Transform Step to generate\n", " # the model quality baseline.\n", " baseline_dataset=generate_test_predictions_step.properties.TransformOutput.S3OutputPath,\n", - "\n", - " dataset_format=DatasetFormat.json(lines=True),\n", - "\n", + " dataset_format=DatasetFormat.csv(header=False),\n", " # We need to specify the problem type and the fields where the prediction\n", " # and groundtruth are so the process knows how to interpret the results.\n", " problem_type=\"MulticlassClassification\",\n", - " 
inference_attribute=\"prediction\",\n", - " ground_truth_attribute=\"groundtruth\",\n", - "\n", - " output_s3_uri=model_quality_location,\n", + " # Since the data doesn't have headers, SageMaker will autocreate headers for it.\n", + " # _c0 corresponds to the first column, and _c1 corresponds to the second column.\n", + " ground_truth_attribute=\"_c0\",\n", + " inference_attribute=\"_c1\",\n", + " output_s3_uri=MODEL_QUALITY_LOCATION,\n", " ),\n", - " \n", " skip_check=True,\n", " register_new_baseline=True,\n", - " model_package_group_name=MODEL_PACKAGE_GROUP,\n", - " cache_config=cache_config\n", + " cache_config=cache_config,\n", ")" ] }, @@ -5156,23 +5049,21 @@ "id": "693535ba-fca7-4e89-a4cb-b4f333fa2d03", "metadata": {}, "source": [ - "#| hide\n", - "### Step 5 - Setting up Model Metrics\n", + "### Step 4 - Setting up Model Metrics\n", "\n", - "We can configure a new set of [ModelMetrics](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_metrics.ModelMetrics) using the results of the Data and Model Quality Steps. Check [Baseline and model version lifecycle and evolution with SageMaker Pipelines](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-quality-clarify-baseline-lifecycle.html#pipelines-quality-clarify-baseline-evolution) for an explanation of how SageMaker uses the `DriftCheckBaselines`." + "We can configure a new set of [ModelMetrics](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_metrics.ModelMetrics) using the results of the Data and Model Quality Steps. 
Check [Baseline and model version lifecycle and evolution with SageMaker Pipelines](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-quality-clarify-baseline-lifecycle.html#pipelines-quality-clarify-baseline-evolution) for an explanation of how SageMaker uses the `DriftCheckBaselines`.\n" ] }, { "cell_type": "code", - "execution_count": 468, + "execution_count": 200, "id": "a773f134-ac2f-4dba-976e-9b7f0b384b6e", "metadata": { "tags": [] }, "outputs": [], "source": [ - "#| hide\n", - "#| eval: false\n", + "from sagemaker.drift_check_baselines import DriftCheckBaselines\n", "\n", "model_metrics = ModelMetrics(\n", " model_data_statistics=MetricsSource(\n", @@ -5187,7 +5078,6 @@ " s3_uri=model_quality_baseline_step.properties.CalculatedBaselineStatistics,\n", " content_type=\"application/json\",\n", " ),\n", - " \n", " model_constraints=MetricsSource(\n", " s3_uri=model_quality_baseline_step.properties.CalculatedBaselineConstraints,\n", " content_type=\"application/json\",\n", @@ -5210,7 +5100,7 @@ " model_constraints=MetricsSource(\n", " s3_uri=model_quality_baseline_step.properties.BaselineUsedForDriftCheckConstraints,\n", " content_type=\"application/json\",\n", - " )\n", + " ),\n", ")" ] }, @@ -5219,40 +5109,39 @@ "id": "ba3487a0-05ad-4f3a-8f50-9884dc2aef64", "metadata": {}, "source": [ - "#| hide\n", - "### Step 6 - Registering the Model\n", + "### Step 5 - Modifying the Registration Step\n", "\n", - "We need to redefine the Model Step to register the [TensorFlowModel](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/sagemaker.tensorflow.html#tensorflow-serving-model) so it takes into account the new metrics." 
+ "Since we want to register the model using the new metrics, we need to modify the Registration Step to use the new metrics:\n" ] }, { "cell_type": "code", - "execution_count": 469, + "execution_count": 201, "id": "7056a009-91c0-4955-90dd-b90ef8cab149", "metadata": { "tags": [] }, "outputs": [], "source": [ - "#| hide\n", - "#| eval: false\n", + "# | code: true\n", + "# | output: false\n", "\n", "register_model_step = ModelStep(\n", " name=\"register\",\n", " display_name=\"register-model\",\n", - " step_args=model.register(\n", - " model_package_group_name=MODEL_PACKAGE_GROUP,\n", + " step_args=pipeline_model.register(\n", + " model_package_group_name=PIPELINE_MODEL_PACKAGE_GROUP,\n", " model_metrics=model_metrics,\n", - " drift_check_baselines=drift_check_baselines,\n", - " approval_status=\"Approved\",\n", - " content_types=[\"application/json\"],\n", - " response_types=[\"application/json\"],\n", - " inference_instances=[\"ml.m5.large\"],\n", + " approval_status=\"PendingManualApproval\",\n", + " content_types=[\"text/csv\", \"application/json\"],\n", + " response_types=[\"text/csv\", \"application/json\"],\n", + " inference_instances=[\"ml.m5.xlarge\"],\n", + " transform_instances=[\"ml.m5.xlarge\"],\n", " domain=\"MACHINE_LEARNING\",\n", " task=\"CLASSIFICATION\",\n", " framework=\"TENSORFLOW\",\n", - " framework_version=\"2.6\",\n", - " )\n", + " framework_version=config[\"framework_version\"],\n", + " ),\n", ")" ] }, @@ -5261,36 +5150,32 @@ "id": "0d00b5e6-9858-4acc-bbfe-a2ce24ec20e0", "metadata": {}, "source": [ - "#| hide\n", - "\n", - "### Step 7 - Setting up the Condition Step\n", + "### Step 6 - Modifying the Condition Step\n", "\n", - "We only want to compute the model quality baseline if the model's performance is above the predefined threshold. The [Condition Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-condition) will gate all necessary steps to compute the baseline. 
" + "Since we modified the registration step and added a few more steps, we need to modify the Condition Step. Now, we want to generate the test predictions and compute the model quality baseline if the condition is successful:\n" ] }, { "cell_type": "code", - "execution_count": 470, + "execution_count": 203, "id": "bacaa9c6-22b0-48df-b138-95b6422fe834", "metadata": { "tags": [] }, "outputs": [], "source": [ - "#| hide\n", - "#| eval: false\n", - "\n", "condition_step = ConditionStep(\n", " name=\"check-model-accuracy\",\n", - " conditions=[condition_gte],\n", + " conditions=[condition],\n", " if_steps=[\n", - " create_model_step, \n", - " generate_test_predictions_step, \n", - " model_quality_baseline_step, \n", + " create_model_step,\n", + " generate_test_predictions_step,\n", + " model_quality_baseline_step,\n", " register_model_step,\n", - " deploy_step\n", - " ],\n", - " else_steps=[fail_step], \n", + " ]\n", + " if not LOCAL_MODE\n", + " else [],\n", + " else_steps=[fail_step],\n", ")" ] }, @@ -5299,16 +5184,14 @@ "id": "c95a7905-2550-4979-b885-f2daabb5d45e", "metadata": {}, "source": [ - "#| hide\n", - "\n", - "### Step 8 - Setting up the Pipeline\n", + "### Step 7 - Creating the Pipeline\n", "\n", - "We can now define the SageMaker Pipeline and submit its definition to the SageMaker Pipelines service to create the pipeline if it doesn't exist or update it if it does." 
+ "We can now define the SageMaker Pipeline and submit its definition to the SageMaker Pipelines service to create the pipeline if it doesn't exist or update it if it does.\n" ] }, { "cell_type": "code", - "execution_count": 471, + "execution_count": 204, "id": "4da5e453-acd8-47a0-a39f-264d05dd93d0", "metadata": { "tags": [] @@ -5333,15 +5216,15 @@ "name": "stderr", "output_type": "stream", "text": [ - "INFO:sagemaker.processing:Uploaded None to s3://mlschool/penguins-session6/code/20c36254c1a14f23578c8c08d55a36e4/sourcedir.tar.gz\n", - "INFO:sagemaker.processing:runproc.sh uploaded to s3://mlschool/penguins-session6/code/2c207c809cb0e0e9a1d77e5247f961f9/runproc.sh\n" + "INFO:sagemaker.processing:Uploaded None to s3://mlschool/session5-pipeline/code/09fea667a5ab7c37a068f22c00762d0b/sourcedir.tar.gz\n", + "INFO:sagemaker.processing:runproc.sh uploaded to s3://mlschool/session5-pipeline/code/2c207c809cb0e0e9a1d77e5247f961f9/runproc.sh\n", + "WARNING:sagemaker.workflow._utils:Popping out 'CertifyForMarketplace' from the pipeline definition since it will be overridden in pipeline execution time.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ - "Using provided s3_resource\n", "Using provided s3_resource\n" ] }, @@ -5349,93 +5232,142 @@ "name": "stderr", "output_type": "stream", "text": [ - "WARNING:sagemaker.workflow._utils:Popping out 'CertifyForMarketplace' from the pipeline definition since it will be overridden in pipeline execution time.\n" + "INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Using provided s3_resource\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:sagemaker.processing:Uploaded None to s3://mlschool/session5-pipeline/code/09fea667a5ab7c37a068f22c00762d0b/sourcedir.tar.gz\n", + "INFO:sagemaker.processing:runproc.sh uploaded to 
s3://mlschool/session5-pipeline/code/2c207c809cb0e0e9a1d77e5247f961f9/runproc.sh\n" ] }, { "data": { "text/plain": [ - "{'PipelineArn': 'arn:aws:sagemaker:us-east-1:325223348818:pipeline/penguins-session6',\n", - " 'ResponseMetadata': {'RequestId': '4c02b133-01d0-4c83-bf4f-0170a3ee5158',\n", + "{'PipelineArn': 'arn:aws:sagemaker:us-east-1:325223348818:pipeline/session5-pipeline',\n", + " 'ResponseMetadata': {'RequestId': '571c3344-c2f1-4c39-a6f7-7bb71a799eaa',\n", " 'HTTPStatusCode': 200,\n", - " 'HTTPHeaders': {'x-amzn-requestid': '4c02b133-01d0-4c83-bf4f-0170a3ee5158',\n", + " 'HTTPHeaders': {'x-amzn-requestid': '571c3344-c2f1-4c39-a6f7-7bb71a799eaa',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'content-length': '85',\n", - " 'date': 'Sat, 23 Sep 2023 19:08:06 GMT'},\n", + " 'date': 'Mon, 23 Oct 2023 17:19:13 GMT'},\n", " 'RetryAttempts': 0}}" ] }, - "execution_count": 471, + "execution_count": 204, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "#| hide\n", - "#| eval: false\n", + "# | code: true\n", + "# | output: false\n", "\n", - "session6_pipeline = Pipeline(\n", - " name=\"penguins-session6\",\n", - " parameters=[\n", - " dataset_location, \n", - " data_capture_percentage,\n", - " data_capture_destination,\n", - " accuracy_threshold,\n", - " ],\n", + "session5_pipeline = Pipeline(\n", + " name=\"session5-pipeline\",\n", + " parameters=[dataset_location, accuracy_threshold],\n", " steps=[\n", - " preprocess_data_step, \n", - " data_quality_baseline_step,\n", + " split_and_transform_data_step,\n", " tune_model_step if USE_TUNING_STEP else train_model_step,\n", " evaluate_model_step,\n", - " condition_step\n", + " data_quality_baseline_step,\n", + " condition_step,\n", " ],\n", " pipeline_definition_config=pipeline_definition_config,\n", - " sagemaker_session=pipeline_session\n", + " sagemaker_session=config[\"session\"],\n", ")\n", "\n", - "session6_pipeline.upsert(role_arn=role)" + "session5_pipeline.upsert(role_arn=role)" ] 
}, { "cell_type": "markdown", - "id": "b948aa92-8064-4f03-af08-0f6a8fc329cf", + "id": "9e6b1b39", + "metadata": {}, + "source": [ + "We can now start the pipeline:\n" + ] + }, + { + "cell_type": "markdown", + "id": "9d6e5995", "metadata": {}, "source": [ "#| hide\n", "\n", - "### Step 9 - Generating Traffic and Labels\n", + "
Note: \n", + " The %%script cell magic is a convenient way to prevent the notebook from executing a specific cell. If you want to run the cell, comment out the line containing the %%script cell magic.\n", + "
\n" + ] + }, + { + "cell_type": "code", + "execution_count": 190, + "id": "10ba9909", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "_PipelineExecution(arn='arn:aws:sagemaker:us-east-1:325223348818:pipeline/session5-pipeline/execution/yh29jzycbsp2', sagemaker_session=)" + ] + }, + "execution_count": 190, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "%%script false --no-raise-error\n", "\n", - "To test the monitoring functionality, we need to generate some traffic to the endpoint and label the samples captured by the endpoint. \n", + "#| eval: false\n", + "#| code: true\n", + "#| output: false\n", "\n", - "To generate traffic, we will repeatedly send every sample from the dataset to the endpoint to simulate real prediction requests. We can simulate the labeling process by generating a random label for every sample. Check [Ingest Ground Truth Labels and Merge Them With Predictions](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-merge.html) for more information about this." + "session5_pipeline.start()" ] }, { "cell_type": "markdown", - "id": "0a488020-e1d3-48a8-8cb7-a1d3c8d26a07", + "id": "b948aa92-8064-4f03-af08-0f6a8fc329cf", "metadata": {}, "source": [ "#| hide\n", "\n", - "The following function will generate the traffic to the endpoint." + "### Step 8 - Generating Fake Traffic\n", + "\n", + "To test the monitoring functionality, we need to generate traffic to the endpoint. 
To generate traffic, we will repeatedly send every sample from the dataset to the endpoint to simulate real prediction requests:\n" ] }, { "cell_type": "code", - "execution_count": 472, + "execution_count": 205, "id": "87a3c4ce-aff7-4f48-9d1b-be98eb746e66", "metadata": {}, "outputs": [], "source": [ - "#| hide\n", - "#| eval: false\n", + "from time import sleep\n", + "from threading import Thread, Event\n", + "\n", "\n", "def generate_traffic(predictor):\n", - " \n", " def _predict(data, predictor, stop_traffic_thread):\n", " for index, row in data.iterrows():\n", - " predictor.predict(row.to_dict(), inference_id=str(index))\n", - " \n", + " data = row.tolist()\n", + " data = \",\".join(map(str, data))\n", + " predictor.predict(\n", + " data, inference_id=str(index), initial_args={\"ContentType\": \"text/csv\"}\n", + " )\n", + "\n", " sleep(1)\n", "\n", " if stop_traffic_thread.is_set():\n", @@ -5445,50 +5377,188 @@ " while True:\n", " print(f\"Generating {data.shape[0]} predictions...\")\n", " _predict(data, predictor, stop_traffic_thread)\n", - " \n", + "\n", " if stop_traffic_thread.is_set():\n", " break\n", "\n", - " \n", " stop_traffic_thread = Event()\n", - " \n", - " data = pd.read_csv(DATA_FILEPATH).dropna()\n", - " data.drop([\"sex\"], axis=1, inplace=True)\n", - " \n", + "\n", + " data = pd.read_csv(DATA_FILEPATH, header=0).dropna()\n", + " data.drop([\"species\"], axis=1, inplace=True)\n", + "\n", " traffic_thread = Thread(\n", " target=_generate_prediction_data,\n", - " args=(data, predictor, stop_traffic_thread,)\n", + " args=(\n", + " data,\n", + " predictor,\n", + " stop_traffic_thread,\n", + " ),\n", " )\n", - " \n", + "\n", " traffic_thread.start()\n", - " \n", + "\n", " return stop_traffic_thread, traffic_thread" ] }, { "cell_type": "markdown", - "id": "5754a314-3bc0-4b41-8767-e9f06d96d250", + "id": "10652fb1", "metadata": {}, "source": [ - "#| hide\n", + "Let's now start generating traffic to the endpoint:\n" + ] + }, + { + "cell_type": 
"markdown", + "id": "f2fd7307", + "metadata": {}, + "source": [ + "#| hide\n", "\n", - "The following function will generate random labels." + "
Note: \n", + " The %%script cell magic is a convenient way to prevent the notebook from executing a specific cell. If you want to run the cell, comment out the line containing the %%script cell magic.\n", + "
\n" ] }, { "cell_type": "code", - "execution_count": 473, + "execution_count": 207, + "id": "3d68cbd8", + "metadata": {}, + "outputs": [], + "source": [ + "%%script false --no-raise-error\n", + "#| eval: false\n", + "#| code: true\n", + "#| output: false\n", + "\n", + "predictor = Predictor(\n", + " endpoint_name=ENDPOINT, \n", + " serializer=CSVSerializer(),\n", + " sagemaker_session=sagemaker_session\n", + ")\n", + "\n", + "stop_traffic_thread, traffic_thread = generate_traffic(predictor)" + ] + }, + { + "cell_type": "markdown", + "id": "5754a314-3bc0-4b41-8767-e9f06d96d250", + "metadata": {}, + "source": [ + "### Step 9 - Generating Fake Labels\n", + "\n", + "To test the performance of the model, we need to label the samples captured by the endpoint.\n" + ] + }, + { + "cell_type": "markdown", + "id": "02a1e7af-933e-492d-948e-aa16cc67c3db", + "metadata": { + "tags": [] + }, + "source": [ + "Let's start by checking the location where the endpoint stores the captured data. It may take a few minutes for the first few files to show up in S3:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 210, + "id": "3f35e8db-24d7-4d4b-9264-78ee5070cf27", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "['s3://mlschool/penguins/monitoring/data-capture/penguins-endpoint/AllTraffic/2023/09/25/13/15-40-735-9cc3750d-ba42-472c-903d-969695d2096d.jsonl',\n", + " 's3://mlschool/penguins/monitoring/data-capture/penguins-endpoint/AllTraffic/2023/09/27/15/45-31-289-001fc69f-c352-4da2-b57a-a3a69fe3fecf.jsonl',\n", + " 's3://mlschool/penguins/monitoring/data-capture/penguins-endpoint/AllTraffic/2023/10/05/16/50-04-992-e16242d1-925c-4b07-9289-dffa0e026679.jsonl']" + ] + }, + "execution_count": 210, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sagemaker.s3 import S3Downloader\n", + "\n", + "files = S3Downloader.list(DATA_CAPTURE_DESTINATION)[:3]\n", + "files" + ] + }, + { + "cell_type": "markdown", + 
"id": 
"74c28f66", + "metadata": {}, + "source": [ + "These files contain the data captured by the endpoint in a SageMaker-specific JSON-line format. Each inference request is captured in a single line in the `jsonl` file. The line contains both the input and output merged together:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 211, + "id": "6305949f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"captureData\": {\n", + " \"endpointInput\": {\n", + " \"observedContentType\": \"text/csv\",\n", + " \"mode\": \"INPUT\",\n", + " \"data\": \"Torgersen,39.1,18.7,181.0,3750.0,MALE\\nTorgersen,39.5,17.4,186.0,3800.0,FEMALE\\nTorgersen,40.3,18.0,195.0,3250.0,FEMALE\\n\",\n", + " \"encoding\": \"CSV\"\n", + " },\n", + " \"endpointOutput\": {\n", + " \"observedContentType\": \"application/json\",\n", + " \"mode\": \"OUTPUT\",\n", + " \"data\": \"[{\\\"prediction\\\": \\\"Adelie\\\", \\\"confidence\\\": 0.775418103}, {\\\"prediction\\\": \\\"Adelie\\\", \\\"confidence\\\": 0.775709867}, {\\\"prediction\\\": \\\"Adelie\\\", \\\"confidence\\\": 0.67967391}]\",\n", + " \"encoding\": \"JSON\"\n", + " }\n", + " },\n", + " \"eventMetadata\": {\n", + " \"eventId\": \"d33f9a23-5ae3-4403-9aa1-3759d7fa8015\",\n", + " \"inferenceTime\": \"2023-09-25T13:15:40Z\"\n", + " },\n", + " \"eventVersion\": \"0\"\n", + "}\n" + ] + } + ], + "source": [ + "if len(files):\n", + " lines = S3Downloader.read_file(files[0])\n", + " print(json.dumps(json.loads(lines.split(\"\\n\")[0]), indent=2))" + ] + }, + { + "cell_type": "markdown", + "id": "c1d4904f", + "metadata": {}, + "source": [ + "Let's now define the function that will generate random labels. We can simulate the labeling process by generating a random label for every sample. 
Check [Ingest Ground Truth Labels and Merge Them With Predictions](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-merge.html) for more information about this.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 212, "id": "cb649a6e-fabe-4103-b1db-7c6a01fe959a", "metadata": { "tags": [] }, "outputs": [], "source": [ - "#| hide\n", - "#| eval: false\n", + "import random\n", + "from datetime import datetime\n", "\n", - "def generate_ground_truth_data(predictor, ground_truth_location):\n", - " \n", + "\n", + "def generate_ground_truth_data(ground_truth_location):\n", " def _generate_ground_truth_record(inference_id):\n", " random.seed(inference_id)\n", "\n", @@ -5503,7 +5573,6 @@ " \"eventVersion\": \"0\",\n", " }\n", "\n", - "\n", " def _upload_ground_truth(records, upload_time):\n", " records = [json.dumps(r) for r in records]\n", " data = \"\\n\".join(records)\n", @@ -5511,9 +5580,8 @@ "\n", " print(f\"Uploading ground truth data to {uri}...\")\n", "\n", - " S3Uploader.upload_string_as_file_body(data, uri) \n", + " S3Uploader.upload_string_as_file_body(data, uri)\n", "\n", - " \n", " def _generate_ground_truth_data(max_records, stop_ground_truth_thread):\n", " while True:\n", " records = [_generate_ground_truth_record(i) for i in range(max_records)]\n", @@ -5524,60 +5592,55 @@ "\n", " sleep(30)\n", "\n", - " \n", " stop_ground_truth_thread = Event()\n", " data = pd.read_csv(DATA_FILEPATH).dropna()\n", - " \n", + "\n", " groundtruth_thread = Thread(\n", " target=_generate_ground_truth_data,\n", - " args=(len(data), stop_ground_truth_thread,)\n", + " args=(\n", + " len(data),\n", + " stop_ground_truth_thread,\n", + " ),\n", " )\n", - " \n", + "\n", " groundtruth_thread.start()\n", - " \n", + "\n", " return stop_ground_truth_thread, groundtruth_thread" ] }, { "cell_type": "markdown", - "id": "e4dd7d22-aecb-4c2d-b6d2-a282ea8a17e8", + "id": "bca6ebbc", + "metadata": {}, + "source": [ + "Let's now start generating fake labels:\n" + ] + 
}, + { + "cell_type": "markdown", + "id": "5278b3e1", "metadata": {}, "source": [ "#| hide\n", "\n", - "Let's wait for the endpoint to be in service, and then we can start generating traffic and labels.\n", - "\n", - "
Uncomment the %%script cell magic line to execute this cell.
" + "
Note: \n", + " The %%script cell magic is a convenient way to prevent the notebook from executing a specific cell. If you want to run the cell, comment out the line containing the %%script cell magic.\n", + "
\n" ] }, { "cell_type": "code", - "execution_count": 474, - "id": "0de56c8d-ebde-409a-a7c3-4103790117d2", - "metadata": { - "tags": [] - }, + "execution_count": 214, + "id": "1f516993", + "metadata": {}, "outputs": [], "source": [ - "#| hide\n", - "\n", "%%script false --no-raise-error\n", - "\n", - "waiter = sagemaker_client.get_waiter(\"endpoint_in_service\")\n", - "waiter.wait(\n", - " EndpointName=ENDPOINT,\n", - " WaiterConfig={\n", - " \"Delay\": 10,\n", - " \"MaxAttempts\": 30\n", - " }\n", - ")\n", - "\n", - "predictor = Predictor(endpoint_name=ENDPOINT, serializer=JSONSerializer(), deserializer=JSONDeserializer())\n", - "\n", - "stop_traffic_thread, traffic_thread = generate_traffic(predictor)\n", + "#| eval: false\n", + "#| code: true\n", + "#| output: false\n", "\n", "stop_ground_truth_thread, groundtruth_thread = generate_ground_truth_data(\n", - " predictor, \n", " GROUND_TRUTH_LOCATION\n", ")" ] @@ -5591,7 +5654,7 @@ "\n", "Let's make a prediction for a penguin and include extra fields in the request. This should be flagged by the monitoring job.\n", "\n", - "
Uncomment the %%script cell magic line to execute this cell.
" + "
Comment out the %%script cell magic line to execute this cell.
\n" ] }, { @@ -5624,99 +5687,102 @@ }, { "cell_type": "markdown", - "id": "04fda9e8-08dc-4323-9e97-eb17194078a1", - "metadata": {}, - "source": [ - "#| hide\n", - "\n", - "### Step 10 - Setting Up Monitoring Jobs\n", - "\n", - "We can now schedule the Monitoring Jobs to continuously monitor the data going into the endpoint and the model performance. We will use the baseline we generated in the pipeline to determine when there's drift. Check [Schedule Data Quality Monitoring Jobs](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-schedule-data-monitor.html) and [Schedule Model Quality Monitoring Jobs](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-schedule.html) for more information." - ] - }, - { - "cell_type": "markdown", - "id": "f98d39f9-446d-4cfe-ad94-b50ab31caf68", + "id": "a65bd669", "metadata": {}, "source": [ - "#| hide\n", + "### Step 10 - Preparing Monitoring Functions\n", "\n", - "The following functions will help us work with monitoring schedules later on." 
+ "Let's create a few functions that will help us work with monitoring schedules later on:" ] }, { "cell_type": "code", - "execution_count": 476, + "execution_count": 216, "id": "da145ba1-4966-4dab-8a73-281db364cbc7", "metadata": { "tags": [] }, "outputs": [], "source": [ - "#| hide\n", - "#| eval: false\n", + "from sagemaker.model_monitor import MonitoringExecution\n", + "\n", "\n", "def describe_monitoring_schedules(endpoint_name):\n", " schedules = []\n", - " response = sagemaker_client.list_monitoring_schedules(EndpointName=endpoint_name)[\"MonitoringScheduleSummaries\"]\n", + " response = sagemaker_client.list_monitoring_schedules(EndpointName=endpoint_name)[\n", + " \"MonitoringScheduleSummaries\"\n", + " ]\n", " for item in response:\n", " name = item[\"MonitoringScheduleName\"]\n", " schedule = {\n", " \"MonitoringScheduleName\": name,\n", - " \"MonitoringType\": item[\"MonitoringType\"]\n", + " \"MonitoringType\": item[\"MonitoringType\"],\n", " }\n", - " \n", + "\n", " description = sagemaker_client.describe_monitoring_schedule(\n", " MonitoringScheduleName=name\n", " )\n", - " \n", - " schedule[\"Status\"] = description[\"LastMonitoringExecutionSummary\"][\"MonitoringExecutionStatus\"]\n", - " \n", + "\n", + " schedule[\"Status\"] = description[\"LastMonitoringExecutionSummary\"][\n", + " \"MonitoringExecutionStatus\"\n", + " ]\n", + "\n", " if schedule[\"Status\"] == \"Failed\":\n", - " schedule[\"FailureReason\"] = description[\"LastMonitoringExecutionSummary\"][\"FailureReason\"]\n", + " schedule[\"FailureReason\"] = description[\"LastMonitoringExecutionSummary\"][\n", + " \"FailureReason\"\n", + " ]\n", " elif schedule[\"Status\"] == \"CompletedWithViolations\":\n", - " processing_job_arn = description[\"LastMonitoringExecutionSummary\"][\"ProcessingJobArn\"]\n", + " processing_job_arn = description[\"LastMonitoringExecutionSummary\"][\n", + " \"ProcessingJobArn\"\n", + " ]\n", " execution = MonitoringExecution.from_processing_arn(\n", - " 
sagemaker_session=sagemaker_session, \n", - " processing_job_arn=processing_job_arn\n", + " sagemaker_session=sagemaker_session,\n", + " processing_job_arn=processing_job_arn,\n", " )\n", " execution_destination = execution.output.destination\n", "\n", - " violations_filepath = os.path.join(execution_destination, \"constraint_violations.json\")\n", - " violations = json.loads(S3Downloader.read_file(violations_filepath))[\"violations\"]\n", - " \n", + " violations_filepath = os.path.join(\n", + " execution_destination, \"constraint_violations.json\"\n", + " )\n", + " violations = json.loads(S3Downloader.read_file(violations_filepath))[\n", + " \"violations\"\n", + " ]\n", + "\n", " schedule[\"Violations\"] = violations\n", "\n", " schedules.append(schedule)\n", - " \n", + "\n", " return schedules\n", "\n", + "\n", "def describe_monitoring_schedule(endpoint_name, monitoring_type):\n", " found = False\n", - " \n", + "\n", " schedules = describe_monitoring_schedules(endpoint_name)\n", " for schedule in schedules:\n", " if schedule[\"MonitoringType\"] == monitoring_type:\n", " found = True\n", " print(json.dumps(schedule, indent=2))\n", "\n", - " if not found: \n", + " if not found:\n", " print(f\"There's no {monitoring_type} Monitoring Schedule.\")\n", "\n", "\n", "def describe_data_monitoring_schedule(endpoint_name):\n", " describe_monitoring_schedule(endpoint_name, \"DataQuality\")\n", "\n", - " \n", + "\n", "def describe_model_monitoring_schedule(endpoint_name):\n", " describe_monitoring_schedule(endpoint_name, \"ModelQuality\")\n", "\n", - " \n", + "\n", "def delete_monitoring_schedule(endpoint_name, monitoring_type):\n", " attempts = 30\n", " found = False\n", - " \n", - " response = sagemaker_client.list_monitoring_schedules(EndpointName=endpoint_name)[\"MonitoringScheduleSummaries\"]\n", + "\n", + " response = sagemaker_client.list_monitoring_schedules(EndpointName=endpoint_name)[\n", + " \"MonitoringScheduleSummaries\"\n", + " ]\n", " for item in response:\n", 
" if item[\"MonitoringType\"] == monitoring_type:\n", " found = True\n", @@ -5725,9 +5791,11 @@ " )[\"MonitoringScheduleStatus\"]\n", " while status in (\"Pending\", \"InProgress\") and attempts > 0:\n", " attempts -= 1\n", - " print(f\"Monitoring schedule status: {status}. Waiting for it to finish.\")\n", + " print(\n", + " f\"Monitoring schedule status: {status}. Waiting for it to finish.\"\n", + " )\n", " sleep(30)\n", - " \n", + "\n", " status = sagemaker_client.describe_monitoring_schedule(\n", " MonitoringScheduleName=item[\"MonitoringScheduleName\"]\n", " )[\"MonitoringScheduleStatus\"]\n", @@ -5739,1726 +5807,262 @@ " print(\"Monitoring schedule deleted.\")\n", " else:\n", " print(\"Waiting for monitoring schedule timed out\")\n", - " \n", - " if not found: \n", + "\n", + " if not found:\n", " print(f\"There's no {monitoring_type} Monitoring Schedule.\")\n", "\n", - " \n", + "\n", "def delete_data_monitoring_schedule(endpoint_name):\n", " delete_monitoring_schedule(endpoint_name, \"DataQuality\")\n", "\n", - " \n", + "\n", "def delete_model_monitoring_schedule(endpoint_name):\n", " delete_monitoring_schedule(endpoint_name, \"ModelQuality\")" ] }, { "cell_type": "markdown", - "id": "bf379bae-e086-4d56-a677-bfac92e121dd", + "id": "d936df76-e0b8-4dad-a04f-ef77ce2a2df1", "metadata": {}, "source": [ - "#| hide\n", + "### Step 11 - Setting Up Data Monitoring Job\n", + "\n", + "SageMaker looks for violations in the data captured by the endpoint. By default, it combines the input data with the endpoint output and compares the result with the baseline we generated. If we let SageMaker do this, we will get a few violations, for example an \"extra column check\" violation because the field `confidence` doesn't exist in the baseline data.\n", "\n", - "Our pipeline generated data baseline statistics and constraints using our train set. We can take a look at what these values look like by downloading them from S3." 
+ "We can fix these violations by creating a preprocessing script configuring the data we want the monitoring job to use. Check [Preprocessing and Postprocessing](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-and-post-processing.html) for more information about how to configure these scripts.\n", + "\n", + "Let's define the name of the preprocessing script:\n" ] }, { "cell_type": "code", - "execution_count": 477, - "id": "d882526e-e52d-4f1f-84d0-b3fb2edf2b9e", + "execution_count": 237, + "id": "cc119422-2e85-4e8c-86cd-6d59e353d09d", "metadata": { "tags": [] }, - "outputs": [ - { - "data": { - "application/json": { - "dataset": { - "item_count": 239 - }, - "features": [ - { - "inferred_type": "Fractional", - "name": "body_mass_g", - "numerical_statistics": { - "common": { - "num_missing": 0, - "num_present": 239 - }, - "distribution": { - "kll": { - "buckets": [ - { - "count": 12, - "lower_bound": 2700, - "upper_bound": 3060 - }, - { - "count": 31, - "lower_bound": 3060, - "upper_bound": 3420 - }, - { - "count": 46, - "lower_bound": 3420, - "upper_bound": 3780 - }, - { - "count": 37, - "lower_bound": 3780, - "upper_bound": 4140 - }, - { - "count": 26, - "lower_bound": 4140, - "upper_bound": 4500 - }, - { - "count": 31, - "lower_bound": 4500, - "upper_bound": 4860 - }, - { - "count": 22, - "lower_bound": 4860, - "upper_bound": 5220 - }, - { - "count": 18, - "lower_bound": 5220, - "upper_bound": 5580 - }, - { - "count": 13, - "lower_bound": 5580, - "upper_bound": 5940 - }, - { - "count": 3, - "lower_bound": 5940, - "upper_bound": 6300 - } - ], - "sketch": { - "data": [ - [ - 5650, - 4250, - 4000, - 4000, - 5850, - 4400, - 5200, - 5400, - 4600, - 3625, - 5500, - 4450, - 5750, - 4850, - 3500, - 4875, - 3900, - 5300, - 3900, - 4500, - 4750, - 3350, - 4400, - 3150, - 3500, - 5650, - 3050, - 4300, - 3650, - 4850, - 4050, - 5350, - 5800, - 3200, - 5550, - 3850, - 3825, - 3150, - 3600, - 3475, - 4050, - 3275, - 4400, - 4650, - 4300, - 4450, - 3200, - 
3550, - 4900, - 5550, - 3350, - 5000, - 3200, - 3500, - 5000, - 3400, - 3300, - 3775, - 5400, - 3400, - 3800, - 4975, - 3550, - 4400, - 4850, - 4800, - 3400, - 4200, - 2900, - 5850, - 3950, - 3325, - 3150, - 3650, - 4725, - 6300, - 5700, - 3600, - 3650, - 5050, - 3400, - 3550, - 5050, - 4750, - 4625, - 4500, - 4850, - 2900, - 4275, - 3325, - 3500, - 4750, - 3175, - 3550, - 5600, - 3900, - 5800, - 5350, - 4725, - 3600, - 3450, - 5500, - 4350, - 3800, - 5250, - 3750, - 5400, - 5550, - 5000, - 3800, - 3975, - 3400, - 4150, - 3100, - 4650, - 6000, - 5550, - 3450, - 3900, - 3500, - 3450, - 3950, - 5100, - 5100, - 4075, - 5150, - 3950, - 4000, - 3800, - 5300, - 3775, - 3425, - 4700, - 4550, - 2900, - 4000, - 2975, - 5000, - 3050, - 3500, - 4650, - 4100, - 5700, - 4300, - 3900, - 4700, - 3750, - 4200, - 4500, - 4950, - 3875, - 4725, - 2900, - 2850, - 3000, - 4250, - 4050, - 5000, - 3075, - 4300, - 4925, - 5700, - 5050, - 3450, - 3900, - 3700, - 3725, - 4300, - 5500, - 5000, - 3000, - 5700, - 3800, - 3950, - 4050, - 3425, - 3300, - 4150, - 4700, - 4400, - 4150, - 4650, - 3325, - 3950, - 3475, - 4750, - 4900, - 3600, - 3200, - 2850, - 3900, - 3700, - 4200, - 4300, - 3800, - 4200, - 3200, - 5200, - 3725, - 3950, - 4600, - 5300, - 3550, - 3325, - 4400, - 5950, - 3475, - 3300, - 3650, - 4700, - 4100, - 3800, - 3250, - 3600, - 3725, - 3450, - 4575, - 3400, - 3400, - 3325, - 3250, - 5200, - 5450, - 3550, - 3600, - 4700, - 3775, - 3700, - 4100, - 2700, - 5600, - 5400, - 4050, - 3700, - 4800, - 3450, - 4400, - 4875, - 4675 - ] - ], - "parameters": { - "c": 0.64, - "k": 2048 - } - } - } - }, - "max": 6300, - "mean": 4206.276150627615, - "min": 2700, - "std_dev": 814.1288584848338, - "sum": 1005300 - } - }, - { - "inferred_type": "Fractional", - "name": "culmen_depth_mm", - "numerical_statistics": { - "common": { - "num_missing": 0, - "num_present": 239 - }, - "distribution": { - "kll": { - "buckets": [ - { - "count": 15, - "lower_bound": 13.1, - "upper_bound": 13.94 - }, - { - 
"count": 23, - "lower_bound": 13.94, - "upper_bound": 14.78 - }, - { - "count": 25, - "lower_bound": 14.78, - "upper_bound": 15.620000000000001 - }, - { - "count": 23, - "lower_bound": 15.620000000000001, - "upper_bound": 16.46 - }, - { - "count": 33, - "lower_bound": 16.46, - "upper_bound": 17.3 - }, - { - "count": 39, - "lower_bound": 17.3, - "upper_bound": 18.14 - }, - { - "count": 38, - "lower_bound": 18.14, - "upper_bound": 18.98 - }, - { - "count": 27, - "lower_bound": 18.98, - "upper_bound": 19.82 - }, - { - "count": 8, - "lower_bound": 19.82, - "upper_bound": 20.66 - }, - { - "count": 8, - "lower_bound": 20.66, - "upper_bound": 21.5 - } - ], - "sketch": { - "data": [ - [ - 15, - 20, - 19.8, - 19, - 16, - 18.2, - 14.8, - 16.1, - 14.3, - 17.8, - 15, - 18.5, - 15.7, - 14.7, - 19.2, - 15.7, - 18.4, - 14.2, - 20.7, - 20.7, - 13.8, - 16.7, - 13.9, - 18, - 17.9, - 15.7, - 16.6, - 13.9, - 17.5, - 14.2, - 19.6, - 15.9, - 16.2, - 17.7, - 15.3, - 18.3, - 16.5, - 17.2, - 18.6, - 18.1, - 18.1, - 17.6, - 14.4, - 14.4, - 13.7, - 19.4, - 17.2, - 18.5, - 13.9, - 17, - 17.3, - 15.6, - 17.9, - 17.7, - 15.2, - 17.1, - 19.3, - 20.3, - 14.5, - 16.5, - 19, - 15.5, - 19.1, - 21.1, - 15, - 14.5, - 18.1, - 19.4, - 18.1, - 14.6, - 18.1, - 18.6, - 16.9, - 19.2, - 14.6, - 15.2, - 16.8, - 19, - 17, - 15.8, - 18.3, - 18.9, - 14.5, - 13.8, - 14.4, - 19.9, - 14.3, - 17, - 19.5, - 18.4, - 17, - 14.5, - 17, - 18, - 17.3, - 18.1, - 16, - 15.7, - 13.8, - 17.3, - 18.6, - 15.8, - 18.5, - 19.4, - 15.3, - 18.2, - 17.1, - 15.9, - 15, - 18.8, - 20.1, - 17.1, - 21.1, - 18.5, - 19.1, - 16.3, - 15.3, - 17.8, - 18.4, - 19.8, - 17.9, - 19.5, - 15.1, - 13.3, - 18.3, - 15.4, - 18, - 18, - 17.4, - 14.9, - 18.7, - 17.6, - 14.2, - 13.5, - 16.1, - 18.5, - 18.9, - 14.5, - 15.5, - 18.6, - 15.2, - 19, - 16.3, - 18.3, - 17.5, - 17.6, - 17.2, - 13.9, - 13.2, - 15, - 18.5, - 20, - 16.6, - 17.1, - 16.9, - 20.2, - 19.5, - 15, - 16, - 19.6, - 13.7, - 14.3, - 15.4, - 17.9, - 16.8, - 17.8, - 19.8, - 18.1, - 15.1, - 13.1, 
- 16.8, - 16, - 19.5, - 18.9, - 19.7, - 19, - 17.1, - 18.9, - 14.2, - 19.7, - 18.9, - 13.7, - 17, - 19.2, - 17.2, - 15, - 14.4, - 17.3, - 16.6, - 16.6, - 18.9, - 17.3, - 14.6, - 20.8, - 17.2, - 21.5, - 17.6, - 15.3, - 17, - 20, - 18.8, - 15.8, - 18.6, - 17.9, - 21.1, - 15.9, - 18.4, - 17.3, - 18.7, - 14.2, - 18.5, - 18.8, - 18.2, - 18.8, - 17.9, - 18.8, - 14, - 17.5, - 17.9, - 16.4, - 18, - 15.7, - 15.6, - 16.1, - 17, - 15, - 18.2, - 17.1, - 18.8, - 16.6, - 17, - 16.3, - 19.9, - 17.5, - 20.7, - 19, - 13.4, - 14.6, - 19.6 - ] - ], - "parameters": { - "c": 0.64, - "k": 2048 - } - } - } - }, - "max": 21.5, - "mean": 17.11464435146443, - "min": 13.1, - "std_dev": 1.9505591353403573, - "sum": 4090.3999999999987 - } - }, - { - "inferred_type": "Fractional", - "name": "culmen_length_mm", - "numerical_statistics": { - "common": { - "num_missing": 0, - "num_present": 239 - }, - "distribution": { - "kll": { - "buckets": [ - { - "count": 6, - "lower_bound": 32.1, - "upper_bound": 34.480000000000004 - }, - { - "count": 24, - "lower_bound": 34.480000000000004, - "upper_bound": 36.86 - }, - { - "count": 31, - "lower_bound": 36.86, - "upper_bound": 39.24 - }, - { - "count": 33, - "lower_bound": 39.24, - "upper_bound": 41.620000000000005 - }, - { - "count": 23, - "lower_bound": 41.620000000000005, - "upper_bound": 44 - }, - { - "count": 34, - "lower_bound": 44, - "upper_bound": 46.379999999999995 - }, - { - "count": 33, - "lower_bound": 46.379999999999995, - "upper_bound": 48.76 - }, - { - "count": 36, - "lower_bound": 48.76, - "upper_bound": 51.14 - }, - { - "count": 14, - "lower_bound": 51.14, - "upper_bound": 53.519999999999996 - }, - { - "count": 5, - "lower_bound": 53.519999999999996, - "upper_bound": 55.9 - } - ], - "sketch": { - "data": [ - [ - 47.8, - 37.8, - 55.8, - 40.6, - 55.1, - 49.2, - 45.2, - 49.9, - 48.2, - 38.9, - 49.1, - 50.8, - 50.4, - 44.5, - 43.1, - 44.5, - 40.8, - 51.3, - 39.6, - 42.5, - 44.9, - 42.5, - 45.7, - 36.5, - 46.5, - 54.3, - 35.9, - 43.8, - 46.2, - 
46.6, - 50.5, - 50, - 49.5, - 39.7, - 50, - 47.6, - 38.1, - 36.2, - 49.7, - 34.1, - 52, - 38.8, - 45.1, - 46.2, - 45.3, - 41.8, - 34.6, - 42.2, - 43.6, - 52.1, - 42.5, - 46.4, - 40.5, - 39.6, - 50.5, - 34, - 37.6, - 51.7, - 47.6, - 37, - 49.5, - 47.2, - 50.9, - 34.6, - 48.5, - 46.2, - 43.5, - 35.1, - 34.5, - 48.4, - 38.2, - 41.1, - 35.7, - 51.3, - 47.4, - 49.2, - 49.8, - 33.5, - 45.7, - 46.3, - 37.8, - 39.7, - 45.1, - 45.2, - 48.4, - 53.5, - 46.8, - 38.6, - 42.2, - 34.4, - 45.5, - 45.5, - 38.1, - 35.7, - 50.8, - 37.2, - 48.6, - 48.7, - 47.3, - 42.4, - 46.4, - 53.4, - 40.3, - 50.6, - 47.3, - 51.3, - 52.2, - 50.5, - 50.1, - 36.7, - 40.2, - 40.2, - 39.2, - 36, - 39.8, - 51.1, - 50.4, - 36, - 39.7, - 37.7, - 36, - 51.9, - 46.1, - 44.9, - 42.7, - 46.8, - 41.6, - 44.1, - 39.5, - 46.2, - 50.2, - 38.1, - 42.8, - 46.5, - 33.1, - 41.5, - 37.5, - 45.1, - 32.1, - 36.9, - 43.5, - 50.8, - 50, - 41.5, - 41.1, - 42.9, - 38.6, - 45.5, - 46.1, - 47.5, - 41.4, - 41, - 43.2, - 36.4, - 37, - 42, - 42, - 45.5, - 37.7, - 49, - 47.2, - 50.2, - 45, - 35, - 47.5, - 46.4, - 52.7, - 41.1, - 48.1, - 42.9, - 37.3, - 49.6, - 36.3, - 40.5, - 50.7, - 41.1, - 37.8, - 46, - 43.5, - 44.1, - 45.8, - 40.9, - 36.4, - 38.3, - 40.6, - 49.6, - 46.5, - 45.7, - 40.9, - 36.5, - 40.9, - 37.8, - 45.8, - 54.2, - 38.8, - 46, - 41.1, - 46.7, - 40.7, - 38.8, - 39.6, - 45.2, - 40.6, - 38.5, - 41.3, - 49.8, - 36.6, - 36.2, - 39, - 45.8, - 43.2, - 50.2, - 46.1, - 38.9, - 35, - 52.2, - 43.3, - 48.5, - 50.1, - 48.1, - 40.3, - 50.8, - 52.5, - 36.2, - 37.6, - 46.4, - 49.6, - 36, - 51, - 46.9, - 55.9, - 48.4, - 49.3, - 35.5, - 52, - 38.7, - 43.3, - 46.9, - 39.2 - ] - ], - "parameters": { - "c": 0.64, - "k": 2048 - } - } - } - }, - "max": 55.9, - "mean": 43.87322175732217, - "min": 32.1, - "std_dev": 5.483764023274304, - "sum": 10485.699999999999 - } - }, - { - "inferred_type": "Fractional", - "name": "flipper_length_mm", - "numerical_statistics": { - "common": { - "num_missing": 0, - "num_present": 239 - }, - 
"distribution": { - "kll": { - "buckets": [ - { - "count": 5, - "lower_bound": 174, - "upper_bound": 179.7 - }, - { - "count": 27, - "lower_bound": 179.7, - "upper_bound": 185.4 - }, - { - "count": 45, - "lower_bound": 185.4, - "upper_bound": 191.1 - }, - { - "count": 36, - "lower_bound": 191.1, - "upper_bound": 196.8 - }, - { - "count": 28, - "lower_bound": 196.8, - "upper_bound": 202.5 - }, - { - "count": 13, - "lower_bound": 202.5, - "upper_bound": 208.2 - }, - { - "count": 24, - "lower_bound": 208.2, - "upper_bound": 213.9 - }, - { - "count": 30, - "lower_bound": 213.9, - "upper_bound": 219.6 - }, - { - "count": 18, - "lower_bound": 219.6, - "upper_bound": 225.3 - }, - { - "count": 13, - "lower_bound": 225.3, - "upper_bound": 231 - } - ], - "sketch": { - "data": [ - [ - 215, - 190, - 207, - 199, - 230, - 195, - 212, - 213, - 210, - 181, - 228, - 201, - 222, - 214, - 197, - 217, - 195, - 218, - 191, - 197, - 212, - 187, - 214, - 182, - 192, - 231, - 190, - 208, - 187, - 210, - 201, - 224, - 229, - 193, - 220, - 195, - 198, - 187, - 195, - 193, - 201, - 191, - 210, - 214, - 210, - 198, - 189, - 180, - 217, - 230, - 187, - 221, - 187, - 186, - 216, - 185, - 181, - 194, - 215, - 185, - 200, - 215, - 196, - 198, - 219, - 209, - 202, - 193, - 187, - 213, - 185, - 189, - 185, - 193, - 212, - 221, - 230, - 190, - 195, - 215, - 174, - 184, - 207, - 215, - 203, - 205, - 215, - 188, - 197, - 184, - 196, - 212, - 181, - 202, - 228, - 178, - 230, - 208, - 216, - 181, - 190, - 219, - 196, - 193, - 222, - 197, - 228, - 222, - 225, - 187, - 200, - 193, - 196, - 186, - 184, - 220, - 224, - 195, - 190, - 198, - 190, - 206, - 215, - 213, - 196, - 215, - 192, - 210, - 186, - 221, - 198, - 187, - 209, - 210, - 178, - 201, - 179, - 215, - 188, - 189, - 213, - 210, - 230, - 195, - 190, - 196, - 199, - 210, - 211, - 218, - 202, - 203, - 187, - 184, - 185, - 190, - 200, - 220, - 183, - 212, - 214, - 218, - 220, - 190, - 199, - 191, - 197, - 205, - 209, - 215, - 192, - 225, - 190, - 
180, - 203, - 182, - 186, - 195, - 220, - 196, - 197, - 214, - 195, - 189, - 187, - 216, - 217, - 193, - 187, - 181, - 184, - 180, - 210, - 201, - 180, - 194, - 182, - 219, - 190, - 190, - 190, - 215, - 183, - 190, - 195, - 229, - 184, - 187, - 185, - 219, - 192, - 202, - 178, - 190, - 192, - 197, - 208, - 191, - 190, - 199, - 195, - 226, - 221, - 187, - 185, - 216, - 193, - 187, - 203, - 192, - 228, - 220, - 203, - 190, - 210, - 195, - 209, - 222, - 195 - ] - ], - "parameters": { - "c": 0.64, - "k": 2048 - } - } - } - }, - "max": 231, - "mean": 201.06694560669456, - "min": 174, - "std_dev": 14.19866940408198, - "sum": 48055 - } - }, - { - "inferred_type": "String", - "name": "island", - "string_statistics": { - "common": { - "num_missing": 0, - "num_present": 239 - }, - "distinct_count": 3, - "distribution": { - "categorical": { - "buckets": [ - { - "count": 89, - "value": "Dream" - }, - { - "count": 36, - "value": "Torgersen" - }, - { - "count": 114, - "value": "Biscoe" - } - ] - } - } - } - }, - { - "inferred_type": "String", - "name": "species", - "string_statistics": { - "common": { - "num_missing": 0, - "num_present": 239 - }, - "distinct_count": 3, - "distribution": { - "categorical": { - "buckets": [ - { - "count": 106, - "value": "Adelie" - }, - { - "count": 47, - "value": "Chinstrap" - }, - { - "count": 86, - "value": "Gentoo" - } - ] - } - } - } - } - ], - "version": 0 - }, - "text/plain": [ - "" - ] - }, - "execution_count": 477, - "metadata": { - "application/json": { - "expanded": false, - "root": "root" - } - }, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "#| hide\n", - "#| eval: false\n", - "\n", - "statistics = f\"{DATA_QUALITY_LOCATION}/statistics.json\"\n", - "\n", - "response = None\n", - "try:\n", - " response = json.loads(S3Downloader.read_file(statistics))\n", - "except Exception as e:\n", - " pass\n", - "\n", - "JSON(response or {})" + "DATA_QUALITY_PREPROCESSOR = \"data_quality_preprocessor.py\"" + ] + }, + { + 
"cell_type": "markdown", + "id": "72c1023e", + "metadata": {}, + "source": [ + "We can now define the preprocessing script. Notice that this script will return the input data the endpoint receives with a new `species` column containing the prediction of the model:" ] }, { "cell_type": "code", - "execution_count": 478, - "id": "0a8a3276-46fe-484a-a8b8-d60ad05d63fc", + "execution_count": 238, + "id": "083b0bd0-4035-43fe-9b2c-946b12a5e266", "metadata": { "tags": [] }, "outputs": [ { - "data": { - "application/json": { - "features": [ - { - "completeness": 1, - "inferred_type": "Fractional", - "name": "body_mass_g", - "num_constraints": { - "is_non_negative": true - } - }, - { - "completeness": 1, - "inferred_type": "Fractional", - "name": "culmen_depth_mm", - "num_constraints": { - "is_non_negative": true - } - }, - { - "completeness": 1, - "inferred_type": "Fractional", - "name": "culmen_length_mm", - "num_constraints": { - "is_non_negative": true - } - }, - { - "completeness": 1, - "inferred_type": "Fractional", - "name": "flipper_length_mm", - "num_constraints": { - "is_non_negative": true - } - }, - { - "completeness": 1, - "inferred_type": "String", - "name": "island", - "string_constraints": { - "domains": [ - "Dream", - "Torgersen", - "Biscoe" - ] - } - }, - { - "completeness": 1, - "inferred_type": "String", - "name": "species", - "string_constraints": { - "domains": [ - "Adelie", - "Chinstrap", - "Gentoo" - ] - } - } - ], - "monitoring_config": { - "datatype_check_threshold": 1, - "distribution_constraints": { - "categorical_comparison_threshold": 0.1, - "categorical_drift_method": "LInfinity", - "comparison_method": "Robust", - "comparison_threshold": 0.1, - "perform_comparison": "Enabled" - }, - "domain_content_threshold": 1, - "emit_metrics": "Enabled", - "evaluate_constraints": "Enabled" - }, - "version": 0 - }, - "text/plain": [ - "" - ] - }, - "execution_count": 478, - "metadata": { - "application/json": { - "expanded": false, - "root": "root" - } - }, 
- "output_type": "execute_result" + "name": "stdout", + "output_type": "stream", + "text": [ + "Overwriting code/data_quality_preprocessor.py\n" + ] } ], "source": [ - "#| hide\n", - "#| eval: false\n", + "%%writefile {CODE_FOLDER}/{DATA_QUALITY_PREPROCESSOR}\n", + "#| code: true\n", + "#| output: false\n", "\n", - "constraints = f\"{DATA_QUALITY_LOCATION}/constraints.json\"\n", + "import json\n", "\n", - "response = None\n", - "try:\n", - " response = json.loads(S3Downloader.read_file(constraints))\n", - "except Exception as e:\n", - " pass\n", + "def preprocess_handler(inference_record):\n", + " input_data = inference_record.endpoint_input.data\n", + " output_data = json.loads(inference_record.endpoint_output.data)\n", + " \n", + " response = json.loads(input_data)\n", + " response[\"species\"] = output_data[\"prediction\"]\n", "\n", - "JSON(response or {})" + " # The `response` variable contains the data that we want the\n", + " # monitoring job to use to compare with the baseline.\n", + " return response" ] }, { "cell_type": "markdown", - "id": "b2c509f8-70cf-4dbd-bada-bf5db9fa35cb", + "id": "840d54c5-f09c-4559-a1d2-63587da0ad14", "metadata": {}, "source": [ - "We also generated the baseline performance using the test set." + "The monitoring schedule expects an S3 location pointing to the preprocessing script. 
Let's upload the script to the default bucket.\n" ] }, { "cell_type": "code", - "execution_count": 479, - "id": "d57bb7d6-bec2-4557-b836-a821d0db7446", + "execution_count": 240, + "id": "96e5c0c1-7e40-47df-8f40-1d891db13875", "metadata": { "tags": [] }, "outputs": [ { - "data": { - "application/json": { - "multiclass_classification_constraints": { - "accuracy": { - "comparison_operator": "LessThanThreshold", - "threshold": 0.803921568627451 - }, - "weighted_f0_5": { - "comparison_operator": "LessThanThreshold", - "threshold": 0.6903313049357673 - }, - "weighted_f1": { - "comparison_operator": "LessThanThreshold", - "threshold": 0.7247360482654601 - }, - "weighted_f2": { - "comparison_operator": "LessThanThreshold", - "threshold": 0.7681159420289854 - }, - "weighted_precision": { - "comparison_operator": "LessThanThreshold", - "threshold": 0.6710942441492727 - }, - "weighted_recall": { - "comparison_operator": "LessThanThreshold", - "threshold": 0.803921568627451 - } - }, - "version": 0 - }, - "text/plain": [ - "" - ] - }, - "execution_count": 479, - "metadata": { - "application/json": { - "expanded": false, - "root": "root" - } - }, - "output_type": "execute_result" + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials\n" + ] } ], "source": [ - "#| hide\n", - "#| eval: false\n", - "\n", - "constraints = f\"{MODEL_QUALITY_LOCATION}/constraints.json\"\n", - "\n", - "response = None\n", - "try:\n", - " response = json.loads(S3Downloader.read_file(constraints))\n", - "except Exception as e:\n", - " pass\n", + "#| code: true\n", + "#| output: false\n", "\n", - "JSON(response or {})" + "bucket = boto3.Session().resource(\"s3\").Bucket(pipeline_session.default_bucket())\n", + "prefix = \"penguins-monitoring\"\n", + "bucket.Object(os.path.join(prefix, DATA_QUALITY_PREPROCESSOR)).upload_file(\n", + " str(CODE_FOLDER / DATA_QUALITY_PREPROCESSOR)\n", + ")\n", + 
"data_quality_preprocessor = (\n", + " f\"s3://{os.path.join(bucket.name, prefix, DATA_QUALITY_PREPROCESSOR)}\"\n", + ")" ] }, { "cell_type": "markdown", - "id": "d936df76-e0b8-4dad-a04f-ef77ce2a2df1", + "id": "062fb443", "metadata": {}, "source": [ - "#| hide\n", - "\n", - "SageMaker looks for violations in the data captured by the endpoint. By default, it combines the input data with the endpoint output and compare the result with the baseline we generated. If we let SageMaker do this, we will get a few violations, for example an \"extra column check\" violation because the field `confidence` doesn't exist in the baseline data.\n", - "\n", - "We can fix these violations by creating a preprocessing script configuring the data we want the monitoring job to use. Check [Preprocessing and Postprocessing](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-and-post-processing.html) for more information about how to configure these scripts." + "Our pipeline generated data baseline statistics and constraints using our train set. 
We can take a look at what these values look like by downloading them from S3:" ] }, { "cell_type": "code", - "execution_count": 480, - "id": "cc119422-2e85-4e8c-86cd-6d59e353d09d", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "#| hide\n", - "\n", - "DATA_QUALITY_PREPROCESSOR = \"data_quality_preprocessor.py\"" - ] - }, - { - "cell_type": "code", - "execution_count": 481, - "id": "083b0bd0-4035-43fe-9b2c-946b12a5e266", - "metadata": { - "tags": [] - }, + "execution_count": null, + "id": "c3fa3d73", + "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Overwriting code/data_quality_preprocessor.py\n" + "{\n", + " \"name\": \"species\",\n", + " \"inferred_type\": \"String\",\n", + " \"string_statistics\": {\n", + " \"common\": {\n", + " \"num_present\": 344,\n", + " \"num_missing\": 0\n", + " },\n", + " \"distinct_count\": 3.0,\n", + " \"distribution\": {\n", + " \"categorical\": {\n", + " \"buckets\": [\n", + " {\n", + " \"value\": \"Adelie\",\n", + " \"count\": 152\n", + " },\n", + " {\n", + " \"value\": \"Chinstrap\",\n", + " \"count\": 68\n", + " },\n", + " {\n", + " \"value\": \"Gentoo\",\n", + " \"count\": 124\n", + " }\n", + " ]\n", + " }\n", + " }\n", + " }\n", + "}\n" ] } ], "source": [ - "#| hide\n", - "\n", - "%%writefile {CODE_FOLDER}/{DATA_QUALITY_PREPROCESSOR}\n", - "import json\n", - "\n", - "def preprocess_handler(inference_record):\n", - " input_data = inference_record.endpoint_input.data\n", - " output_data = json.loads(inference_record.endpoint_output.data)\n", - " \n", - " response = json.loads(input_data)\n", - " response[\"species\"] = output_data[\"prediction\"]\n", - "\n", - " # The `response` variable contains the data that we want the\n", - " # monitoring job to use to compare with the baseline.\n", - " return response" + "try:\n", + " response = json.loads(\n", + " S3Downloader.read_file(f\"{DATA_QUALITY_LOCATION}/statistics.json\")\n", + " )\n", + " 
print(json.dumps(response[\"features\"][0], indent=2))\n", + "except Exception as e:\n", + " pass" ] }, { "cell_type": "markdown", - "id": "840d54c5-f09c-4559-a1d2-63587da0ad14", + "id": "20067cda", "metadata": {}, "source": [ - "The monitoring schedule expects an S3 location pointing to the preprocessing script. Let's upload the script to the default bucket." + "And here are the constraints:" ] }, { "cell_type": "code", - "execution_count": 482, - "id": "96e5c0c1-7e40-47df-8f40-1d891db13875", - "metadata": { - "tags": [] - }, + "execution_count": null, + "id": "7e940197", + "metadata": {}, "outputs": [ { - "data": { - "text/plain": [ - "'s3://mlschool/penguins-monitoring/data_quality_preprocessor.py'" - ] - }, - "execution_count": 482, - "metadata": {}, - "output_type": "execute_result" + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"name\": \"species\",\n", + " \"inferred_type\": \"String\",\n", + " \"completeness\": 1.0,\n", + " \"string_constraints\": {\n", + " \"domains\": [\n", + " \"Adelie\",\n", + " \"Chinstrap\",\n", + " \"Gentoo\"\n", + " ]\n", + " }\n", + "}\n" + ] } ], "source": [ - "#| hide\n", - "#| eval: false\n", - "\n", - "\n", - "bucket = boto3.Session().resource(\"s3\").Bucket(pipeline_session.default_bucket())\n", - "prefix = \"penguins-monitoring\"\n", - "bucket.Object(os.path.join(prefix, DATA_QUALITY_PREPROCESSOR)).upload_file(str(CODE_FOLDER / DATA_QUALITY_PREPROCESSOR))\n", - "data_quality_preprocessor = f\"s3://{os.path.join(bucket.name, prefix, DATA_QUALITY_PREPROCESSOR)}\"\n", - "data_quality_preprocessor" + "try:\n", + " response = json.loads(S3Downloader.read_file(f\"{DATA_QUALITY_LOCATION}/constraints.json\"))\n", + " print(json.dumps(response[\"features\"][0], indent=2))\n", + "except Exception as e:\n", + " pass" ] }, { "cell_type": "markdown", "id": "56e107eb-546d-431c-b74d-1bfd412711b7", "metadata": {}, + "source": [ + "We can now set up the Data Quality Monitoring Job using the 
[DefaultModelMonitor](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_monitor.model_monitoring.DefaultModelMonitor) class. Notice how we specify the `record_preprocessor_script` using the S3 location where we uploaded our script." + ] + }, + { + "cell_type": "markdown", + "id": "e653b628", + "metadata": {}, "source": [ "#| hide\n", "\n", - "We can now set up the Data Quality Monitoring Job using the [DefaultModelMonitor](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_monitor.model_monitoring.DefaultModelMonitor) class. Notice how we specify the `record_preprocessor_script` using the S3 location where we uploaded our script.\n", - "\n", - "
Uncomment the %%script cell magic line to execute this cell.
" + "
Note: \n", + " The %%script cell magic is a convenient way to prevent the notebook from executing a specific cell. If you want to run the cell, comment out the line containing the %%script cell magic.\n", + "
" ] }, { "cell_type": "code", - "execution_count": 483, + "execution_count": 242, "id": "15caf9e1-97fc-4379-893b-6062d4bd876e", "metadata": { "tags": [] }, "outputs": [], "source": [ - "#| hide\n", - "\n", "%%script false --no-raise-error\n", + "#| code: true\n", + "#| output: false\n", + "#| eval: false\n", + "\n", + "from sagemaker.model_monitor import CronExpressionGenerator, DefaultModelMonitor\n", "\n", "data_monitor = DefaultModelMonitor(\n", " instance_type=\"ml.m5.xlarge\",\n", @@ -7483,14 +6087,12 @@ "id": "018800f7-315f-4f5e-b082-ba94bbde91ad", "metadata": {}, "source": [ - "#| hide\n", - "\n", - "We can check the results of the monitoring job by looking at whether it generated any violations." + "We can check the results of the monitoring job by looking at whether it generated any violations:" ] }, { "cell_type": "code", - "execution_count": 484, + "execution_count": 243, "id": "2c04fdd4-cc03-496c-a0a1-405854505c46", "metadata": { "tags": [] @@ -7505,9 +6107,6 @@ } ], "source": [ - "#| hide\n", - "#| eval: false\n", - "\n", "describe_data_monitoring_schedule(ENDPOINT)" ] }, @@ -7516,13 +6115,89 @@ "id": "3a9d201d-f60f-49f2-b4e9-eb0a0159ecfd", "metadata": {}, "source": [ - "#| hide\n", + "### Step 12 - Setting up Model Monitoring Job\n", "\n", "To set up a Model Quality Monitoring Job, we can use the [ModelQualityMonitor](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_monitor.model_monitoring.ModelQualityMonitor) class. 
The [EndpointInput](https://sagemaker.readthedocs.io/en/v2.24.2/api/inference/model_monitor.html#sagemaker.model_monitor.model_monitoring.EndpointInput) instance configures the attribute the monitoring job should use to determine the prediction from the model.\n", "\n", - "Check [Amazon SageMaker Model Quality Monitor](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_model_monitor/model_quality/model_quality_churn_sdk.html) for a complete tutorial on how to run a Model Monitoring Job in SageMaker.\n", + "Check [Amazon SageMaker Model Quality Monitor](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_model_monitor/model_quality/model_quality_churn_sdk.html) for a complete tutorial on how to run a Model Monitoring Job in SageMaker." + ] + }, + { + "cell_type": "markdown", + "id": "664a2f3a", + "metadata": {}, + "source": [ + "Let's check the baseline performance that we generated using the test set:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ecd37e48", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"version\": 0.0,\n", + " \"multiclass_classification_constraints\": {\n", + " \"accuracy\": {\n", + " \"threshold\": 0.9259259259259259,\n", + " \"comparison_operator\": \"LessThanThreshold\"\n", + " },\n", + " \"weighted_recall\": {\n", + " \"threshold\": 0.9259259259259259,\n", + " \"comparison_operator\": \"LessThanThreshold\"\n", + " },\n", + " \"weighted_precision\": {\n", + " \"threshold\": 0.933862433862434,\n", + " \"comparison_operator\": \"LessThanThreshold\"\n", + " },\n", + " \"weighted_f0_5\": {\n", + " \"threshold\": 0.928855833521148,\n", + " \"comparison_operator\": \"LessThanThreshold\"\n", + " },\n", + " \"weighted_f1\": {\n", + " \"threshold\": 0.9247293447293448,\n", + " \"comparison_operator\": \"LessThanThreshold\"\n", + " },\n", + " \"weighted_f2\": {\n", + " \"threshold\": 0.9242942991137502,\n", + " \"comparison_operator\": 
\"LessThanThreshold\"\n", + " }\n", + " }\n", + "}\n" + ] + } + ], + "source": [ + "try:\n", + " response = json.loads(S3Downloader.read_file(f\"{MODEL_QUALITY_LOCATION}/constraints.json\"))\n", + " print(json.dumps(response, indent=2))\n", + "except Exception as e:\n", + " pass" + ] + }, + { + "cell_type": "markdown", + "id": "9d217afd", + "metadata": {}, + "source": [ + "We can now start the Model Quality Monitoring Job:" + ] + }, + { + "cell_type": "markdown", + "id": "cd771884", + "metadata": {}, + "source": [ + "#| hide\n", "\n", - "
Uncomment the %%script cell magic line to execute this cell.
" + "
Note: \n", + " The %%script cell magic is a convenient way to prevent the notebook from executing a specific cell. If you want to run the cell, comment out the line containing the %%script cell magic.\n", + "
" ] }, { @@ -7532,9 +6207,10 @@ "metadata": {}, "outputs": [], "source": [ - "#| hide\n", - "\n", "%%script false --no-raise-error\n", + "#| code: true\n", + "#| output: false\n", + "#| eval: false\n", "\n", "model_monitor = ModelQualityMonitor(\n", " instance_type=\"ml.m5.xlarge\",\n", @@ -7568,14 +6244,12 @@ "id": "8d9e523e-49c5-4382-b28a-cdbece9bd0e0", "metadata": {}, "source": [ - "#| hide\n", - "\n", - "We can check the results of the monitoring job by looking at whether it generated any violations." + "We can check the results of the monitoring job by looking at whether it generated any violations.\n" ] }, { "cell_type": "code", - "execution_count": 486, + "execution_count": 245, "id": "347de298-16f2-42e0-85c4-dfc916080020", "metadata": { "tags": [] @@ -7590,12 +6264,19 @@ } ], "source": [ - "#| hide\n", - "#| eval: false\n", - "\n", "describe_model_monitoring_schedule(ENDPOINT)" ] }, + { + "cell_type": "markdown", + "id": "38c3d9f6", + "metadata": {}, + "source": [ + "### Step 13 - Stopping Monitoring Jobs\n", + "\n", + "The following code will stop the generation of traffic and labels, delete the monitoring jobs, and delete the endpoint." + ] + }, { "cell_type": "markdown", "id": "2c267ea0-f9c0-4bf2-8281-8d21edebb2a0", @@ -7603,23 +6284,24 @@ "source": [ "#| hide\n", "\n", - "The following code will stop the generation of traffic and labels, delete the monitoring jobs, and delete the endpoint.\n", - "\n", - "
Uncomment the %%script cell magic line to execute this cell.
" + "
Note: \n", + " The %%script cell magic is a convenient way to prevent the notebook from executing a specific cell. If you want to run the cell, comment out the line containing the %%script cell magic.\n", + "
" ] }, { "cell_type": "code", - "execution_count": 487, + "execution_count": 247, "id": "bb74dc04-54a1-4a3f-854f-4877f7f0b4a1", "metadata": { "tags": [] }, "outputs": [], "source": [ - "#| hide\n", - "\n", "%%script false --no-raise-error\n", + "#| code: true\n", + "#| output: false\n", + "#| eval: false\n", "\n", "stop_traffic_thread.set()\n", "traffic_thread.join()\n", @@ -7628,33 +6310,7 @@ "groundtruth_thread.join()\n", "\n", "delete_data_monitoring_schedule(ENDPOINT)\n", - "delete_model_monitoring_schedule(ENDPOINT)" - ] - }, - { - "cell_type": "markdown", - "id": "633702bd-f750-4fdb-9706-1e85f6f8c81a", - "metadata": {}, - "source": [ - "#| hide\n", - "\n", - "Let's now delete the endpoint.\n", - "\n", - "
Uncomment the %%script cell magic line to execute this cell.
" - ] - }, - { - "cell_type": "code", - "execution_count": 488, - "id": "4d459483-b81b-4f18-832d-da4d3dddf38a", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "#| hide\n", - "\n", - "%%script false --no-raise-error\n", + "delete_model_monitoring_schedule(ENDPOINT)\n", "\n", "predictor.delete_endpoint()" ] @@ -7664,80 +6320,18 @@ "id": "db0d6d8d-791c-4ae0-ba79-e0da33d0ece2", "metadata": {}, "source": [ - "#| hide\n", - "\n", - "### Questions\n", - "\n", - "Answering these questions will help you understand the material we discussed during this session. Notice that each question could have one or more correct answers.\n", - "\n", - "\n", - "
Question 6.1
\n", - "\n", - "To compute the data and the model quality baselines, we use the `train-baseline` and `test-baseline` outputs from the Preprocessing step of the pipeline. Which of the following is the reason we don't use the `train` and `test` outputs?\n", - "\n", - "1. The `train` and `test` outputs are being used in the Train and Evaluation steps, and SageMaker doesn't allow to reuse outputs across a pipeline.\n", - "2. Computing the two baselines requires the data to be transformed with the SciKit-Learn pipeline we created as part of the Preprocessing step.\n", - "3. Computing the two baselines requires the data to be in its original format.\n", - "4. Computing the two baselines requires JSON data, but the `train` and `test` outputs are in CSV format.\n", - "\n", - "\n", - "
Question 6.2
\n", - "\n", - "You build a computer vision model to recognize the brand and model of luxury handbags. After you deploy the model, one of the most important brands releases a new handbag that your model can't predict. How would you classify this type of model drift?\n", - "\n", - "1. Sudden drift.\n", - "2. Gradual drift.\n", - "3. Incremental drift.\n", - "4. Reocurring drift.\n", - "\n", - "\n", - "
Question 6.3
\n", - "\n", - "We use a custom script as part of the creation of the Data Monitoring schedule. Why do we need this custom script?\n", - "\n", - "1. This script expands the input data with the fields coming from the endpoint output.\n", - "2. This script combines the input data with the endpoint output.\n", - "3. This script prevents the monitoring job from reporting superfluous violations.\n", - "4. This script expands the list of fields with the data SageMaker needs to detect violations.\n", - "\n", - "
Question 6.4
\n", - "\n", - "We created a function to randomnly generated labels for the data captured by the endpoint. How does SageMaker know which label corresponds to a specific request?\n", - "\n", - "1. SageMaker uses the timestamp of the request.\n", - "2. SageMaker uses the `inference_id` field that we send on every request to the endpoint.\n", - "3. SageMaker uses the `event_id` field that we send on every request to the endpoint.\n", - "4. SageMaker uses the `label_id` field that we send on every request to the endpoint.\n", - "\n", - "\n", - "
Question 6.5
\n", - "\n", - "We use a Transform Step to generate predictions for the test data using our model. When configuring this step, we filter the result from the step using the `output_filter` attribute. Assuming we configure this attribute with the value `$.SageMakerOutput['prediction','groundtruth']`, which of the following statements should be correct about the endpoint?\n", - "\n", - "1. The endpoint should return a top-level field with the name `prediction`.\n", - "2. The endpoint should return a top-level field with the name `groundtruth`.\n", - "3. The endpoint should return a top-level field with the name `SageMakerOutput`.\n", - "4. The test dataset should include a field with the name `groundtruth`.\n", - "\n", - "\n", "### Assignments\n", "\n", - "* Assignment 6.1 We built a custom inference script to handle the input and output of our endpoint. However, this custom code doesn't support processing more than one sample simultaneously. Modify the inference script to allow the processing of multiple samples in a single request. The output should be a JSON containing an array of objects with the prediction and the confidence corresponding to each input sample.\n", + "* Assignment 5.1 We built a custom inference script to handle the input and output of our endpoint. However, this custom code doesn't support processing more than one sample simultaneously. Modify the inference script to allow the processing of multiple samples in a single request. The output should be a JSON containing an array of objects with the prediction and the confidence corresponding to each input sample.\n", "\n", - "* Assignment 6.2 You can visualize the results of your monitoring jobs in Amazon SageMaker Studio. Go to your endpoint, and visit the Data quality and Model quality tabs. 
View the details of your monitoring jobs, and create a few charts to explore the baseline and the captured values for any metric that the monitoring job calculates.\n", + "* Assignment 5.2 You can visualize the results of your monitoring jobs in Amazon SageMaker Studio. Go to your endpoint, and visit the Data quality and Model quality tabs. View the details of your monitoring jobs, and create a few charts to explore the baseline and the captured values for any metric that the monitoring job calculates.\n", "\n", - "* Assignment 6.3 The QualityCheck Step runs a processing job to compute baseline statistics and constraints from the input dataset. We configured the pipeline to generate the initial baselines every time it runs. Modify the code to prevent the pipeline from registering a new version of the model if the dataset violates the baseline of the previous model version. You can configure the QualityCheck Step to accomplish this.\n", + "* Assignment 5.3 The QualityCheck Step runs a processing job to compute baseline statistics and constraints from the input dataset. We configured the pipeline to generate the initial baselines every time it runs. Modify the code to prevent the pipeline from registering a new version of the model if the dataset violates the baseline of the previous model version. You can configure the QualityCheck Step to accomplish this.\n", "\n", - "* Assignment 6.4 We are generating predictions for the test set twice during the execution of our pipeline. First, in the Evaluation step, and then using a Transform Step in anticipation of generating the baseline to monitor the model. Modify the pipeline to remove the Evaluation step and reuse the metrics computed by the QualityCheck Step to determine whether we should register the model.\n", + "* Assignment 5.4 We are generating predictions for the test set twice during the execution of our pipeline. 
First, in the Evaluation step, and then using a Transform Step in anticipation of generating the baseline to monitor the model. Modify the pipeline to remove the Evaluation step and reuse the metrics computed by the QualityCheck Step to determine whether we should register the model.\n", "\n", - "* Assignment 6.5 Modify the SageMaker Pipeline you created for the \"Pipeline of Digits\" project and add the necessary steps to generate a model quality baseline. Schedule a Model Monitoring Job that reports any violations if there's model drift." + "* Assignment 5.5 Modify the SageMaker Pipeline you created for the \"Pipeline of Digits\" project and add the necessary steps to generate a model quality baseline. Schedule a Model Monitoring Job that reports any violations if there's model drift.\n" ] - }, - { - "cell_type": "markdown", - "id": "a99818df", - "metadata": {}, - "source": [] } ], "metadata": { diff --git a/program/index.qmd b/program/index.qmd index 22e8275..b487df0 100644 --- a/program/index.qmd +++ b/program/index.qmd @@ -11,7 +11,7 @@ Welcome to the program! ## Program Structure -**Session 1 - Production Machine Learning Is Different** +#### Session 1 - Production Machine Learning Is Different * An overview of the components of a machine learning system * The role of data in real-world applications @@ -24,11 +24,11 @@ Welcome to the program! 
* A template architecture of a production-ready machine learning system * Understanding SageMaker’s Processing Step and Processing Jobs -**Session 2 - Building Models And The Training Pipeline** +#### Session 2 - Building Models And The Training Pipeline * The first rule of Machine Learning Engineering * A 3-step process to solve a problem using machine learning -* 10 tips to select the best machine learning model for your solution +* 9 tips to select the best machine learning model for your solution * Strategies for working with imbalanced data, dealing with rare events, and a quick introduction to cost-sensitive learning * The reason you should not balance your data * An introduction to hyperparameter tuning @@ -36,7 +36,7 @@ Welcome to the program! * Distributed Training using data and model parallelism * Understanding SageMaker’s Training and Tuning Steps, and Training and Tuning Jobs -**Session 3 - Evaluating and Versioning Models** +#### Session 3 - Evaluating and Versioning Models * The difference between good models and useful models * Framing evaluation metrics in the context of business performance @@ -50,6 +50,29 @@ Welcome to the program! 
* An introduction to model versioning * Understanding SageMaker’s Model Registry, Condition, and Model Steps +#### Session 4 - Deploying Models and Serving Predictions + +* How model performance, speed, and cost affect models in production +* Latency, throughput, and their relationships +* Understanding on-demand inference and batch inference and when to use each one +* How to make models run fast using model compression and a quick introduction to quantization and knowledge distillation +* Deploying models in dedicated and multi-model endpoints +* A comparison of the tools you can use to serve predictions +* Designing a 3-component inference pipeline +* Understanding the internal structure of a SageMaker Endpoint +* Understanding SageMaker’s PipelineModel and Amazon EventBridge + + +#### Session 5 - Data Distribution Shifts And Model Monitoring + +* The 3 most common problems your model will face in production +* An introduction to data distribution shifts, edge cases, and unintended feedback loops +* Catastrophic predictions and the problem with edge cases +* Understanding covariate shift and concept drift +* Monitoring schema violations, data statistics, model performance, prediction distribution, and changes in user feedback +* The 3 strategies to keep your models working despite data distribution shifts +* Understanding SageMaker’s Transform Step, QualityCheck Step, Transform Jobs, and Monitoring Jobs + ## Table of Contents diff --git a/program/setup.qmd b/program/setup.qmd index 627bca3..8a6b6ab 100644 --- a/program/setup.qmd +++ b/program/setup.qmd @@ -7,7 +7,7 @@ listing: categories: true --- -Here is a summary of the steps you need to follow to configure your local environment: +Here are the steps you need to follow to set up the project: Start by forking the program's [GitHub Repository](https://github.com/svpino/ml.school) and clone it on your local computer.