feat(components): [AWS SageMaker] Minimize inputs for mnist classification pipeline (#4192)

* minimize parameters and bug fixes

* endpoint_name variable fix

* only add necessary inputs

* remove run_time

* trim down v2

* Update example with S3 bucket as runtime input

* nit fix

* fix variable names
surajkota authored Jul 17, 2020
1 parent 45c796e commit 6860681
Showing 3 changed files with 127 additions and 199 deletions.
14 changes: 1 addition & 13 deletions samples/contrib/aws-samples/README.md
@@ -118,20 +118,8 @@ There are two ways you can give them access to SageMaker.

## Inputs to the pipeline

### Sample MNIST dataset

The following commands will copy the data extraction pre-processing script to an S3 bucket which we will use to store artifacts for the pipeline.

1. [Create a bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html) in `us-east-1` region if you don't have one already.
For the purposes of this demonstration, all resources will be created in the us-east-1 region.

2. Upload the `mnist-kmeans-sagemaker/kmeans_preprocessing.py` file to your bucket with the prefix `mnist_kmeans_example/processing_code/kmeans_preprocessing.py`.
This can be done with the following command, replacing `<bucket-name>` with the name of the bucket you previously created in `us-east-1`:
```
aws s3 cp mnist-kmeans-sagemaker/kmeans_preprocessing.py s3://<bucket-name>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
```

### Role Input
**Note:** Ignore this section if you plan to run the [titanic-survival-prediction](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/titanic-survival-prediction) example

This role is used by SageMaker jobs created by the KFP to access the S3 buckets and other AWS resources.
Run these commands to create the sagemaker-execution-role.
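The commands themselves appear further down in the README and are not shown in this truncated diff hunk. As a rough, non-authoritative sketch of what creating such a role can look like with `boto3` (the role name and the managed policies below are illustrative assumptions, not necessarily the exact ones the README uses):

```python
import json
import boto3

iam = boto3.client('iam')

# Trust policy that lets SageMaker assume the role
assume_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

role = iam.create_role(
    RoleName='sagemaker-execution-role',
    AssumeRolePolicyDocument=json.dumps(assume_role_policy)
)

# Attach broad managed policies (assumed here for simplicity)
iam.attach_role_policy(RoleName='sagemaker-execution-role',
                       PolicyArn='arn:aws:iam::aws:policy/AmazonSageMakerFullAccess')
iam.attach_role_policy(RoleName='sagemaker-execution-role',
                       PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess')

print(role['Role']['Arn'])  # pass this ARN to the pipeline as role_arn
```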
42 changes: 36 additions & 6 deletions samples/contrib/aws-samples/mnist-kmeans-sagemaker/README.md
@@ -4,8 +4,21 @@ The `mnist-classification-pipeline.py` sample runs a pipeline to train a classfi

## Prerequisites

### Setup K8s cluster and authentication
Make sure you have the setup explained in this [README.md](https://github.com/kubeflow/pipelines/blob/master/samples/contrib/aws-samples/README.md)

### Sample MNIST dataset

The following commands will copy the data extraction pre-processing script to an S3 bucket which we will use to store artifacts for the pipeline.

1. [Create a bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html) in the `us-east-1` region if you don't have one already (a short `boto3` sketch for this step follows this list).
For the purposes of this demonstration, all resources will be created in the us-east-1 region.

2. Upload the `mnist-kmeans-sagemaker/kmeans_preprocessing.py` file to your bucket with the prefix `mnist_kmeans_example/processing_code/kmeans_preprocessing.py`.
This can be done with the following command, replacing `<bucket-name>` with the name of the bucket you previously created in `us-east-1`:
```
aws s3 cp mnist-kmeans-sagemaker/kmeans_preprocessing.py s3://<bucket-name>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
```
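If you prefer not to use the console for step 1, here is a minimal `boto3` sketch for creating the bucket (replace `<bucket-name>`; in `us-east-1` no `CreateBucketConfiguration` is needed):

```python
import boto3

s3 = boto3.client('s3', region_name='us-east-1')
# Replace <bucket-name>; us-east-1 buckets are created without a LocationConstraint
s3.create_bucket(Bucket='<bucket-name>')
```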

## Compiling the pipeline template

@@ -20,32 +33,49 @@ dsl-compile --py mnist-classification-pipeline.py --output mnist-classification-

Open the Kubeflow pipelines UI. Create a new pipeline, and then upload the compiled specification (`.tar.gz` file) as a new pipeline template.
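If you prefer to upload the template programmatically instead of through the UI, here is a minimal sketch with the `kfp` SDK client (assuming the SDK is installed and can reach your Kubeflow Pipelines API; the host URL, pipeline name, and package filename below are placeholders, and the filename should match whatever you passed to `--output`):

```python
import kfp

# Point the client at your KFP API endpoint (or omit host when running in-cluster)
client = kfp.Client(host='http://<your-kfp-endpoint>')
client.upload_pipeline(
    pipeline_package_path='mnist-classification-pipeline.tar.gz',
    pipeline_name='mnist-kmeans-classification'
)
```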

The pipeline requires several arguments; provide the `role_arn` and `bucket_name` you created as pipeline inputs.

Once the pipeline run is done, you can go to `batch_transform_ouput` to check your batch prediction results.
You will also have a model endpoint in service. Refer to the [Prediction section](#Prediction) below to run predictions against your deployed model through the endpoint. Please remember to clean up the endpoint.
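You can also start a run programmatically instead of filling in the inputs through the UI. A minimal sketch with the `kfp` SDK (the experiment name, run name, and the `role_arn`/`bucket_name` values are placeholders for your own settings):

```python
import kfp

client = kfp.Client(host='http://<your-kfp-endpoint>')
client.create_run_from_pipeline_package(
    'mnist-classification-pipeline.tar.gz',
    arguments={
        'role_arn': 'arn:aws:iam::<account-id>:role/sagemaker-execution-role',
        'bucket_name': '<bucket-name>',
    },
    run_name='mnist-kmeans-run',
    experiment_name='mnist-kmeans'
)
```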


## Prediction

1. Find your endpoint name either by:
   - Opening the SageMaker [console](https://us-east-1.console.aws.amazon.com/sagemaker/home?region=us-east-1#/endpoints), or
   - Clicking the `sagemaker-deploy-model-endpoint_name` under `Output artifacts` of the `SageMaker - Deploy Model` component of the pipeline run

2. Set up AWS credentials with `sagemaker:InvokeEndpoint` access. [Sample commands](https://sagemaker.readthedocs.io/en/stable/workflows/kubernetes/using_amazon_sagemaker_components.html#configure-permissions-to-run-predictions)
3. Update the `ENDPOINT_NAME` variable in the script below
4. Run the script below to invoke the endpoint

```python
import json
import io
import boto3
import pickle
import urllib.request
import gzip
import numpy

ENDPOINT_NAME="<your_endpoint_name>"

# Simple function to create a csv from numpy array
def np2csv(arr):
    csv = io.BytesIO()
    numpy.savetxt(csv, arr, delimiter=',', fmt='%g')
    return csv.getvalue().decode().rstrip()

# Prepare input for the model
urllib.request.urlretrieve("http://deeplearning.net/data/mnist/mnist.pkl.gz", "mnist.pkl.gz")
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, _, _ = pickle.load(f, encoding='latin1')

payload = np2csv(train_set[0][30:31])

# Run prediction against the endpoint created by the pipeline
runtime = boto3.Session(region_name='us-east-1').client('sagemaker-runtime')
response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='text/csv',
                                   Body=payload)
result = json.loads(response['Body'].read().decode())
```
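When you are done experimenting, clean up the endpoint so it stops incurring charges. A minimal `boto3` sketch (the endpoint configuration and model names are assumptions; check the SageMaker console for the actual resources the pipeline created):

```python
import boto3

ENDPOINT_NAME = "<your_endpoint_name>"

sagemaker = boto3.client('sagemaker', region_name='us-east-1')

# Delete the endpoint created by the pipeline
sagemaker.delete_endpoint(EndpointName=ENDPOINT_NAME)

# The endpoint configuration and model are separate resources; look up their
# names in the SageMaker console before deleting them.
# sagemaker.delete_endpoint_config(EndpointConfigName='<your_endpoint_config_name>')
# sagemaker.delete_model(ModelName='<your_model_name>')
```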