feat(components): [AWS SageMaker] Minimize inputs for mnist classification pipeline (#4192)

* minimize parameters and bug fixes

* endpoint_name variable fix

* only add necessary inputs

* remove run_time

* trim down v2

* Update example with S3 bucket as runtime input

* nit fix

* fix variable names
surajkota authored Jul 17, 2020
1 parent 45c796e commit 6860681
Showing 3 changed files with 127 additions and 199 deletions.
14 changes: 1 addition & 13 deletions samples/contrib/aws-samples/README.md
@@ -118,20 +118,8 @@ There are two ways you can give them access to SageMaker.

## Inputs to the pipeline

### Sample MNIST dataset

The following commands will copy the data extraction pre-processing script to an S3 bucket which we will use to store artifacts for the pipeline.

1. [Create a bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html) in `us-east-1` region if you don't have one already.
For the purposes of this demonstration, all resources will be created in the us-east-1 region.

2. Upload the `mnist-kmeans-sagemaker/kmeans_preprocessing.py` file to your bucket with the prefix `mnist_kmeans_example/processing_code/kmeans_preprocessing.py`.
This can be done with the following command, replacing `<bucket-name>` with the name of the bucket you previously created in `us-east-1`:
```
aws s3 cp mnist-kmeans-sagemaker/kmeans_preprocessing.py s3://<bucket-name>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
```

### Role Input
**Note:** Ignore this section if you plan to run the [titanic-survival-prediction](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/titanic-survival-prediction) example

This role is used by SageMaker jobs created by the KFP to access the S3 buckets and other AWS resources.
Run these commands to create the sagemaker-execution-role.
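The commands themselves appear further down in the README and are not shown in this truncated diff hunk. As a rough, non-authoritative sketch of what creating such a role can look like with `boto3` (the role name and the managed policies below are illustrative assumptions, not necessarily the exact ones the README uses):

```python
import json
import boto3

iam = boto3.client('iam')

# Trust policy that lets SageMaker assume the role
assume_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

role = iam.create_role(
    RoleName='sagemaker-execution-role',
    AssumeRolePolicyDocument=json.dumps(assume_role_policy)
)

# Attach broad managed policies (assumed here for simplicity)
iam.attach_role_policy(RoleName='sagemaker-execution-role',
                       PolicyArn='arn:aws:iam::aws:policy/AmazonSageMakerFullAccess')
iam.attach_role_policy(RoleName='sagemaker-execution-role',
                       PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess')

print(role['Role']['Arn'])  # pass this ARN to the pipeline as role_arn
```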
42 changes: 36 additions & 6 deletions samples/contrib/aws-samples/mnist-kmeans-sagemaker/README.md
@@ -4,8 +4,21 @@ The `mnist-classification-pipeline.py` sample runs a pipeline to train a classfi

## Prerequisites

### Setup K8s cluster and authentication
Make sure you have the setup explained in this [README.md](https://github.com/kubeflow/pipelines/blob/master/samples/contrib/aws-samples/README.md)

### Sample MNIST dataset

The following commands will copy the data extraction pre-processing script to an S3 bucket which we will use to store artifacts for the pipeline.

1. [Create a bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html) in the `us-east-1` region if you don't have one already (a short `boto3` sketch for this step follows this list).
For the purposes of this demonstration, all resources will be created in the us-east-1 region.

2. Upload the `mnist-kmeans-sagemaker/kmeans_preprocessing.py` file to your bucket with the prefix `mnist_kmeans_example/processing_code/kmeans_preprocessing.py`.
This can be done with the following command, replacing `<bucket-name>` with the name of the bucket you previously created in `us-east-1`:
```
aws s3 cp mnist-kmeans-sagemaker/kmeans_preprocessing.py s3://<bucket-name>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
```
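If you prefer not to use the console for step 1, here is a minimal `boto3` sketch for creating the bucket (replace `<bucket-name>`; in `us-east-1` no `CreateBucketConfiguration` is needed):

```python
import boto3

s3 = boto3.client('s3', region_name='us-east-1')
# Replace <bucket-name>; us-east-1 buckets are created without a LocationConstraint
s3.create_bucket(Bucket='<bucket-name>')
```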

## Compiling the pipeline template

@@ -20,32 +33,49 @@ dsl-compile --py mnist-classification-pipeline.py --output mnist-classification-

Open the Kubeflow pipelines UI. Create a new pipeline, and then upload the compiled specification (`.tar.gz` file) as a new pipeline template.
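If you prefer to upload the template programmatically instead of through the UI, here is a minimal sketch with the `kfp` SDK client (assuming the SDK is installed and can reach your Kubeflow Pipelines API; the host URL, pipeline name, and package filename below are placeholders, and the filename should match whatever you passed to `--output`):

```python
import kfp

# Point the client at your KFP API endpoint (or omit host when running in-cluster)
client = kfp.Client(host='http://<your-kfp-endpoint>')
client.upload_pipeline(
    pipeline_package_path='mnist-classification-pipeline.tar.gz',
    pipeline_name='mnist-kmeans-classification'
)
```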

The pipeline requires several arguments; provide the `role_arn` and `bucket_name` you created as pipeline inputs.

Once the pipeline run is done, you can go to `batch_transform_ouput` to check your batch prediction results.
You will also have a model endpoint in service. Refer to the [Prediction section](#Prediction) below to run predictions against your deployed model through the endpoint. Please remember to clean up the endpoint.
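You can also start a run programmatically instead of filling in the inputs through the UI. A minimal sketch with the `kfp` SDK (the experiment name, run name, and the `role_arn`/`bucket_name` values are placeholders for your own settings):

```python
import kfp

client = kfp.Client(host='http://<your-kfp-endpoint>')
client.create_run_from_pipeline_package(
    'mnist-classification-pipeline.tar.gz',
    arguments={
        'role_arn': 'arn:aws:iam::<account-id>:role/sagemaker-execution-role',
        'bucket_name': '<bucket-name>',
    },
    run_name='mnist-kmeans-run',
    experiment_name='mnist-kmeans'
)
```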


## Prediction

1. Find your endpoint name either by:
   - Opening the SageMaker [console](https://us-east-1.console.aws.amazon.com/sagemaker/home?region=us-east-1#/endpoints), or
   - Clicking the `sagemaker-deploy-model-endpoint_name` under `Output artifacts` of the `SageMaker - Deploy Model` component of the pipeline run

2. Set up AWS credentials with `sagemaker:InvokeEndpoint` access. [Sample commands](https://sagemaker.readthedocs.io/en/stable/workflows/kubernetes/using_amazon_sagemaker_components.html#configure-permissions-to-run-predictions)
3. Update the `ENDPOINT_NAME` variable in the script below
4. Run the script below to invoke the endpoint

```python
import json
import io
import boto3
import pickle
import urllib.request
import gzip
import numpy

ENDPOINT_NAME="<your_endpoint_name>"

# Simple function to create a csv from numpy array
def np2csv(arr):
    csv = io.BytesIO()
    numpy.savetxt(csv, arr, delimiter=',', fmt='%g')
    return csv.getvalue().decode().rstrip()

# Prepare input for the model
urllib.request.urlretrieve("http://deeplearning.net/data/mnist/mnist.pkl.gz", "mnist.pkl.gz")
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, _, _ = pickle.load(f, encoding='latin1')

payload = np2csv(train_set[0][30:31])

# Run prediction against the endpoint created by the pipeline
runtime = boto3.Session(region_name='us-east-1').client('sagemaker-runtime')
response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='text/csv',
                                   Body=payload)
result = json.loads(response['Body'].read().decode())
```
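When you are done experimenting, clean up the endpoint so it stops incurring charges. A minimal `boto3` sketch (the endpoint configuration and model names are assumptions; check the SageMaker console for the actual resources the pipeline created):

```python
import boto3

ENDPOINT_NAME = "<your_endpoint_name>"

sagemaker = boto3.client('sagemaker', region_name='us-east-1')

# Delete the endpoint created by the pipeline
sagemaker.delete_endpoint(EndpointName=ENDPOINT_NAME)

# The endpoint configuration and model are separate resources; look up their
# names in the SageMaker console before deleting them.
# sagemaker.delete_endpoint_config(EndpointConfigName='<your_endpoint_config_name>')
# sagemaker.delete_model(ModelName='<your_model_name>')
```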