Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update SageMaker components and sample pipeline #1682

Merged
merged 7 commits into from
Jul 30, 2019

Conversation

carolynwang
Copy link
Contributor

@carolynwang carolynwang commented Jul 26, 2019

Summary

@Jeffwan @mbaijal @gautamkmr @kalyc

Added/Updated Dockerfiles for individual components

  • HPO, train, and batch transform
  • Note: not actually being used for the components

Updates to HPO component

  • Make job name optional
  • Add link to CloudWatch logs in pipeline logs
  • Add 'training_image' output, which can be used to pipeline with the create model component
  • Update ReadMe

Updates to train, create model, deploy, and batch transform components

  • Expose all options to the user
  • Convert job request creation to use template for all jobs except create endpoint
  • Update wait for training job to be consistent with other components
  • Note: SageMaker allows an unlimited number of models per endpoint, but the deploy component only allows a maximum of 3 models. This was done to preserve the ability for users to use the output of the create model component as the input for the deploy component

Add HPO and new parameters to MNIST classification sample pipeline

  • Show sample usage of all components via an MNIST classification pipeline
  • Update sample pipeline ReadMe
  • Note: the default dataset input used in the train component is not actually practical and is just being used to demonstrate that the component works

Testing

Updated components functionality

  • mnist-classification-pipeline.py compiles and runs without errors using the default values set in the pipeline parameters
  • Note: HPO and train components have been tested with BYOC and the K-Means built-in algorithm

This change is Reviewable

@k8s-ci-robot
Copy link
Contributor

Hi @carolynwang. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@carolynwang carolynwang changed the title Sagemaker components Update SageMaker HPO, train, and batch transform components Jul 26, 2019
@carolynwang carolynwang changed the title Update SageMaker HPO, train, and batch transform components Update SageMaker HPO, train, and batch transform components and sample pipeline Jul 26, 2019
@Ark-kun
Copy link
Contributor

Ark-kun commented Jul 26, 2019

/ok-to-test

components/aws/sagemaker/batch_transform/Dockerfile Outdated Show resolved Hide resolved
parser.add_argument('--input_filter', type=str, required=False, help='A JSONPath expression used to select a portion of the input data to pass to the algorithm.', default='')
parser.add_argument('--output_filter', type=str, required=False, help='A JSONPath expression used to select a portion of the joined dataset to save in the output file for a batch transform job.', default='')
parser.add_argument('--join_source', choices=['None', 'Input', ''], required=False, help='Specifies the source of the data to join with the transformed data.', default='None')
parser.add_argument('--instance_type', choices=['ml.m4.xlarge', 'ml.m4.2xlarge', 'ml.m4.4xlarge', 'ml.m4.10xlarge', 'ml.m4.16xlarge', 'ml.m5.large', 'ml.m5.xlarge', 'ml.m5.2xlarge', 'ml.m5.4xlarge',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: We need to consider how to better maintain this list. It could easily out of sync once SM rollout new instances

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to fetch supported instances using sagemaker api? like the way you get built-in algorithm?

Copy link
Contributor Author

@carolynwang carolynwang Jul 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isnt a quick search through the sagemaker documentation

request['AlgorithmSpecification']['TrainingImage'] = args['image']
request['AlgorithmSpecification'].pop('AlgorithmName')
else:
# TODO: determine if users can make custom algorithm resources that have the same name as built-in algorithm names
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is custom algorithm supported here? Ideally, user should use BYOC or first party algorithms. No more options should be exposed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Custom algorithm is referring to algorithms that users create as an algorithm resource in SageMaker. The API has an AlgorithmName param for these custom algos and algos from AWS marketplace

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Have you tried to create a custom algorithm using exist first-party algorithm name? I do think their API object is different or same name can not be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tried it, created an algo with the name 'K-Means' and it worked so I need to adjust this implementation

components/aws/sagemaker/common/_utils.py Show resolved Hide resolved
components/aws/sagemaker/common/_utils.py Show resolved Hide resolved
components/aws/sagemaker/train/Dockerfile Outdated Show resolved Hide resolved
components/aws/sagemaker/train/Dockerfile Outdated Show resolved Hide resolved
@Jeffwan
Copy link
Member

Jeffwan commented Jul 26, 2019

/cc @Jeffwan

@carolynwang carolynwang changed the title Update SageMaker HPO, train, and batch transform components and sample pipeline Update SageMaker components and sample pipeline Jul 29, 2019
@Jeffwan
Copy link
Member

Jeffwan commented Jul 30, 2019

/retest

@Jeffwan
Copy link
Member

Jeffwan commented Jul 30, 2019

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jeffwan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 81b0f08 into kubeflow:master Jul 30, 2019
@carolynwang carolynwang deleted the sagemaker-components branch July 30, 2019 23:28
magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request Oct 22, 2023
See the rationale at https://github.com/hadolint/hadolint/wiki/DL3042

In addition to space saving, it also ensures that pip doesn't install
from local cache (especially in case of a rebuild) and reduces the
chance of creating bugs during development.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants