-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(components): AWS SageMaker - Add optional parameter to allow training component to accept parameters related to Debugger #4283
Conversation
…t tests, and integration test
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
Hi @dstnluong. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @RedbackThomson |
samples/contrib/aws-samples/debugger_component_demo/debugger-component-demo.py
Outdated
Show resolved
Hide resolved
samples/contrib/aws-samples/debugger_component_demo/debugger-component-demo.py
Outdated
Show resolved
Hide resolved
samples/contrib/aws-samples/debugger_component_demo/debugger-component-demo.py
Outdated
Show resolved
Hide resolved
samples/contrib/aws-samples/debugger_component_demo/debugger-component-demo.py
Outdated
Show resolved
Hide resolved
samples/contrib/aws-samples/debugger_component_demo/debugger-component-demo.py
Outdated
Show resolved
Hide resolved
…e more succinct, removed hardcoding from integration tests
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
samples/contrib/aws-samples/debugger_component_demo/debugger-component-demo.py
Outdated
Show resolved
Hide resolved
samples/contrib/aws-samples/debugger_component_demo/debugger-component-demo.py
Outdated
Show resolved
Hide resolved
…ple README, refactored _utils.py for fstrings and fixed offset for errors
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info. |
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info. |
15d1861
to
09664d3
Compare
@@ -8,6 +8,7 @@ | |||
import shutil | |||
|
|||
from sagemaker.amazon.amazon_estimator import get_image_uri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deprecated :
https://github.com/aws/sagemaker-python-sdk/blame/master/CHANGELOG.md#L38
use image_uris.retrieve()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That deprecated version is for SageMaker SDK 2.0 @akartsky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right we can choose to keep using the old version
for some reason i thought this was added to _utils.py file as a new dependency so i thought of using the latest one https://github.com/kubeflow/pipelines/blob/master/components/aws/sagemaker/common/_utils.py#L39
ac4f2a5
to
8c95a04
Compare
a69c442
to
8c95a04
Compare
|
||
:output Trainingjob-1234567890-RuleName-a1112312 | ||
""" | ||
job_name = rule_evaluation_job_arn.split("/")[1].split("-") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rule_evaluation_job_arn.split("/")[1]
should be sufficient ?
also the prefix might not be always Trainingjob-
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to replace the initial "trainingjob" with "Trainingjob" and change the rule name as well. There is an edge case where the rule name contains the "-" character, just something to note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this method is no longer needed
sagemaker==2.1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RedbackThomson can an improvement to reduce duplication be part of your PR ?
Dustin had to change sagemaker version at multiple places
components/aws/sagemaker/tests/integration_tests/resources/definition/training_pipeline.py
Outdated
Show resolved
Hide resolved
/ok-to-test |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: surajkota The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
/retry |
/retest |
/lgtm |
…ining component to accept parameters related to Debugger (kubeflow#4283) * Implemented debugger for training component with sample pipeline, unit tests, and integration test * Implemented changes from PR, refactored utils.py, made sample pipeline more succinct, removed hardcoding from integration tests * Added default parameter for sample pipeline and fixed grammar for sample README, refactored _utils.py for fstrings and fixed offset for errors * Removed aws secret lines * Terminate debug rules when terminating training job, Terminate debug rules if terminate is pressed after training job has completed, added integration tests for stop_debug_rules, updated READMEs for train and sample, renamed sample pipeline, removed tensorboard, updated sagemaker version to sagemaker 2.1.0. * Terminate debug rules when terminating training job, Terminate debug rules if terminate is pressed after training job has completed, added integration tests for stop_debug_rules, updated READMEs for train and sample, renamed sample pipeline, removed tensorboard, updated sagemaker version to sagemaker 2.1.0. * Removed extra files, cleaned integration test * Changed integration test to use sample debugger pipeline * Processing jobs created from debug rules will not terminate, fixing other small issues * Removed debug from pipeline definition, removed extra line, removed unused function * Changelog and image tag updates
Add functionality to accept and process debugger related parameters
Updated KFP logs to show debugger rules status and to wait until all rules have finished
Added and updated unit tests and integration tests
Added a new sample pipeline which uses the newly added parameters with a README that explains how to run the sample
Minor typo fixes in various files
Checklist:
The title for your pull request (PR) should follow our title convention. Learn more about the pull request title convention used in this repository.
PR titles examples:
fix(frontend): fixes empty page. Fixes #1234
Use
fix
to indicate that this PR fixes a bug.feat(backend): configurable service account. Fixes #1234, fixes #1235
Use
feat
to indicate that this PR adds a new feature.chore: set up changelog generation tools
Use
chore
to indicate that this PR makes some changes that users don't need to know.test: fix CI failure. Part of #1234
Use
part of
to indicate that a PR is working on an issue, but shouldn't close the issue when merged.Do you want this pull request (PR) cherry-picked into the current release branch?
If yes, use one of the following options:
cherrypick-approved
label to this PR. The release manager adds this PR to the release branch in a batch update.