Simple pipeline demo #322
Conversation
Use pipelines v0.1.2
Correct SDK syntax that labels the name of the pipeline step
@texasmichelle Excited to review this, looks really useful.
@texasmichelle This is nice work.
There are various things I could see improved that probably fall outside the scope, such as making the interfaces more concise and Pythonic.
Also, testing seems like an issue that continues to loom; we could start to break ground on that here, or leave it for the future if we'd rather depend on kubeflow/pipelines#203.
```python
#!/usr/bin/env python3
```
This looks like a super convenient way to build pipelines!! It looks like this generates a tgz people upload to the Argo UI. Does this also generate the pipeline YAML in this directory? If not, what is the relevance of the YAML that's included (perhaps as a comparison with the harder way of specifying a pipeline)? Guessing it's the former.
You can upload the .tar.gz file directly, but in this case I included a YAML file with resource requests for GPUs. Support for this via Python is in the works by @qimingj.
Yep, GPU support is coming soon.
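For context, a hedged sketch of what the extra YAML adds: in an Argo workflow, a GPU is requested through the container template's `resources` block. The template name and structure below are illustrative, not copied from this PR's YAML.

```yaml
# Illustrative Argo workflow template fragment (names are hypothetical):
# the container requests one NVIDIA GPU via resources.limits, which the
# Python DSL could not yet express at the time of this discussion.
- name: training
  container:
    image: katib/mxnet-mnist-example
    command: [python, /mxnet/example/image-classification/train_mnist.py]
    resources:
      limits:
        nvidia.com/gpu: 1
```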
```python
    num_layers: kfp.PipelineParam = kfp.PipelineParam(name='numlayers', value='2'),
    optimizer: kfp.PipelineParam = kfp.PipelineParam(name='optimizer', value='ftrl')):
  training = training_op(learning_rate, num_layers, optimizer)  # pylint: disable=unused-variable
```
The fact that this pipeline is specified in Python would make this especially easy to unit test. Up to you whether that's part of this PR. But can the pipeline run be triggered programmatically, given the output of this script? Can we consume a status code for the resulting pipeline run?
Very nice.
@texasmichelle So would it be reasonable to use this mechanism to test the pipeline/example or should that be left for the future?
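One way the wiring of a Python-defined pipeline could be unit tested is sketched below. This is a hypothetical sketch, not code from the PR: `training_op` is a minimal stand-in for the kfp op factory so the test runs with no cluster and no kfp install.

```python
# Hypothetical sketch: test that a Python-defined pipeline wires its
# training step with the expected defaults. `training_op` is a stand-in
# for the kfp.ContainerOp factory in this PR, so no kfp install is needed.

def training_op(learning_rate, num_layers, optimizer):
    # Record the wiring instead of building a real container op.
    return {'name': 'training',
            'args': {'learning_rate': learning_rate,
                     'num_layers': num_layers,
                     'optimizer': optimizer}}

def pipeline(learning_rate='0.1', num_layers='2', optimizer='ftrl'):
    training = training_op(learning_rate, num_layers, optimizer)
    return training

step = pipeline()
assert step['name'] == 'training'
assert step['args']['optimizer'] == 'ftrl'
print('pipeline wiring ok')
```

Swapping the stub for the real op factory would make the same shape of test exercise the actual DSL objects.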
Notice the low accuracy.

## 3. Perform hyperparameter tuning
It feels like the docs skip a step at this point - are people expected to have written the katib job spec manually (i.e. where is it coming from)? Would it be appropriate to have a kubeflow/pipelines op for launching a katib studyjob? I would find this really convenient. But perhaps beyond the current scope.
It is not supported in pipeline DSL, but it is supported in argo yaml since argo supports any K8s template, not just container spec. We can add Katib to DSL support (such as a KatibOp). In order to do that, ideally Katib's CRD should return some output in its job status (available via kubectl get) so argo can pick it up as output (with a JSON path to the field), and then the job output can be passed to downstream components. We should discuss this.
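The mechanism described above can be sketched in Argo terms. This is a hypothetical fragment, not from the PR: Argo's resource templates can create any Kubernetes object, wait on a success condition, and read an output from the object's status via a JSON path. The StudyJob field names here are made up for illustration.

```yaml
# Hypothetical Argo resource template for a future KatibOp-style step:
# create the CRD, wait for completion, surface a status field as an
# output parameter for downstream steps. Field names are illustrative.
- name: katib-study
  resource:
    action: create
    successCondition: status.condition == Completed
    manifest: |
      apiVersion: kubeflow.org/v1alpha1
      kind: StudyJob
      # ... study spec ...
  outputs:
    parameters:
    - name: best-trial
      valueFrom:
        jsonPath: '{.status.bestTrialId}'
```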
Determine which combination of hyperparameters results in the highest accuracy.

## 4. Run a better pipeline
Maybe something like "next steps" for the section title and a little lead-in in the prose, e.g. "now that we've found some good hyperparameters we're ready to ..."
Added a bit of text to clarify the point of the transition.
```
pip install https://storage.googleapis.com/ml-pipeline/release/0.0.26/kfp-0.0.26.tar.gz --upgrade
```

## 2. Create a GKE cluster and install Kubeflow
I guess there's a convenience to the user vs. maintainability tradeoff here. It's more convenient for the docs for launching kubeflow to be right here but it presents a maintainability challenge to have that documentation replicated in numerous places instead of being centralized. Thoughts?
I wrestled with this question and finally settled on adding this here for now. Early intentions were to have a single demo_setup directory in the root dir of demos, but the problem is that it can grow large and is hard to maintain. I prefer having a smaller number of setup steps that exactly matches each demo, but that comes with maintenance challenges. It's not a straightforward call and I'm open to other approaches. Good unit test coverage is our best defense here.
Sure, it's your preference on that then.
Install kubeflow with the following commands:

```
kfctl init ${CLUSTER} --platform gcp
```
Just curious - why not just use kfctl to create the GKE cluster?
This demo highlights autoprovisioning, which is a beta feature not included in kfctl or click-to-deploy. It also includes pipelines, which needs a bit of work on access permissions in order to be included as part of kfctl.
The plan for 0.4.0 is to include pipelines by default in kubeflow deployments, so hopefully this will simplify the process.
can't wait for that day 💃
```
kfctl apply k8s
```

Patch some outdated katib artifacts:
We should probably fix this, instead of telling users to patch their clusters.
Agreed! I'll add updates to PR #1904 & Issue #1903
Let's not block this PR waiting for a fix
The PR is merged, do we still need this?
Can we include those changes in an 0.3 patch? I would like to be able to specify a version.
Amazing 💯 Thanks!!
Basically empty step just to show more than one step
Are there any outstanding items to address with this PR?

@texasmichelle If it looks good to you without a test for now, then that's cool with me! But in that case we might add an issue to track that for a follow-on. /lgtm

Yeah, let's do that after kubeflow/pipelines#203 is resolved - thanks for tracking that. #337 created for adding tests.
Remove katib patch
Use kubeflow v0.3.3
Add PROJECT to env var override file
Further clarification of instructions
OK, looking a bit better. Thanks to you both for help polishing!

/approve
Thanks @texasmichelle!
```python
  return kfp.ContainerOp(
      name=step_name,
      image='katib/mxnet-mnist-example',
      command=['python', '/mxnet/example/image-classification/train_mnist.py'],
```
Is there something more interesting we can do in postprocessing than just echo? For example, push the model for serving, copy the model somewhere, run a batch prediction, or convert the model to TF? Of course we can expand the pipeline later.
I don't want to invest more effort in this pipeline since it's not really what we want to be showing. I would rather use one of the better examples, but to do that we need katib support for tf-job, which @richardsliu is looking into. Pipeline DSL support for katib would round things out to turn this into a much smoother demo.
SG.
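A richer postprocessing step of the kind suggested above could look like the sketch below, in the same shape as the ContainerOp quoted from this PR. This is hypothetical: `ContainerOp` here is a minimal stand-in so the sketch runs without kfp, and the image and bucket path are illustrative, not from the PR.

```python
# Hypothetical sketch: a postprocessing step that copies the trained
# model out for serving instead of just echoing. `ContainerOp` is a
# stand-in class; the image and gs:// path are illustrative only.

class ContainerOp:
    def __init__(self, name, image, command):
        self.name, self.image, self.command = name, image, command

def postprocess_op(step_name='postprocess', model_dir='/model',
                   export_uri='gs://example-bucket/model'):
    return ContainerOp(
        name=step_name,
        image='google/cloud-sdk',
        command=['gsutil', 'cp', '-r', model_dir, export_uri])

op = postprocess_op()
assert op.command[0] == 'gsutil'
print(op.name)
```

With the real kfp DSL, the same factory shape would return an actual `kfp.ContainerOp` wired into the pipeline after the training step.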
```yaml
      - ftrl
  workerSpec:
    goTemplate:
      templatePath: "/worker-template/gpuWorkerTemplate.yaml"
```
So how does this Katib job know which training job to run? Is it somehow referencing the pipeline job?
The katib component includes a configmap.
I see. It's not obvious from the file name (gpuWorkerTemplate.yaml) that the template references a mnist mxnet example.
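For reference, a Katib goTemplate worker file of this sort typically embeds the actual training container, which is why the link to the mnist mxnet example is invisible from the filename alone. A hedged sketch of what such a template might contain (the contents are illustrative, not the actual file from this PR):

```yaml
# Illustrative sketch of a gpuWorkerTemplate.yaml-style Katib worker
# template; values are hypothetical, not copied from this PR.
apiVersion: batch/v1
kind: Job
metadata:
  name: {{.WorkerID}}
spec:
  template:
    spec:
      containers:
      - name: {{.WorkerID}}
        image: katib/mxnet-mnist-example
        command: ["python", "/mxnet/example/image-classification/train_mnist.py"]
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Never
```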
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi, lluunn

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
* Add simple pipeline demo
* Add hyperparameter tuning & GPU autoprovisioning
  Use pipelines v0.1.2
* Resolve lint issues
* Disable lint warning
  Correct SDK syntax that labels the name of the pipeline step
* Add postprocessing step
  Basically empty step just to show more than one step
* Add clarity to instructions
* Update pipelines install to release v0.1.2
* Add repo cloning with release versions
  Remove katib patch
  Use kubeflow v0.3.3
  Add PROJECT to env var override file
  Further clarification of instructions
Very basic pipeline v0.1.2 with GPU autoprovisioning and hyperparameter tuning
Use pipelines v0.1.2