
Simple pipeline demo #322

Merged: 8 commits merged into kubeflow:master on Nov 16, 2018

Conversation

texasmichelle
Member

@texasmichelle texasmichelle commented Nov 10, 2018

Very basic pipeline v0.1.2 with GPU autoprovisioning and hyperparameter tuning
Use pipelines v0.1.2



Correct SDK syntax that labels the name of the pipeline step
@cwbeitel
Contributor

@texasmichelle Excited to review this, looks really useful.

Contributor

@cwbeitel cwbeitel left a comment


@texasmichelle This is nice work.

There are various things that could be improved, such as making the interfaces more concise and Pythonic, but those probably fall outside the scope of this PR.

Testing also seems like an issue that continues to loom; we could start to break ground on it here, or leave it for the future if we'd rather depend on kubeflow/pipelines#203.

@@ -0,0 +1,34 @@
#!/usr/bin/env python3

Contributor


This looks like a super convenient way to build pipelines! It looks like this generates a tgz that people upload to the Argo UI. Does this also generate the pipeline YAML in this directory? If not, what is the relevance of the YAML that's included (perhaps a comparison with the harder way of specifying a pipeline)? Guessing it's the former.

Member Author


You can upload the .tar.gz file directly, but in this case I included a yaml with resource requests for GPUs. Support for this via python is in the works by @qimingj.


Yep, GPU support is coming soon.
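For readers following along: the archive discussed above is produced by compiling the Python DSL. A minimal sketch, assuming the `dsl-compile` entry point shipped with early kfp SDK releases (file names here are hypothetical):

```shell
# Compile the Python pipeline definition into an uploadable workflow archive.
# Paths are placeholders, not the actual files in this PR.
dsl-compile --py simple_pipeline.py --output simple_pipeline.tar.gz
```

The resulting .tar.gz, or the hand-edited YAML when GPU resource requests are needed, is what gets uploaded to the pipelines UI.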

num_layers: kfp.PipelineParam = kfp.PipelineParam(name='numlayers', value='2'),
optimizer: kfp.PipelineParam = kfp.PipelineParam(name='optimizer', value='ftrl')):
training = training_op(learning_rate, num_layers, optimizer) # pylint: disable=unused-variable

Contributor


The fact that this pipeline is specified in Python would make it especially easy to unit test. Up to you whether that's part of this PR. But can the pipeline run be triggered programmatically, given the output of this script? Can we consume a status code for the resulting pipeline run?
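As a sketch of what such a unit test could look like: the pipeline function below is a simplified stand-in (the real one would be imported from the pipeline module, and the op factory is injected here so no cluster or SDK is needed); all names are hypothetical.

```python
from unittest import mock

# Simplified stand-in for the pipeline function under test; the real one
# would be imported from the pipeline definition module.
def pipeline(learning_rate='0.1', num_layers='2', optimizer='ftrl',
             training_op=None):
    # The pipeline body just wires parameters into the training step.
    return training_op(learning_rate, num_layers, optimizer)

def test_pipeline_wires_params_to_training_op():
    op = mock.Mock(name='training_op')
    pipeline(learning_rate='0.05', num_layers='3', optimizer='sgd',
             training_op=op)
    # The training step should receive exactly the supplied parameters.
    op.assert_called_once_with('0.05', '3', 'sgd')
```

Injecting the op factory keeps the test hermetic; a variant could patch the real `training_op` with `mock.patch` instead.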

Member Author


APIs for running pipelines are included - a good example is here, which @vicaire showed in this morning's community meeting.
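A programmatic runner built on those APIs could then poll for a terminal status. A minimal sketch, assuming a client object exposing a `get_run_status(run_id)` method that returns a status string (a stand-in, not the real SDK API):

```python
import time

def wait_for_run(client, run_id, timeout_s=600, poll_s=10):
    """Poll a pipeline API client until the run reaches a terminal state.

    `client.get_run_status` is a hypothetical method name; adapt to the
    actual pipelines API.
    """
    deadline = time.time() + timeout_s
    while True:
        status = client.get_run_status(run_id)
        if status in ('Succeeded', 'Failed', 'Error'):
            return status
        if time.time() >= deadline:
            raise TimeoutError('run %s still %r after %ss'
                               % (run_id, status, timeout_s))
        time.sleep(poll_s)
```

A CI job could call this and fail the build on anything other than 'Succeeded', which would give us the status-code behavior asked about above.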

Contributor


Very nice.

Contributor


@texasmichelle So would it be reasonable to use this mechanism to test the pipeline/example or should that be left for the future?


Notice the low accuracy.

## 3. Perform hyperparameter tuning
Contributor


It feels like the docs skip a step at this point - are people expected to have written the katib job spec manually (i.e. where is it coming from)? Would it be appropriate to have a kubeflow/pipelines op for launching a katib studyjob? I would find this really convenient. But perhaps beyond the current scope.

Member Author


Added a link to the source of the gpu example file. I'm not sure how to apply a katib manifest using a pipeline step - @vicaire @qimingj do you know if that is supported?


It is not supported in the pipeline DSL, but it is supported in Argo YAML, since Argo supports any K8s template, not just container specs. We could add Katib support to the DSL (such as a KatibOp). To do that, ideally Katib's CRD should return some output in its job status (available via kubectl get) so Argo can pick it up as an output (with a JSON path to the field), and the job output can then be passed to downstream components. We should discuss this.
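To make that idea concrete: an Argo resource template can create an arbitrary K8s object and surface a field of its status as an output via a JSON path. A sketch of such a step (the StudyJob status field and success condition shown here are hypothetical; Katib would need to actually populate them):

```yaml
- name: katib-study
  resource:
    action: create
    # Wait until the CRD reports completion (condition name is hypothetical).
    successCondition: status.condition == Completed
    manifest: |
      apiVersion: kubeflow.org/v1alpha1
      kind: StudyJob
      metadata:
        generateName: mnist-study-
      spec: {}  # study spec elided
  outputs:
    parameters:
    - name: best-trial
      valueFrom:
        # Hypothetical status field that downstream steps could consume.
        jsonPath: '{.status.bestTrialId}'
```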


Determine which combination of hyperparameters results in the highest accuracy.

## 4. Run a better pipeline
Contributor


Maybe something like "next steps" for the section title and a little lead-in in the prose, e.g. "now that we've found some good hyperparameters we're ready to ..."

Member Author


Added a bit of text to clarify the point of the transition

pip install https://storage.googleapis.com/ml-pipeline/release/0.0.26/kfp-0.0.26.tar.gz --upgrade
```

## 2. Create a GKE cluster and install Kubeflow
Contributor


I guess there's a convenience to the user vs. maintainability tradeoff here. It's more convenient for the docs for launching kubeflow to be right here but it presents a maintainability challenge to have that documentation replicated in numerous places instead of being centralized. Thoughts?

Member Author


I wrestled with this question and finally settled on adding this here for now. Early intentions were to have a single demo_setup directory in the root dir of demos, but the problem is that it can grow large and is hard to maintain. I prefer having a smaller number of setup steps that exactly matches each demo, but that comes with maintenance challenges. It's not a straightforward call and I'm open to other approaches. Good unit test coverage is our best defense here.

Contributor


Sure, it's your preference then.

Install kubeflow with the following commands:

```
kfctl init ${CLUSTER} --platform gcp
Contributor


Just curious - why not just use kfctl to create the GKE cluster?

Member Author


This demo highlights autoprovisioning, which is a beta feature not included in kfctl or click-to-deploy. It also includes pipelines, which needs a bit of work on access permissions in order to be included as part of kfctl.
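For context, node autoprovisioning at the time was enabled with beta gcloud flags at cluster creation, roughly along these lines (cluster name and resource limits are placeholders, not values from this demo):

```shell
gcloud beta container clusters create ${CLUSTER} \
  --enable-autoprovisioning \
  --max-cpu 20 \
  --max-memory 200 \
  --max-accelerator type=nvidia-tesla-k80,count=4
```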

Contributor


The plan for 0.4.0 is to include pipelines by default in kubeflow deployments, so hopefully this will simplify the process.

Member Author


can't wait for that day 💃

kfctl apply k8s
```

Patch some outdated katib artifacts:
Contributor


We should probably fix this, instead of telling users to patch their clusters.

Member Author


Agreed! I'll add updates to PR #1904 & Issue #1903

Member Author


Let's not block this PR waiting for a fix

Contributor


The PR is merged, do we still need this?

Member Author


Can we include those changes in an 0.3 patch? I would like to be able to specify a version.

Member Author


Amazing 💯 Thanks!!

Basically empty step just to show more than one step
@texasmichelle
Member Author

Are there any outstanding items to address with this PR?

@cwbeitel
Contributor

@texasmichelle If it looks good to you without a test for now then that's cool with me! But in that case we might add an issue to track that for a follow-on.

/lgtm

@texasmichelle
Member Author

Yeah let's do that after pipelines-203 is resolved - thanks for tracking that. #337 created for adding tests.

@k8s-ci-robot k8s-ci-robot removed the lgtm label Nov 15, 2018
Remove katib patch
Use kubeflow v0.3.3
Add PROJECT to env var override file
Further clarification of instructions
@texasmichelle
Member Author

OK, looking a bit better. Thanks to you both for help polishing!

@lluunn
Contributor

lluunn commented Nov 16, 2018

/approve


@qimingj qimingj left a comment


Thanks @texasmichelle!

return kfp.ContainerOp(
name=step_name,
image='katib/mxnet-mnist-example',
command=['python', '/mxnet/example/image-classification/train_mnist.py'],

Is there something more interesting we could do in postprocessing than just echo? For example: push the model for serving, copy the model somewhere, run a batch prediction, or convert the model to TF? Of course, we can expand the pipeline later.

Member Author


I don't want to invest more effort in this pipeline since it's not really what we want to be showing. I would rather use one of the better examples, but to do that we need katib support for tf-job, which @richardsliu is looking into. Pipeline DSL support for katib would round things out to turn this into a much smoother demo.


SG.



- ftrl
workerSpec:
goTemplate:
templatePath: "/worker-template/gpuWorkerTemplate.yaml"

So how does this Katib job know which training job to run? Is it somehow referencing the pipeline job?

Member Author


The katib component includes a configmap.


I see. It's not obvious from the file name (gpuWorkerTemplate.yaml) that the template references a mnist mxnet example.
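For anyone else puzzled by the indirection: the go-template worker file is what pins the study to the training image. A sketch of the relevant part of such a template (the field values are illustrative, taken from the image named earlier in this PR, not copied from the actual file):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{.WorkerID}}
spec:
  template:
    spec:
      containers:
      - name: {{.WorkerID}}
        image: katib/mxnet-mnist-example
        command:
        - python
        - /mxnet/example/image-classification/train_mnist.py
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Never
```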


@jlewi
Contributor

jlewi commented Nov 16, 2018

/lgtm
/approve
/hold because there are pending comments

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi, lluunn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 4bbc0c8 into kubeflow:master Nov 16, 2018
@texasmichelle texasmichelle deleted the simple-pipeline branch November 16, 2018 19:57
yixinshi pushed a commit to yixinshi/examples that referenced this pull request Nov 30, 2018
* Add simple pipeline demo

* Add hyperparameter tuning & GPU autoprovisioning

Use pipelines v0.1.2

* Resolve lint issues

* Disable lint warning

Correct SDK syntax that labels the name of the pipeline step

* Add postprocessing step

Basically empty step just to show more than one step

* Add clarity to instructions

* Update pipelines install to release v0.1.2

* Add repo cloning with release versions

Remove katib patch
Use kubeflow v0.3.3
Add PROJECT to env var override file
Further clarification of instructions
Svendegroote91 pushed a commit to Svendegroote91/examples that referenced this pull request Dec 6, 2018
Svendegroote91 pushed a commit to Svendegroote91/examples that referenced this pull request Apr 1, 2019