
Simple pipeline demo #322

Merged: 8 commits merged into kubeflow:master on Nov 16, 2018

Conversation

texasmichelle
Member

@texasmichelle texasmichelle commented Nov 10, 2018

Very basic pipeline v0.1.2 with GPU autoprovisioning and hyperparameter tuning
Use pipelines v0.1.2



Correct SDK syntax that labels the name of the pipeline step
@cwbeitel
Contributor

@texasmichelle Excited to review this, looks really useful.

Contributor

@cwbeitel cwbeitel left a comment


@texasmichelle This is nice work.

There are various things that could be improved, such as making the interfaces more concise and Pythonic, but those probably fall outside the scope of this PR.

Testing also seems like an issue that continues to loom; we could start to break ground on it here, or leave it for the future if we'd rather depend on kubeflow/pipelines#203.

@@ -0,0 +1,34 @@
#!/usr/bin/env python3

Contributor


This looks like a super convenient way to build pipelines! It looks like this generates a tgz that people upload to the Argo UI. Does this also generate the pipeline YAML in this directory? If not, what is the relevance of the YAML that's included (perhaps a comparison with the harder way of specifying a pipeline)? Guessing it's the former.

Member Author


You can upload the .tar.gz file directly, but in this case I included a yaml with resource requests for GPUs. Support for this via python is in the works by @qimingj.


Yep, GPU support is coming soon.
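For readers following along: the archive discussed above is produced by compiling the Python DSL. A minimal sketch, assuming the `dsl-compile` entry point shipped with early kfp SDK releases (file names here are hypothetical):

```shell
# Compile the Python pipeline definition into an uploadable workflow archive.
# Paths are placeholders, not the actual files in this PR.
dsl-compile --py simple_pipeline.py --output simple_pipeline.tar.gz
```

The resulting .tar.gz, or the hand-edited YAML when GPU resource requests are needed, is what gets uploaded to the pipelines UI.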

num_layers: kfp.PipelineParam = kfp.PipelineParam(name='numlayers', value='2'),
optimizer: kfp.PipelineParam = kfp.PipelineParam(name='optimizer', value='ftrl')):
training = training_op(learning_rate, num_layers, optimizer) # pylint: disable=unused-variable

Contributor


The fact that this pipeline is specified in Python would make it especially easy to unit test. Up to you whether that's part of this PR. But can the pipeline run be triggered programmatically, given the output of this script? Can we consume a status code for the resulting pipeline run?
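As a sketch of what such a unit test could look like: the pipeline function below is a simplified stand-in (the real one would be imported from the pipeline module, and the op factory is injected here so no cluster or SDK is needed); all names are hypothetical.

```python
from unittest import mock

# Simplified stand-in for the pipeline function under test; the real one
# would be imported from the pipeline definition module.
def pipeline(learning_rate='0.1', num_layers='2', optimizer='ftrl',
             training_op=None):
    # The pipeline body just wires parameters into the training step.
    return training_op(learning_rate, num_layers, optimizer)

def test_pipeline_wires_params_to_training_op():
    op = mock.Mock(name='training_op')
    pipeline(learning_rate='0.05', num_layers='3', optimizer='sgd',
             training_op=op)
    # The training step should receive exactly the supplied parameters.
    op.assert_called_once_with('0.05', '3', 'sgd')
```

Injecting the op factory keeps the test hermetic; a variant could patch the real `training_op` with `mock.patch` instead.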

Member Author


APIs for running pipelines are included - a good example is here, which @vicaire showed in this morning's community meeting.
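A programmatic runner built on those APIs could then poll for a terminal status. A minimal sketch, assuming a client object exposing a `get_run_status(run_id)` method that returns a status string (a stand-in, not the real SDK API):

```python
import time

def wait_for_run(client, run_id, timeout_s=600, poll_s=10):
    """Poll a pipeline API client until the run reaches a terminal state.

    `client.get_run_status` is a hypothetical method name; adapt to the
    actual pipelines API.
    """
    deadline = time.time() + timeout_s
    while True:
        status = client.get_run_status(run_id)
        if status in ('Succeeded', 'Failed', 'Error'):
            return status
        if time.time() >= deadline:
            raise TimeoutError('run %s still %r after %ss'
                               % (run_id, status, timeout_s))
        time.sleep(poll_s)
```

A CI job could call this and fail the build on anything other than 'Succeeded', which would give us the status-code behavior asked about above.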

Contributor


Very nice.

Contributor


@texasmichelle So would it be reasonable to use this mechanism to test the pipeline/example or should that be left for the future?


Notice the low accuracy.

## 3. Perform hyperparameter tuning
Contributor


It feels like the docs skip a step at this point - are people expected to have written the katib job spec manually (i.e. where is it coming from)? Would it be appropriate to have a kubeflow/pipelines op for launching a katib studyjob? I would find this really convenient. But perhaps beyond the current scope.

Member Author


Added a link to the source of the gpu example file. I'm not sure how to apply a katib manifest using a pipeline step - @vicaire @qimingj do you know if that is supported?


It is not supported in the pipeline DSL, but it is supported in Argo YAML, since Argo supports any K8s template, not just container specs. We could add Katib support to the DSL (such as a KatibOp). To do that, ideally Katib's CRD should return some output in its job status (available via kubectl get) so Argo can pick it up as an output (with a JSON path to the field), and the job output can then be passed to downstream components. We should discuss this.
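To make that idea concrete: an Argo resource template can create an arbitrary K8s object and surface a field of its status as an output via a JSON path. A sketch of such a step (the StudyJob status field and success condition shown here are hypothetical; Katib would need to actually populate them):

```yaml
- name: katib-study
  resource:
    action: create
    # Wait until the CRD reports completion (condition name is hypothetical).
    successCondition: status.condition == Completed
    manifest: |
      apiVersion: kubeflow.org/v1alpha1
      kind: StudyJob
      metadata:
        generateName: mnist-study-
      spec: {}  # study spec elided
  outputs:
    parameters:
    - name: best-trial
      valueFrom:
        # Hypothetical status field that downstream steps could consume.
        jsonPath: '{.status.bestTrialId}'
```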


Determine which combination of hyperparameters results in the highest accuracy.

## 4. Run a better pipeline
Contributor


Maybe something like "next steps" for the section title and a little lead-in in the prose, e.g. "now that we've found some good hyperparameters we're ready to ..."

Member Author


Added a bit of text to clarify the point of the transition

pip install https://storage.googleapis.com/ml-pipeline/release/0.0.26/kfp-0.0.26.tar.gz --upgrade
```

## 2. Create a GKE cluster and install Kubeflow
Contributor


I guess there's a convenience to the user vs. maintainability tradeoff here. It's more convenient for the docs for launching kubeflow to be right here but it presents a maintainability challenge to have that documentation replicated in numerous places instead of being centralized. Thoughts?

Member Author


I wrestled with this question and finally settled on adding this here for now. Early intentions were to have a single demo_setup directory in the root dir of demos, but the problem is that it can grow large and is hard to maintain. I prefer having a smaller number of setup steps that exactly matches each demo, but that comes with maintenance challenges. It's not a straightforward call and I'm open to other approaches. Good unit test coverage is our best defense here.

Contributor


Sure, it's your preference then.

Install kubeflow with the following commands:

```
kfctl init ${CLUSTER} --platform gcp
Contributor


Just curious - why not just use kfctl to create the GKE cluster?

Member Author


This demo highlights autoprovisioning, which is a beta feature not included in kfctl or click-to-deploy. It also includes pipelines, which needs a bit of work on access permissions in order to be included as part of kfctl.
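For context, node autoprovisioning at the time was enabled with beta gcloud flags at cluster creation, roughly along these lines (cluster name and resource limits are placeholders, not values from this demo):

```shell
gcloud beta container clusters create ${CLUSTER} \
  --enable-autoprovisioning \
  --max-cpu 20 \
  --max-memory 200 \
  --max-accelerator type=nvidia-tesla-k80,count=4
```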

Contributor


The plan for 0.4.0 is to include pipelines by default in kubeflow deployments, so hopefully this will simplify the process.

Member Author


can't wait for that day 💃

kfctl apply k8s
```

Patch some outdated katib artifacts:
Contributor


We should probably fix this, instead of telling users to patch their clusters.

Member Author


Agreed! I'll add updates to PR #1904 & Issue #1903

Member Author


Let's not block this PR waiting for a fix

Contributor


The PR is merged, do we still need this?

Member Author


Can we include those changes in an 0.3 patch? I would like to be able to specify a version.

Member Author


Amazing 💯 Thanks!!

Basically empty step just to show more than one step
@texasmichelle
Member Author

Are there any outstanding items to address with this PR?

@cwbeitel
Contributor

@texasmichelle If it looks good to you without a test for now then that's cool with me! But in that case we might add an issue to track that for a follow-on.

/lgtm

@texasmichelle
Member Author

Yeah let's do that after pipelines-203 is resolved - thanks for tracking that. #337 created for adding tests.

@k8s-ci-robot k8s-ci-robot removed the lgtm label Nov 15, 2018
Remove katib patch
Use kubeflow v0.3.3
Add PROJECT to env var override file
Further clarification of instructions
@texasmichelle
Member Author

OK, looking a bit better. Thanks to you both for help polishing!

@lluunn
Contributor

lluunn commented Nov 16, 2018

/approve


@qimingj qimingj left a comment


Thanks @texasmichelle!

return kfp.ContainerOp(
name=step_name,
image='katib/mxnet-mnist-example',
command=['python', '/mxnet/example/image-classification/train_mnist.py'],

Is there something more interesting we could do in postprocessing than just echo? For example: push the model for serving, copy the model somewhere, run a batch prediction, or convert the model to TF? Of course, we can expand the pipeline later.

Member Author


I don't want to invest more effort in this pipeline since it's not really what we want to be showing. I would rather use one of the better examples, but to do that we need katib support for tf-job, which @richardsliu is looking into. Pipeline DSL support for katib would round things out to turn this into a much smoother demo.


SG.



- ftrl
workerSpec:
goTemplate:
templatePath: "/worker-template/gpuWorkerTemplate.yaml"

So how does this Katib job know which training job to run? Is it somehow referencing the pipeline job?

Member Author


The katib component includes a configmap.


I see. It's not obvious from the file name (gpuWorkerTemplate.yaml) that the template references a mnist mxnet example.
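For anyone else puzzled by the indirection: the go-template worker file is what pins the study to the training image. A sketch of the relevant part of such a template (the field values are illustrative, taken from the image named earlier in this PR, not copied from the actual file):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{.WorkerID}}
spec:
  template:
    spec:
      containers:
      - name: {{.WorkerID}}
        image: katib/mxnet-mnist-example
        command:
        - python
        - /mxnet/example/image-classification/train_mnist.py
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Never
```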


@jlewi
Contributor

jlewi commented Nov 16, 2018

/lgtm
/approve
/hold because there are pending comments

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi, lluunn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 4bbc0c8 into kubeflow:master Nov 16, 2018
@texasmichelle texasmichelle deleted the simple-pipeline branch November 16, 2018 19:57
yixinshi pushed a commit to yixinshi/examples that referenced this pull request Nov 30, 2018
* Add simple pipeline demo

* Add hyperparameter tuning & GPU autoprovisioning

Use pipelines v0.1.2

* Resolve lint issues

* Disable lint warning

Correct SDK syntax that labels the name of the pipeline step

* Add postprocessing step

Basically empty step just to show more than one step

* Add clarity to instructions

* Update pipelines install to release v0.1.2

* Add repo cloning with release versions

Remove katib patch
Use kubeflow v0.3.3
Add PROJECT to env var override file
Further clarification of instructions
Svendegroote91 pushed a commit to Svendegroote91/examples that referenced this pull request Dec 6, 2018
Svendegroote91 pushed a commit to Svendegroote91/examples that referenced this pull request Apr 1, 2019