Moves docs from the /samples README to the wiki (#84)
* Moves docs from the /samples README to the wiki

I've copied all the info from the googleprivate/samples README.md to the wiki. This PR removes most of the content from the README, adds a simple introduction, and points people to the wiki docs for further information.

Relevant wiki files:
https://github.com/kubeflow/pipelines/wiki/Deploy-the-Kubeflow-Pipelines-Service
https://github.com/kubeflow/pipelines/wiki/Build-a-Pipeline
https://github.com/kubeflow/pipelines/wiki/Build-Your-Own-Component

* SDK/Tests/Components - Improve temporary file handling (#37)

* Improve temporary file handling in python op tests

* More small temp path fixes

* Fix tfx name bug in the tfma sample test (#67)

* fix tfx name bug

* update release build for the data publish

* SDK/DSL/Compiler - Fixed compilation of dsl.Condition (#28)

* Fixed compilation of dsl.Condition
The compiler no longer produces intermediate steps.

* Got rid of _create_new_groups

* Changed the sub_group.type check

* Build Python SDK in the releasing (#70)

* publish python sdk in the cloud build

* update cloudbuild

* adjust output path

* add dependency for buildapiserver

* mlp -> kfp.dsl (#88)

* Removed instruction to clone the repo.

* Changed link to samples doc on wiki.

* Add %%docker magic to the Jupyter kernel. It makes submitting a docker build job from a single cell easier. (#72)

* fix miscellaneous List API issue (#90)

* fix

* Update job_store.go

* Update run_store.go

* Update presubmit-tests.sh

* Update job_store.go

* Moves docs from pipelines main README to wiki (#83)

I've copied all the info from the kubeflow/pipelines README.md to the wiki. This PR removes most of the content from the README, adds a simple introduction, and points people to the wiki docs for further information.

Relevant wiki files:
https://github.com/kubeflow/pipelines/wiki
https://github.com/kubeflow/pipelines/wiki/Deploy-the-Kubeflow-Pipelines-Service
https://github.com/kubeflow/pipelines/wiki/Build-a-Pipeline

* SDK/DSL/Compiler - Reverted fix of dsl.Condition until the UI is ready. (#94)

* sort by run display name by default (#96)

* CSS changes for nav menu and tables (#99)

* debug tfma failure (#91)

* debug tfma failure

* tft version bug

* minor fix

* comment the test validation

* Fix validation check for maximum size limit (#104)

* Fixed the Minikube tests after moving to the new repo (#98)

* Don't barf when experiment name is already used (#101)

* don't barf when experiment name is already used

* make mock backend use case insensitive names

* ExperimentList tests, use immer.js (#86)

* experiment list tests

* use produce in more places

* remove extra test

* remove extra import

* Use the experiment's resource reference in the listJobs request (#105)

* use the experiment's resource reference in the listJobs request

* fix test import

* Add Ning and Alexey to OWNERS for components, samples and sample-test (#102)

* Compile samples instead of hard code them in API server (#76)

* compile samples

* update logging

* update description

* update sample

* add immediate value sample

* revert

* fail fast if the samples fail to load

* comment

* address comments

* comment out

* update command

* comments

* Account for padding in metric progress fill (#107)

* Account for padding in metric progress fill

* small mock backend fix

* move to css classes, add color

* changes to breadcrumb style

* increase width of summary card

* tests

* merge tests

* First integration test for the ML Pipeline CLI (Pipeline List). (#81)

* First integration test for the ML Pipeline CLI (Pipeline List).

* Fixing an issue with an undefined variable

* Adding the --debug flag to help with debugging.

* Changing the namespace to Kubeflow.

* Add tests for the NewExperiment page (#109)

* Add tests for the NewExperiment page

* Fix test name

* Remove obsolete tests

* Clean up

* add xgboost: migrate from the old repo (#46)

* migrate from the old repo

* fix bug: accidentally overrode the tfma test

* add tfma test back

* add tfma back

* typo fix

* fix small typo

* if job fails, exit after logs are output

* Remove CMLE sample for now since we are waiting for a service fix to support TPU. (#113)

* image tag update for release (#114)

* update image tag for new releases

* add more

* delete the accidentally added sample

* fix typo (#116)

* Fix an issue that %%docker doesn't work. (#119)

* updated favicon to monochrome color (#118)

* Expanded row changes (#120)

* updated favicon to monochrome color

* simple CSS changes to expanded row

* Removed mentions of ark7 in tests (#111)

* Add basic sample tests (#79)

* add sequential sample test

* add condition basic sample

* reuse script

* add all the other basic tests

* update sample test dockerfile to add run_basic_test file

* write test output

* typo bug
sarahmaddox authored and k8s-ci-robot committed Nov 7, 2018
1 parent 8427e30 commit 63e9c54
161 changes: 3 additions & 158 deletions samples/README.md
@@ -1,158 +1,3 @@
# ML Pipeline Services - Authoring Guideline

## Setup
* Create a Python 3 environment.

**Python 3.5 or above is required.** If you don't have Python 3 set up, we suggest the following steps
to install [Miniconda](https://conda.io/miniconda.html).

In a Debian/Ubuntu/[Cloud Shell](https://console.cloud.google.com/cloudshell) environment:
```bash
apt-get update; apt-get install -y wget bzip2
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```
In a Windows environment, download the [installer](https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe) and
remember to select the "*Add Miniconda to my PATH environment variable*" option during installation.

In a Mac environment, download the [installer](https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh) and
run the following command:

```bash
bash Miniconda3-latest-MacOSX-x86_64.sh
```

Then, create a clean Python 3 environment:

```bash
conda create --name mlpipeline python=3.6
source activate mlpipeline
```

If the `conda` command is not found, be sure to add the Miniconda path:

```bash
export PATH=MINICONDA_PATH/bin:$PATH
```

* Clone the repo.

* Install DSL library and DSL compiler

```bash
pip install https://storage.googleapis.com/ml-pipeline/release/0.0.26/kfp-0.0.26.tar.gz --upgrade
```
After successful installation, the `dsl-compile` command should be available on your PATH.
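
As a quick sanity check (not part of the official setup steps), you can verify the CLI is on your PATH and prints its usage text:

```bash
# Locate the compiler and show its command-line options.
which dsl-compile
dsl-compile --help
```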

## Compile the samples
The sample pipelines are represented as Python code. To run these samples, you need to compile them and then upload the output to the Pipeline system from the web UI.
<!---
In the future, we will build the compiler into the pipeline system such that these python files are immediately deployable.
--->

```bash
dsl-compile --py [path/to/py/file] --output [path/to/output/tar.gz]
```

For example:

```bash
dsl-compile --py [ML_REPO_DIRECTORY]/samples/basic/sequential.py --output [ML_REPO_DIRECTORY]/samples/basic/sequential.tar.gz
```

## Deploy the samples
Upload the generated .tar.gz file through the ML pipeline UI.

## Optional for advanced users: Building Your Own Components

### Requirement
Install [docker](https://www.docker.com/get-docker).

### Step One: Create A Container For Each Component
In most cases, you need to create your own container image that includes your program. You can find container-building
examples [here](https://github.com/kubeflow/pipelines/blob/master/components) (within that directory, go to any subdirectory and then to its “containers” directory).

If your component creates outputs to be fed as inputs to downstream components, each output has
to be a string and needs to be written to a separate local text file by the container image.
For example, if a trainer component needs to output the trained model path, it writes the path into a
local file, “/output.txt”. In the Python class (in step two), you can specify how to map the content
of local files to component outputs.
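
As a sketch of what such a container entrypoint might look like (the flag name and the training stub below are hypothetical, for illustration only):

```python
import argparse

def train_and_export(output_dir):
    # Placeholder for real training logic; returns where the model was written.
    return output_dir + '/model'

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--output-dir', required=True)
    args = parser.parse_args()
    model_path = train_and_export(args.output_dir)
    # Each component output must be a string written to its own local text file;
    # step two maps '/output.txt' to a named output via file_outputs.
    with open('/output.txt', 'w') as f:
        f.write(model_path)

if __name__ == '__main__':
    main()
```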

<!---[TODO]: Add how to produce UI metadata.--->

### Step Two: Create A Python Class For Your Component
The Python classes describe the interactions with the docker container image created in step one.
For example, a component that creates confusion matrix data from prediction results looks like this:

```python
import kfp.dsl

class ConfusionMatrixOp(kfp.dsl.ContainerOp):
    def __init__(self, name, predictions, output_path):
        # The op wraps a container image; file_outputs maps the label 'label'
        # to the local file the container writes its output string to.
        super(ConfusionMatrixOp, self).__init__(
            name=name,
            image='gcr.io/project-id/ml-pipeline-local-confusion-matrix:v1',
            command=['python', '/ml/confusion_matrix.py'],
            arguments=[
                '--output', '%s/{{workflow.name}}/confusionmatrix' % output_path,
                '--predictions', predictions
            ],
            file_outputs={'label': '/output.txt'})
```

Note:
* Each component needs to inherit from kfp.dsl.ContainerOp.
* If you already defined an ENTRYPOINT in the container image, you don’t have to provide “command” unless you want to override it.
* The init arguments can include Python native types (such as str and int) and “kfp.dsl.PipelineParam”
types. Each kfp.dsl.PipelineParam represents a parameter whose value is usually only known at run time. It might be a pipeline
parameter whose value is provided by the user at pipeline run time, or it can be an output from an upstream component.
In the above case, “predictions” and “output_path” are kfp.dsl.PipelineParams.
* Although the value of each PipelineParam is only available at run time, you can still use the param inline in the
argument (note the “%s”); at run time the argument will contain the param's value.
* “file_outputs” is a map from labels to local file paths. In the above case, the content of '/output.txt' is gathered as a string output of the operator. To reference the output in code:

```python
op = ConfusionMatrixOp(...)
op.outputs['label']
```

If there is only one output, you can also use “op.output”.
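
To illustrate how such an output typically feeds a downstream step (the ROC op name and image below are hypothetical):

```python
confusion = ConfusionMatrixOp('confusion-matrix', predictions, output_path)

# Passing confusion.outputs['label'] to another op makes the dependency
# explicit; the actual string value is substituted at run time.
roc = kfp.dsl.ContainerOp(
    name='roc-curve',
    image='gcr.io/project-id/ml-pipeline-roc:v1',
    arguments=['--label', confusion.outputs['label']])
```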


### Step Three: Create Your Workflow As A Python Function
Each pipeline is defined as a Python function. For example:

```python
@kfp.dsl.pipeline(
    name='TFX Trainer',
    description='A trainer that does end-to-end training for TFX models.'
)
def train(
        output_path,
        train_data=kfp.dsl.PipelineParam('train-data',
            value='gs://ml-pipeline-playground/tfx/taxi-cab-classification/train.csv'),
        eval_data=kfp.dsl.PipelineParam('eval-data',
            value='gs://ml-pipeline-playground/tfx/taxi-cab-classification/eval.csv'),
        schema=kfp.dsl.PipelineParam('schema',
            value='gs://ml-pipeline-playground/tfx/taxi-cab-classification/schema.json'),
        target=kfp.dsl.PipelineParam('target', value='tips'),
        learning_rate=kfp.dsl.PipelineParam('learning-rate', value=0.1),
        hidden_layer_size=kfp.dsl.PipelineParam('hidden-layer-size', value='100,50'),
        steps=kfp.dsl.PipelineParam('steps', value=1000),
        slice_columns=kfp.dsl.PipelineParam('slice-columns', value='trip_start_hour'),
        true_class=kfp.dsl.PipelineParam('true-class', value='true'),
        need_analysis=kfp.dsl.PipelineParam('need-analysis', value='true'),
):
    # Instantiate and connect the component ops here.
    ...
```

Note:

* **@kfp.dsl.pipeline** is a required decorator and includes the “name” and “description” properties.
* Input arguments will show up as pipeline parameters in the Pipeline system web UI. As a Python rule, positional
args go first and keyword args follow.
* Each function argument is of type kfp.dsl.PipelineParam, and the default values
should all be of that type. The default values will show up in the Pipeline UI but can be overwritten (see the sketch after these notes).
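
For illustration, here is a minimal sketch of a pipeline body that wires a parameter into the ConfusionMatrixOp from step two (the pipeline name and default path below are placeholders, not from the samples):

```python
@kfp.dsl.pipeline(
    name='Confusion matrix demo',
    description='A minimal pipeline wiring one parameter into one op.')
def confusion_pipeline(
        output_path,
        predictions=kfp.dsl.PipelineParam(
            'predictions', value='gs://your-bucket/path/to/predictions')):
    # Pipeline parameters flow straight into the op's constructor; the op's
    # arguments receive their values at run time.
    confusion = ConfusionMatrixOp('confusion-matrix', predictions, output_path)
```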


See an example [here](https://github.com/kubeflow/pipelines/blob/master/samples/xgboost-spark/xgboost-training-cm.py).
The sample pipelines give you a quick start to building and deploying machine learning pipelines with Kubeflow.
* Follow the guide to [deploy the Kubeflow pipelines service](https://github.com/kubeflow/pipelines/wiki/Deploy-the-Kubeflow-Pipelines-Service).
* Build and deploy your pipeline [using the provided samples](https://github.com/kubeflow/pipelines/wiki/Samples).
