update sample readme for the new structure #2058

Merged 4 commits on Sep 6, 2019

98 changes: 67 additions & 31 deletions `samples/README.md`

The sample pipelines give you a quick start to building and deploying machine learning pipelines with Kubeflow Pipelines.
* Follow the guide to [deploy the Kubeflow pipelines service](https://www.kubeflow.org/docs/guides/pipelines/deploy-pipelines-service/).
* Build and deploy your pipeline [using the provided samples](https://www.kubeflow.org/docs/guides/pipelines/pipelines-samples/).




This page tells you how to use the _basic_ sample pipelines contained in the repo.

# Sample Structure

The samples are organized into the core set and the contrib set.

**Core samples** demonstrate the full range of KFP functionality and are covered by the sample test infra.
Not all samples are covered by the tests yet, but they all will be in the near future.
A selected set of these core samples is also preloaded into KFP during deployment.
The core set also includes intermediate samples that are more complex than basic samples,
such as the flip-coin sample, but simpler than the TFX samples; these serve to demonstrate
a set of outstanding features and offer users a next-level KFP experience.

**Contrib samples** are not tested by KFP. They may be moved into the core set if they are
of good quality, are covered by tests, and demonstrate certain KFP functionality.
Some samples also belong in this directory because they require platform support
that is hard to provide in our test infra.

In the Core directory, each sample lives in its own directory.
In the Contrib directory, there is an intermediate directory for each contributor,
e.g. ibm and arena, within which each sample lives in its own directory.
An example of the resulting structure is as follows:
```
pipelines/samples/
Core/
dsl_static_type_checking/
dsl_static_type_checking.ipynb
xgboost_training_cm/
xgboost_training_cm.py
condition/
condition.py
recursion/
recursion.py
Contrib/
IBM/
ffdl-seldon/
ffdl_pipeline.ipynb
ffdl_pipeline.py
README.md
```

# Run Samples

## Compile the pipeline specification

Follow the guide to [building a pipeline](https://www.kubeflow.org/docs/guides/pipelines/build-pipeline/) to install the Kubeflow
Pipelines SDK and compile the sample Python into a workflow specification.
The specification takes one of three forms: a YAML file, a YAML file compressed into a `.tar.gz` file, or a YAML file compressed into a `.zip` file.
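
For example, here is a minimal sketch of compiling one of the core samples with the SDK; the module path and pipeline function name are assumptions, so adjust them to the sample you are actually compiling:

```python
import kfp.compiler

# Import the pipeline function from the sample you want to compile.
# This module path and function name are illustrative; check the sample file.
from core.sequential.sequential import sequential_pipeline

# Compile the pipeline function into a compressed workflow specification.
kfp.compiler.Compiler().compile(sequential_pipeline, 'sequential.zip')
```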

For convenience, you can use the preloaded samples in the pipeline system. This saves you the steps required
to compile and compress the pipeline specification.

## Upload the pipeline to Kubeflow Pipelines

Open the Kubeflow Pipelines UI, and follow the prompts to create a new pipeline and upload the generated workflow
specification, `my-pipeline.zip` (example: `sequential.zip`).
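
Alternatively, you can upload the specification programmatically. A sketch using the KFP SDK client follows; the host address is an example, and the exact client arguments may differ across SDK versions:

```python
import kfp

# Connect to the Kubeflow Pipelines API; substitute your cluster's endpoint.
client = kfp.Client(host='http://localhost:8080')

# Upload the compiled specification under a human-readable name.
client.upload_pipeline(
    pipeline_package_path='sequential.zip',
    pipeline_name='sequential',
)
```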

## Run the pipeline

Follow the pipeline UI to create pipeline runs.

Useful parameter values:
* For the "exit_handler" and "sequential" samples: `gs://ml-pipeline-playground/shakespeare1.txt`
* For the "parallel_join" sample: `gs://ml-pipeline-playground/shakespeare1.txt` and `gs://ml-pipeline-playground/shakespeare2.txt`

## Notes: component source code

All samples use pre-built components. The command to run for each container is built into the pipeline file.
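
As an illustration, a step inside such a pipeline file typically pins both the container image and the command to run. The sketch below mirrors the echo step used by several basic samples; treat the image and command as placeholders rather than a published component:

```python
from kfp import dsl

def echo_op(text):
    # The image and the command to run are baked into the pipeline file
    # itself; no separate component definition is needed for these samples.
    return dsl.ContainerOp(
        name='echo',
        image='library/bash:4.4.23',
        command=['sh', '-c'],
        arguments=['echo "%s"' % text],
    )
```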

# Sample contribution
For better readability and integration with the sample test infrastructure, samples are encouraged to adopt the following conventions.

* The sample file should be either `*.py` or `*.ipynb`, and its file name should be consistent with its directory name.
* For a `*.py` sample, it's recommended to have a main block invoking `kfp.compiler.Compiler().compile()` to compile the
pipeline function into a pipeline yaml spec, as shown in the sketch after this list.
* For a `*.ipynb` sample, parameters (e.g., experiment name and project name) should be defined in a dedicated cell and
tagged as `parameters`. A detailed guideline is [here](https://github.com/nteract/papermill). Also, all environment setup and
preparation should be done within the notebook itself, such as by `!pip install packages`.
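
A sketch of these conventions for a `*.py` sample follows; the pipeline name, parameter, and body are placeholders:

```python
import kfp.dsl as dsl
import kfp.compiler

@dsl.pipeline(
    name='my-sample',
    description='A placeholder sample pipeline.'
)
def my_sample_pipeline(url='gs://ml-pipeline-playground/shakespeare1.txt'):
    # Pipeline steps would go here; see the core samples for real examples.
    pass

if __name__ == '__main__':
    # Compile next to the sample file so the artifact name matches the sample.
    kfp.compiler.Compiler().compile(my_sample_pipeline, __file__ + '.zip')
```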


## How to add a sample test

Here are the ordered steps to add a sample test for a sample.
Only the core samples are expected to be added to the sample test infrastructure.

1. Make sure the sample follows the [sample conventions](#sample-contribution).
2. If the sample requires argument inputs, they can be specified in a config yaml file
placed under `test/sample-test/configs`. See
[`tfx_cab_classification.config.yaml`](https://github.com/kubeflow/pipelines/blob/master/test/sample-test/configs/tfx_cab_classification.config.yaml)
as an example.
The config yaml file will be validated according to `schema.config.yaml`.
If no config yaml is provided, pipeline parameters will be substituted by their default values.
3. Add your test name (consistent with the file name and directory name) in
[`test/sample_test.yaml`](https://github.com/kubeflow/pipelines/blob/ecd93a50564652553260f8008c9a2d75ab907971/test/sample_test.yaml#L69).
4. (*Optional*) The current sample test infra only checks whether runs succeed, without any custom validation logic.
If validation is needed, runtime checks should be included in the sample itself, as in the sketch below.
Note that there is no support for injecting custom validation logic into `*.py` samples: for those, the test infra
compiles the sample, submits and runs it, and checks whether it succeeds.
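
For instance, a runtime check can be wrapped as a lightweight component so that a wrong result fails the run, and hence the sample test. This is only a sketch; the function body and the expectation it checks are illustrative:

```python
from kfp.components import func_to_container_op

def verify_output(text: str):
    # Fail the step, and therefore the run and the sample test, if the
    # pipeline's output does not look right.
    if not text:
        raise ValueError('pipeline produced an empty result')

# Wrap the check as a pipeline step so it runs inside the sample itself.
verify_op = func_to_container_op(verify_output)
```
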
2 changes: 1 addition & 1 deletion `test/sample_test.yaml`

The `sidecar` entry in the sample list is commented out:

```yaml
- dsl_static_type_checking
- pipeline_transformers
- secret
#- sidecar
# Build and push image
- name: build-image-by-dockerfile
  inputs:
```