The sample pipelines give you a quick start to build and deploy machine learning pipelines with Kubeflow Pipelines.
- Follow the guide to deploy the Kubeflow Pipelines service.
- Build and deploy your pipeline using the provided samples.
The samples are organized into the core set and the contrib set.
Core samples demonstrate the full range of KFP functionality and are covered by the sample test infrastructure. Not all core samples are covered by the tests yet, but they all will be in the near future. A selected set of these core samples will also be preloaded into KFP during deployment. The core samples also include intermediate samples, which are more complex than basic samples such as flip coins but simpler than the TFX samples. These intermediate samples demonstrate a set of outstanding features and offer users the next level of KFP experience.
Contrib samples are not tested by KFP. A contrib sample could be moved to the core samples if it is of good quality, is covered by tests, and demonstrates certain KFP functionality. Another reason to put a sample in this directory is that it requires platform support that is hard to provide in our test infrastructure.
In the Core directory, each sample will be in a separate directory. In the Contrib directory, there is an intermediate directory for each contributor, e.g. ibm and arena, within which each sample is in a separate directory. An example of the resulting structure is as follows:
    pipelines/samples/
        Core/
            dsl_static_type_checking/
                dsl_static_type_checking.ipynb
            xgboost_training_cm/
                xgboost_training_cm.py
            condition/
                condition.py
            recursion/
                recursion.py
        Contrib/
            IBM/
                ffdl-seldon/
                    ffdl_pipeline.ipynb
                    ffdl_pipeline.py
                    README.md
Follow the guide to building a pipeline to install the Kubeflow Pipelines SDK and compile the sample Python code into a workflow specification. The specification takes one of three forms: a YAML file, a YAML file compressed into a .tar.gz file, or a YAML file compressed into a .zip file.
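If you already have a compiled YAML spec and need one of the compressed forms, the sketch below builds them with only the Python standard library. It assumes the archive simply holds the single YAML file at its root (our understanding of what the v1 compiler writes, not something stated in this README); package_spec and the file names in the usage note are hypothetical.

```python
import tarfile
import zipfile
from pathlib import Path


def package_spec(yaml_path: str, out_path: str) -> str:
    """Wrap a compiled pipeline YAML spec in a .tar.gz or .zip archive.

    Assumption: the YAML is stored at the archive root under its own file name.
    """
    src = Path(yaml_path)
    if out_path.endswith(".tar.gz"):
        # gzip-compressed tarball containing just the YAML file
        with tarfile.open(out_path, "w:gz") as tar:
            tar.add(str(src), arcname=src.name)
    elif out_path.endswith(".zip"):
        # deflate-compressed zip containing just the YAML file
        with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.write(str(src), arcname=src.name)
    else:
        raise ValueError("out_path must end with .tar.gz or .zip")
    return out_path
```

For example, package_spec("pipeline.yaml", "my-pipeline.zip") would produce an archive that, given a valid compiled spec, should be uploadable through the UI.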
For convenience, you can use the preloaded samples in the pipeline system. This saves you the steps required to compile and compress the pipeline specification.
Open the Kubeflow Pipelines UI, and follow the prompts to create a new pipeline and upload the generated workflow specification, my-pipeline.zip (example: sequential.zip).
Use the pipeline UI to create pipeline runs.
Useful parameter values:
- For the "exit_handler" and "sequential" samples: gs://ml-pipeline-playground/shakespeare1.txt
- For the "parallel_join" sample: gs://ml-pipeline-playground/shakespeare1.txt and gs://ml-pipeline-playground/shakespeare2.txt
All samples use pre-built components. The command to run for each container is built into the pipeline file.
For better readability and integration with the sample test infrastructure, samples are encouraged to adopt the following conventions.
- The sample file should be either *.py or *.ipynb, and its file name should be consistent with its directory name.
- For a *.py sample, it's recommended to have a main that invokes kfp.compiler.Compiler().compile() to compile the pipeline function into a pipeline YAML spec.
- For a *.ipynb sample, parameters (e.g., project_name) should be defined in a dedicated cell and tagged as parameter. (If the author would like the sample test infra to run the sample by setting the run_pipeline flag to True in the associated config.yaml file, the sample test infra will expect the sample to use the kfp.Client().create_run_from_pipeline_func method to start the run, so that the sample test can watch the run.) A detailed guideline is here. Also, all environment setup and preparation should happen within the notebook, for example by running !pip install for any required packages.
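The file-level conventions above lend themselves to a quick local check. The following is a rough sketch, not part of the sample test infrastructure: check_sample_dir is a hypothetical helper, the __main__ and Compiler().compile() checks are plain text matches, and the notebook check assumes the papermill convention of a cell whose metadata tags include "parameters".

```python
import json
from pathlib import Path
from typing import List


def check_sample_dir(sample_dir: str) -> List[str]:
    """Return convention problems found in one core sample directory (empty if none)."""
    d = Path(sample_dir)
    name = d.name
    py = d / f"{name}.py"
    nb = d / f"{name}.ipynb"
    problems: List[str] = []

    # Convention: the sample file is *.py or *.ipynb and is named after its directory.
    if not (py.is_file() or nb.is_file()):
        problems.append(f"expected {name}.py or {name}.ipynb in {d}")

    # Heuristic text match: a *.py sample has a main block that compiles the pipeline.
    if py.is_file():
        src = py.read_text(encoding="utf-8")
        if '__name__ == "__main__"' not in src and "__name__ == '__main__'" not in src:
            problems.append(f"{py.name} has no __main__ block")
        if "Compiler().compile(" not in src:
            problems.append(f"{py.name} does not call Compiler().compile()")

    # Assumption: parameter cell tagged "parameters" in nbformat cell metadata (papermill style).
    if nb.is_file():
        cells = json.loads(nb.read_text(encoding="utf-8")).get("cells", [])
        tagged = any(
            "parameters" in c.get("metadata", {}).get("tags", []) for c in cells
        )
        if not tagged:
            problems.append(f"{nb.name} has no cell tagged 'parameters'")

    return problems
```

Running it over each directory under the core samples directory gives an early warning before the test infra picks a sample up; the text matches can produce false positives, so treat the output as hints rather than verdicts.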
Here are the ordered steps to add sample tests for a sample. Only core samples are expected to be added to the sample test infrastructure.
- Make sure the sample follows the sample conventions.
- If the sample requires argument inputs, they can be specified in a config YAML file placed under test/sample-test/configs. See xgboost_training_cm.config.yaml as an example. The config YAML file will be validated against schema.config.yaml. If no config YAML is provided, pipeline parameters will be substituted by their default values.
- Add your test name (consistent with the file name and directory name) in test/sample_test.yaml.
- (Optional) The current sample test infra only checks whether runs succeed, without custom validation logic. If needed, runtime checks should be included in the sample itself. However, there is no support for injecting custom validation logic into *.py samples; for those, the test infra compiles the sample, submits and runs it, and checks whether the run succeeds.
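The rule that config arguments take precedence and default values fill in the rest can be pictured with the sketch below. It is not the sample test infrastructure's code: resolve_arguments is a hypothetical helper, the argument values from the config YAML are assumed to be already loaded into a plain dict, and the pipeline function is assumed to declare its parameters as ordinary Python function parameters.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def resolve_arguments(
    pipeline_func: Callable[..., Any],
    config_args: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge config-file arguments over the pipeline function's default values."""
    config_args = config_args or {}
    sig = inspect.signature(pipeline_func)

    # Reject config keys that do not match any pipeline parameter.
    unknown = set(config_args) - set(sig.parameters)
    if unknown:
        raise ValueError(f"config sets unknown parameters: {sorted(unknown)}")

    resolved: Dict[str, Any] = {}
    for name, param in sig.parameters.items():
        if name in config_args:
            resolved[name] = config_args[name]  # config value wins
        elif param.default is not inspect.Parameter.empty:
            resolved[name] = param.default  # fall back to the default value
        else:
            raise ValueError(f"parameter {name!r} has no default and is not set in the config")
    return resolved
```

For instance, with def demo(project="p", rounds=5), resolve_arguments(demo, {"rounds": 10}) returns {"project": "p", "rounds": 10}, and resolve_arguments(demo) returns the defaults unchanged. Failing loudly on a parameter without a default mirrors why a config file is needed when a sample requires argument inputs.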