Tutorials/Lightweight Python components (#139)
* Tutorials/Lightweight Python components
* Removed pip install kubernetes
* Deleted the python component sharing notebooks.
1 parent a540cf7, commit c9412e1
Showing 1 changed file with 276 additions and 0 deletions.
samples/notebooks/Lightweight Python components - basics.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,276 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Lightweight Python components\n", | ||
"\n", | ||
"Lightweight Python components do not require you to build a new container image for every code change.\n", | ||
"They are intended for fast iteration in a notebook environment.\n", | ||
"\n", | ||
"#### Building a lightweight Python component\n", | ||
"To build a component, define a stand-alone Python function and then call `kfp.components.func_to_container_op(func)` to convert it to a component that can be used in a pipeline.\n", | ||
"\n", | ||
"There are several requirements for the function:\n", | ||
"* The function should be stand-alone. It should not use any code declared outside of the function definition. Any imports should be added inside the main function. Any helper functions should also be defined inside the main function.\n", | ||
"* The function can only import packages that are available in the base image. If you need a package that is not available, you can try to find a container image that already includes it. (As a workaround, you can use the `subprocess` module to run `pip install` for the required package.)\n", | ||
"* If the function operates on numbers, the parameters need to have type hints. Supported types are ```[int, float, bool]```. Everything else is passed as a string.\n", | ||
"* To build a component with multiple output values, use the `typing.NamedTuple` type hint syntax: ```NamedTuple('MyFunctionOutputs', [('output_name_1', type), ('output_name_2', float)])```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"#Install the SDK\n", | ||
"!pip3 install https://storage.googleapis.com/ml-pipeline/release/0.1.1/kfp.tar.gz --upgrade\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import kfp.components as comp" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"A simple function that just adds two numbers:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"#Define a Python function\n", | ||
"def add(a: float, b: float) -> float:\n", | ||
"    '''Calculates sum of two arguments'''\n", | ||
"    return a + b" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Convert the function to a pipeline operation" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"add_op = comp.func_to_container_op(add)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"A slightly more advanced function that demonstrates how to use imports and helper functions, and how to produce multiple outputs." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"#Advanced function\n", | ||
"#Demonstrates imports, helper functions and multiple outputs\n", | ||
"from typing import NamedTuple\n", | ||
"def my_divmod(dividend: float, divisor: float) -> NamedTuple('MyDivmodOutput', [('quotient', float), ('remainder', float)]):\n", | ||
"    '''Divides two numbers and calculates the quotient and remainder'''\n", | ||
"    #Imports inside a component function:\n", | ||
"    import numpy as np\n", | ||
"\n", | ||
"    #This function demonstrates how to use nested functions inside a component function:\n", | ||
"    def divmod_helper(dividend, divisor):\n", | ||
"        return np.divmod(dividend, divisor)\n", | ||
"\n", | ||
"    (quotient, remainder) = divmod_helper(dividend, divisor)\n", | ||
"\n", | ||
"    from collections import namedtuple\n", | ||
"    divmod_output = namedtuple('MyDivmodOutput', ['quotient', 'remainder'])\n", | ||
"    return divmod_output(quotient, remainder)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Test the Python function by running it directly:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"MyDivmodOutput(quotient=14, remainder=2)" | ||
] | ||
}, | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"my_divmod(100, 7)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Convert the function to a pipeline operation\n", | ||
"\n", | ||
"You can specify an alternative base container image (the image needs to have Python 3.5+ installed)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"divmod_op = comp.func_to_container_op(my_divmod, base_image='tensorflow/tensorflow:1.11.0-py3')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Define the pipeline\n", | ||
"The pipeline function must be decorated with the `@dsl.pipeline` decorator, and its parameters must have default values of type `dsl.PipelineParam`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import kfp.dsl as dsl\n", | ||
"@dsl.pipeline(\n", | ||
"    name='Calculation pipeline',\n", | ||
"    description='A toy pipeline that performs arithmetic calculations.'\n", | ||
")\n", | ||
"def calc_pipeline(\n", | ||
"    a=dsl.PipelineParam('a'),\n", | ||
"    b=dsl.PipelineParam('b', value='7'),\n", | ||
"    c=dsl.PipelineParam('c', value='17'),\n", | ||
"):\n", | ||
"    #Passing pipeline parameter and a constant value as operation arguments\n", | ||
"    add_task = add_op(a, 4) #Returns a dsl.ContainerOp class instance.\n", | ||
"\n", | ||
"    #Passing a task output reference as operation arguments\n", | ||
"    #For an operation with a single return value, the output reference can be accessed using `task.output` or `task.outputs['output_name']` syntax\n", | ||
"    divmod_task = divmod_op(add_task.output, b)\n", | ||
"\n", | ||
"    #For an operation with multiple return values, the output references can be accessed using `task.outputs['output_name']` syntax\n", | ||
"    result_task = add_op(divmod_task.outputs['quotient'], c)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Compile the pipeline" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 9, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"pipeline_func = calc_pipeline\n", | ||
"pipeline_filename = pipeline_func.__name__ + '.pipeline.tar.gz'\n", | ||
"import kfp.compiler as compiler\n", | ||
"compiler.Compiler().compile(pipeline_func, pipeline_filename)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Submit the pipeline for execution" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 10, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/html": [ | ||
"Job link <a href=\"/pipeline/#/runs/details/23daf747-e2d5-11e8-bcc5-42010a800195\" target=\"_blank\" >here</a>" | ||
], | ||
"text/plain": [ | ||
"<IPython.core.display.HTML object>" | ||
] | ||
}, | ||
"metadata": {}, | ||
"output_type": "display_data" | ||
} | ||
], | ||
"source": [ | ||
"#Specify pipeline argument values\n", | ||
"arguments = {'a': '7', 'b': '8'}\n", | ||
"\n", | ||
"#Get or create an experiment and submit a pipeline run\n", | ||
"import kfp\n", | ||
"client = kfp.Client()\n", | ||
"list_experiments_response = client.list_experiments()\n", | ||
"experiments = list_experiments_response.experiments\n", | ||
"if len(experiments) == 0:\n", | ||
"    #The user does not have any experiments available. Creating a new one\n", | ||
"    experiment = client.create_experiment(pipeline_func.__name__ + ' experiment')\n", | ||
"else:\n", | ||
"    experiment = experiments[-1] #Using the last experiment\n", | ||
"\n", | ||
"#Submit a pipeline run\n", | ||
"run_name = pipeline_func.__name__ + ' run'\n", | ||
"run_result = client.run_pipeline(experiment.id, run_name, pipeline_filename, arguments)\n", | ||
"\n", | ||
"#vvvvvvvvv This link leads to the run information page. (Note: There is a bug in JupyterLab that modifies the URL and makes the link stop working)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.6.4" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |