Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic component build sample #1681

Merged
368 changes: 368 additions & 0 deletions samples/core/component_build/component_build.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,368 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Copyright 2019 Google Inc. All Rights Reserved.\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# http://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# Install Pipeline SDK - This only needs to be ran once in the enviroment. \n",
"!pip3 install kfp --upgrade"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# KubeFlow Pipelines basic component build \n",
"\n",
"In this notebook, we will demo: \n",
"\n",
"* Defining a KubeFlow pipeline with Python KFP SDK\n",
"* Creating an experiment and submitting pipelines to KFP run time enviroment using the KFP SDK \n",
"\n",
"Reference documentation: \n",
"* https://www.kubeflow.org/docs/pipelines/sdk/build-component/\n",
"* https://www.kubeflow.org/docs/pipelines/sdk/sdk-overview/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"tags": [
"parameters"
]
},
"outputs": [],
"source": [
"# Set your output and project. !!!Must Do before you can proceed!!!\n",
"EXPERIMENT_NAME = 'basic_component'\n",
"PROJECT_NAME = 'Your-Gcp-Project-Name' #'Your-GCP-Project-ID'\n",
"OUTPUT_DIR = 'gs://%s-basic-component' % PROJECT_NAME # A path for asset outputs\n",
"BASE_IMAGE = 'google/cloud-sdk:latest' # Base image used in various steps of the pipeline\n",
"TARGET_IMAGE = 'gcr.io/%s/component:latest' % PROJECT_NAME # Target image that will include our final code"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!gsutil mb {OUTPUT_DIR}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an Experiment in the Pipeline System\n",
"\n",
"Pipeline system requires an \"Experiment\" to group pipeline runs. You can create a new experiment, or call client.list_experiments() to get existing ones."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"#Get or create an experiment and submit a pipeline run\n",
"import kfp\n",
"client = kfp.Client()\n",
"\n",
"try:\n",
" experiment = client.get_experiment(experiment_name=EXPERIMENT_NAME)\n",
"except:\n",
" experiment = client.create_experiment(EXPERIMENT_NAME)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a python function"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def add(a: float, b: float) -> float:\n",
" '''Calculates sum of two arguments'''\n",
" \n",
" print(\"Adding two values %s and %s\" %(a, b))\n",
" \n",
" return a + b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Build a Component With the Above Function\n",
"The return value \"add_op\" represents a step that can be used directly in a pipeline function. "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"scrolled": true
},
"outputs": [
SinaChavoshi marked this conversation as resolved.
Show resolved Hide resolved
{
"name": "stdout",
"output_type": "stream",
"text": [
"2019-08-07 18:25:48:INFO:Build an image that is based on google/cloud-sdk:latest and push the image to gcr.io/chavoshi-dev-2/pusher:latest\n",
"2019-08-07 18:25:48:INFO:Checking path: gs://chavoshi-dev-2-basic-component...\n",
"2019-08-07 18:25:48:INFO:Generate entrypoint and serialization codes.\n",
"2019-08-07 18:25:48:INFO:Generate build files.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/opt/conda/lib/python3.6/site-packages/google/auth/_default.py:66: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a \"quota exceeded\" or \"API not enabled\" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n",
" warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n",
"/opt/conda/lib/python3.6/site-packages/google/auth/_default.py:66: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a \"quota exceeded\" or \"API not enabled\" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n",
" warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"2019-08-07 18:25:50:INFO:Start a kaniko job for build.\n",
"2019-08-07 18:25:50:INFO:Cannot Find local kubernetes config. Trying in-cluster config.\n",
"2019-08-07 18:25:50:INFO:Initialized with in-cluster config.\n",
"2019-08-07 18:25:55:INFO:5 seconds: waiting for job to complete\n",
"2019-08-07 18:26:00:INFO:10 seconds: waiting for job to complete\n",
"2019-08-07 18:26:05:INFO:15 seconds: waiting for job to complete\n",
"2019-08-07 18:26:10:INFO:20 seconds: waiting for job to complete\n",
"2019-08-07 18:26:15:INFO:25 seconds: waiting for job to complete\n",
"2019-08-07 18:26:20:INFO:30 seconds: waiting for job to complete\n",
"2019-08-07 18:26:25:INFO:35 seconds: waiting for job to complete\n",
"2019-08-07 18:26:30:INFO:40 seconds: waiting for job to complete\n",
"2019-08-07 18:26:35:INFO:45 seconds: waiting for job to complete\n",
"2019-08-07 18:26:40:INFO:50 seconds: waiting for job to complete\n",
"2019-08-07 18:26:45:INFO:55 seconds: waiting for job to complete\n",
"2019-08-07 18:26:50:INFO:60 seconds: waiting for job to complete\n",
"2019-08-07 18:26:55:INFO:65 seconds: waiting for job to complete\n",
"2019-08-07 18:27:00:INFO:70 seconds: waiting for job to complete\n",
"2019-08-07 18:27:05:INFO:75 seconds: waiting for job to complete\n",
"2019-08-07 18:27:10:INFO:80 seconds: waiting for job to complete\n",
"2019-08-07 18:27:15:INFO:85 seconds: waiting for job to complete\n",
"2019-08-07 18:27:20:INFO:90 seconds: waiting for job to complete\n",
"2019-08-07 18:27:25:INFO:95 seconds: waiting for job to complete\n",
"2019-08-07 18:27:25:INFO:Kubernetes job failed.\n",
"\u001b[36mINFO\u001b[0m[0000] Resolved base name google/cloud-sdk:latest to google/cloud-sdk:latest \n",
"\u001b[36mINFO\u001b[0m[0000] Resolved base name google/cloud-sdk:latest to google/cloud-sdk:latest \n",
"\u001b[36mINFO\u001b[0m[0000] Downloading base image google/cloud-sdk:latest \n",
"ERROR: logging before flag.Parse: E0807 18:25:51.478485 1 metadata.go:142] while reading 'google-dockercfg' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg\n",
"ERROR: logging before flag.Parse: E0807 18:25:51.482106 1 metadata.go:159] while reading 'google-dockercfg-url' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg-url\n",
"2019/08/07 18:25:51 No matching credentials were found, falling back on anonymous\n",
"\u001b[36mINFO\u001b[0m[0000] Error while retrieving image from cache: getting file info: stat /cache/sha256:670bb28272fb0e70a36406c8d1137db26dc04a14cfbb6f96b5fc0e8094916757: no such file or directory \n",
"\u001b[36mINFO\u001b[0m[0000] Downloading base image google/cloud-sdk:latest \n",
"2019/08/07 18:25:51 No matching credentials were found, falling back on anonymous\n",
"\u001b[36mINFO\u001b[0m[0001] Built cross stage deps: map[] \n",
"\u001b[36mINFO\u001b[0m[0001] Downloading base image google/cloud-sdk:latest \n",
"2019/08/07 18:25:52 No matching credentials were found, falling back on anonymous\n",
"\u001b[36mINFO\u001b[0m[0001] Error while retrieving image from cache: getting file info: stat /cache/sha256:670bb28272fb0e70a36406c8d1137db26dc04a14cfbb6f96b5fc0e8094916757: no such file or directory \n",
"\u001b[36mINFO\u001b[0m[0001] Downloading base image google/cloud-sdk:latest \n",
"2019/08/07 18:25:52 No matching credentials were found, falling back on anonymous\n",
"\u001b[36mINFO\u001b[0m[0002] Checking for cached layer gcr.io/chavoshi-dev-2/pusher/cache:c32b2c40e08b79eaa6ab1b825d829a3a153ec4c2e5ce4c1c72fff9c40cc19897... \n",
"\u001b[36mINFO\u001b[0m[0002] No cached layer found for cmd RUN apt-get update -y && apt-get install --no-install-recommends -y -q python3 python3-pip python3-setuptools \n",
"\u001b[36mINFO\u001b[0m[0002] Unpacking rootfs as cmd RUN apt-get update -y && apt-get install --no-install-recommends -y -q python3 python3-pip python3-setuptools requires it. \n",
"\u001b[36mINFO\u001b[0m[0045] Taking snapshot of full filesystem... \n",
"\u001b[36mINFO\u001b[0m[0084] RUN apt-get update -y && apt-get install --no-install-recommends -y -q python3 python3-pip python3-setuptools \n",
"\u001b[36mINFO\u001b[0m[0084] cmd: /bin/sh \n",
"\u001b[36mINFO\u001b[0m[0084] args: [-c apt-get update -y && apt-get install --no-install-recommends -y -q python3 python3-pip python3-setuptools] \n",
"Ign:1 http://deb.debian.org/debian stretch InRelease\n",
"Hit:2 http://security.debian.org/debian-security stretch/updates InRelease\n",
"Hit:3 https://packages.cloud.google.com/apt cloud-sdk-stretch InRelease\n",
"Hit:4 http://deb.debian.org/debian stretch-updates InRelease\n",
"Hit:5 http://deb.debian.org/debian stretch Release\n",
"Reading package lists...\n",
"E: Release file for http://deb.debian.org/debian/dists/stretch-updates/InRelease is expired (invalid since 9h 43min 57s). Updates for this repository will not be applied.\n",
"error building image: error building stage: waiting for process to exit: exit status 100\n",
"\n",
"2019-08-07 18:27:25:INFO:Kaniko job complete.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/opt/conda/lib/python3.6/site-packages/google/auth/_default.py:66: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a \"quota exceeded\" or \"API not enabled\" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n",
" warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n",
"/opt/conda/lib/python3.6/site-packages/google/auth/_default.py:66: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a \"quota exceeded\" or \"API not enabled\" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n",
" warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"2019-08-07 18:27:26:INFO:Build component complete.\n"
]
}
],
"source": [
"from kfp import compiler\n",
"\n",
"add_op = compiler.build_python_component(\n",
" component_func=add,\n",
" staging_gcs_path=OUTPUT_DIR,\n",
" dependency=[kfp.compiler.VersionedDependency(name='google-api-python-client', version='1.7.0')],\n",
" base_image=BASE_IMAGE,\n",
" target_image=TARGET_IMAGE)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Build a pipeline using this component"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"import kfp.dsl as dsl\n",
"@dsl.pipeline(\n",
" name='Calculation pipeline',\n",
" description='A sample pipeline that performs arithmetic calculations.'\n",
")\n",
"def calc_pipeline(\n",
" a='1',\n",
" b='7',\n",
" c='17',\n",
"):\n",
" #Passing pipeline parameter and a constant value as operation arguments\n",
" add_task = add_op(a, b) #Returns a dsl.ContainerOp class instance. \n",
" \n",
" #You can create explicit dependancy between the tasks using xyz_task.after(abc_task)\n",
" add_2_task = add_op(b, c)\n",
" \n",
" add_3_task = add_op(add_task.output, add_2_task.output)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Complie the pipeline"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"pipeline_func = calc_pipeline\n",
"pipeline_filename = pipeline_func.__name__ + '.pipeline.zip'\n",
"\n",
"import kfp.compiler as compiler\n",
"compiler.Compiler().compile(pipeline_func, pipeline_filename)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit the pipeline for execution"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"Run link <a href=\"/pipeline/#/runs/details/01df9e80-b941-11e9-b782-42010a8000a3\" target=\"_blank\" >here</a>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#Specify pipeline argument values\n",
"arguments = {'a': '7', 'b': '8'}\n",
"\n",
"#Submit a pipeline run\n",
"run_name = pipeline_func.__name__ + ' run'\n",
"run_result = client.run_pipeline(experiment.id, run_name, pipeline_filename, arguments)\n",
"\n",
"#This link leads to the run information page. \n",
"#Note: There is a bug in JupyterLab that modifies the URL and makes the link stop working"
SinaChavoshi marked this conversation as resolved.
Show resolved Hide resolved
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}