tech writer edits #2373

Merged Oct 17, 2019 (2 commits)

Commit ea3f045bb54a7b7c3317f414e54b20682af57528 ("tech writer edits")
jay-saldanha committed Oct 11, 2019

117 changes: 61 additions & 56 deletions in components/gcp/ml_engine/train/README.md

# Name
Component: Submitting an AI Platform training job as a pipeline step

# Label
AI Platform, Kubeflow

# Summary
A Kubeflow pipeline component to submit an AI Platform training job as a step in a pipeline.

# Facets
<!--Make sure the asset has data for the following facets:
Use case
Technique
Input data type
ML workflow

The data must map to the acceptable values for these facets, as documented on the “taxonomy” sheet of go/aihub-facets
https://gitlab.aihub-content-external.com/aihubbot/kfp-components/commit/fe387ab46181b5d4c7425dcb8032cb43e70411c1
--->
Use case:

Technique:

Input data type:

ML workflow:

# Details
## Intended use
Use this component to submit a training job to AI Platform from a Kubeflow pipeline.

## Runtime arguments
| Argument | Description | Optional | Data type | Accepted values | Default |
|:------------------|:------------------|:----------|:--------------|:-----------------|:-------------|
| project_id | The Google Cloud Platform (GCP) project ID of the job. | No | GCPProjectID | - | - |
| python_module | The name of the Python module to run after installing the training program. | Yes | String | - | None |
| package_uris | The Cloud Storage location of the packages that contain the training program and any additional dependencies. The maximum number of package URIs is 100. | Yes | List | - | None |
| region | The Compute Engine region in which the training job is run. | Yes | GCPRegion | - | us-central1 |
| args | The command-line arguments to pass to the training program. | Yes | List | - | None |
| job_dir | A Cloud Storage path in which to store the training outputs and other data needed for training. This path is passed to your TensorFlow program as the command-line argument, `job-dir`. The benefit of specifying this field is that AI Platform validates the path for use in training. | Yes | GCSPath | - | None |
| python_version | The version of Python used in training. If it is not set, the default version is 2.7. Python 3.5 is available when the runtime version is set to 1.4 and above. | Yes | String | - | None |
| runtime_version | The runtime version of AI Platform to use for training. If it is not set, AI Platform uses the default. | Yes | String | - | 1 |
| master_image_uri | The Docker image to run on the master replica. This image must be in Container Registry. | Yes | GCRPath | - | None |
| worker_image_uri | The Docker image to run on the worker replica. This image must be in Container Registry. | Yes | GCRPath | - | None |
| training_input | The input parameters to create a training job. | Yes | Dict | [TrainingInput](https://cloud.google.com/ml-engine/reference/rest/v1/projects.jobs#TrainingInput) | None |
| job_id_prefix | The prefix of the job ID that is generated. | Yes | String | - | None |
| wait_interval | The number of seconds to wait between API calls to get the status of the job. | Yes | Integer | - | 30 |
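
For example, a hypothetical `training_input` dictionary (field names follow the TrainingInput reference linked in the table; the specific machine types and counts are illustrative) might look like this:

```python
# Hypothetical TrainingInput payload; adjust machine types and counts to your job.
training_input = {
    'scaleTier': 'CUSTOM',
    'masterType': 'n1-standard-8',
    'workerType': 'n1-standard-4',
    'workerCount': '2',  # int64 fields are passed as strings in the REST JSON mapping
}
```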



## Output
| Name | Description | Type |
|:------- |:---- | :--- |
| job_id | The ID of the created job. | String |
| job_dir | The Cloud Storage path that contains the output files with the trained model. | GCSPath |


## Cautions & requirements

## Detailed description

The component builds the [TrainingInput](https://cloud.google.com/ml-engine/reference/rest/v1/projects.jobs#TrainingInput) payload and submits a job via the [AI Platform REST API](https://cloud.google.com/ml-engine/reference/rest/v1/projects.jobs).
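
For orientation, the call the component makes is roughly equivalent to the following direct use of the google-api-python-client library. This is only an illustration of the underlying API, not the component's own code (which is linked under References); the job ID, project, and TrainingInput values are placeholders.

```python
from googleapiclient import discovery

# Build a client for the AI Platform (ml.googleapis.com) v1 API.
ml = discovery.build('ml', 'v1')

job_spec = {
    'jobId': 'census_train_job',        # illustrative job ID
    'trainingInput': {                  # the TrainingInput payload the component assembles
        'pythonModule': 'trainer.task',
        'packageUris': ['gs://your-bucket/train/trainer.tar.gz'],
        'region': 'us-central1',
        'runtimeVersion': '1.10',
    },
}

request = ml.projects().jobs().create(
    parent='projects/your-project-id',  # illustrative project
    body=job_spec,
)
response = request.execute()
```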

The steps to use the component in a pipeline are:


1. Install the Kubeflow Pipelines SDK:

    ```python
    %%capture --no-stderr

    KFP_PACKAGE = 'https://storage.googleapis.com/ml-pipeline/release/0.1.14/kfp.tar.gz'
    !pip3 install $KFP_PACKAGE --upgrade
    ```

2. Load the component using the Kubeflow Pipelines SDK:

    ```python
    import kfp.components as comp

    mlengine_train_op = comp.load_component_from_url('https://raw.githubusercontent.com/kubeflow/pipelines/e598176c02f45371336ccaa819409e8ec83743df/components/gcp/ml_engine/train/component.yaml')
    help(mlengine_train_op)
    ```

### Sample
The following sample code works in an IPython notebook or directly in Python code.

In this sample, you use the code from the [census estimator sample](https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census/estimator) to train a model on AI Platform. To make the code available to AI Platform, package the Python code and upload the package to a Cloud Storage bucket.

Note: You must have read and write permissions on the bucket that you use as the working directory.

#### Set sample parameters

```python
# Required parameters
PROJECT_ID = '<Put your project ID here>'
GCS_WORKING_DIR = 'gs://<Put your GCS path here>' # No ending slash
```


```python
# Optional parameters
EXPERIMENT_NAME = 'CLOUDML - Train'
TRAINER_GCS_PATH = GCS_WORKING_DIR + '/train/trainer.tar.gz'
OUTPUT_GCS_PATH = GCS_WORKING_DIR + '/train/output/'
```

#### Clean up the working directory


```python
%%capture --no-stderr
!gsutil rm -r $GCS_WORKING_DIR
```


#### Download the sample trainer code to a local directory

```python
%%capture --no-stderr
# Fetch the cloudml-samples archive that contains the census estimator code
!wget https://github.com/GoogleCloudPlatform/cloudml-samples/archive/master.zip
```

#### Package code and upload the package to Cloud Storage


```python
%%capture --no-stderr
%%bash -s "$TRAINER_GCS_PATH"
rm -fr ./cloudml-samples-master/ ./master.zip ./dist
```
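
The middle of the packaging cell is not shown in this view. A rough sketch of what such a cell could do, assuming the cloudml-samples archive downloaded in the previous step and a `setup.py` in the census estimator directory (both assumptions, not the notebook's exact code):

```python
%%capture --no-stderr
%%bash -s "$TRAINER_GCS_PATH"
# $1 receives the value of TRAINER_GCS_PATH passed through `bash -s`.
unzip -o master.zip
pushd cloudml-samples-master/census/estimator
python setup.py sdist             # build a source distribution of the trainer package
gsutil cp dist/*.tar.gz "$1"      # upload the package to the Cloud Storage path
popd
rm -fr ./cloudml-samples-master/ ./master.zip ./dist
```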

#### Example pipeline that uses the component


```python
import kfp.dsl as dsl
import kfp.gcp as gcp
```
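
The body of the pipeline definition is not shown in this view. A minimal sketch of how the loaded `mlengine_train_op` could be wired up follows; the argument names come from the runtime-arguments table, while the `trainer.task` entry point, the `runtime_version` value, and the `use_gcp_secret('user-gcp-sa')` call are assumptions rather than the original notebook's exact code.

```python
import json

import kfp.dsl as dsl
import kfp.gcp as gcp


@dsl.pipeline(
    name='CloudML training pipeline',
    description='Trains the census sample on AI Platform.'
)
def pipeline(
    project_id=PROJECT_ID,
    python_module='trainer.task',                 # assumed entry point of the census trainer
    package_uris=json.dumps([TRAINER_GCS_PATH]),  # list-typed arguments are JSON-encoded strings
    region='us-central1',
    job_dir=OUTPUT_GCS_PATH,
    runtime_version='1.10',                       # assumed runtime version
):
    # Submit the training job and run the step with the GCP service-account secret.
    train_task = mlengine_train_op(
        project_id=project_id,
        python_module=python_module,
        package_uris=package_uris,
        region=region,
        job_dir=job_dir,
        runtime_version=runtime_version,
    ).apply(gcp.use_gcp_secret('user-gcp-sa'))
```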

#### Compile the pipeline


```python
pipeline_func = pipeline
pipeline_filename = pipeline_func.__name__ + '.zip'
import kfp.compiler as compiler
compiler.Compiler().compile(pipeline_func, pipeline_filename)
```

#### Submit the pipeline for execution


```python
#Specify values for the pipeline's arguments
arguments = {}

#Get or create an experiment
import kfp
client = kfp.Client()
experiment = client.create_experiment(EXPERIMENT_NAME)

#Submit a pipeline run
run_name = pipeline_func.__name__ + ' run'  # any descriptive run name works
run_result = client.run_pipeline(experiment.id, run_name, pipeline_filename, arguments)
```

Use the following command to inspect the contents of the output directory:


```python
!gsutil ls $OUTPUT_GCS_PATH
```

## References
* [Component Python code](https://github.com/kubeflow/pipelines/blob/master/components/gcp/container/component_sdk/python/kfp_component/google/ml_engine/_train.py)
* [Component Docker file](https://github.com/kubeflow/pipelines/blob/master/components/gcp/container/Dockerfile)
* [Sample notebook](https://github.com/kubeflow/pipelines/blob/master/components/gcp/ml_engine/train/sample.ipynb)
* [AI Platform REST API - Resource: Job](https://cloud.google.com/ml-engine/reference/rest/v1/projects.jobs)

## License
By deploying or using this software you agree to comply with the [AI Hub Terms of Service](https://aihub.cloud.google.com/u/0/aihub-tos) and the [Google APIs Terms of Service](https://developers.google.com/terms/). To the extent of a direct conflict of terms, the AI Hub Terms of Service will control.