Update tfjob launcher #1494

hougangliu · 2019-06-12T02:51:22Z

The tfjob_launcher is out of date and does not support TFJob v1beta.

k8s-ci-robot · 2019-06-12T02:51:26Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign ark-kun
You can assign the PR to them by writing /assign @ark-kun in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

components/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

hougangliu · 2019-06-14T15:35:12Z

/test kubeflow-pipeline-e2e-test

hougangliu · 2019-06-19T00:20:50Z

/test kubeflow-pipeline-sample-test

ryandawsonuk · 2019-06-19T08:58:30Z

/lgtm

I've raised a question about pvolumes for this in #1344 (comment) but I don't personally think that should prevent this being merged

k8s-ci-robot · 2019-06-19T08:58:37Z

@ryandawsonuk: changing LGTM is restricted to assignees, and only kubeflow/pipelines repo collaborators may be assigned issues.

In response to this:

/lgtm

I've raised a question about pvolumes for this in #1344 (comment) but I don't personally think that should prevent this being merged

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ryandawsonuk · 2019-06-19T08:59:16Z

components/kubeflow/launcher/kubeflow_tfjob_launcher_op.py

    return dsl.ContainerOp(
        name = step_name,
-        image = 'gcr.io/ml-pipeline/ml-pipeline-kubeflow-tf:1d55a27cf8b69696f3ab5c10687edf2fde0068c7',
+        image = 'liuhougangxa/tfjob-launcher',


Is this intentional?

hougangliu · 2019-06-27

Yes. In fact, kubeflow_tfjob_launcher_op.py is just an example of what a tfjob-launcher ContainerOp should look like.
Once the image in gcr.io/ml-pipeline is updated, the image referenced here will be updated too.

elikatsis · 2019-06-19T10:19:43Z

@hougangliu, Hello!

I would like to suggest some other type of mapping.
Instead of a dict {"pvc-name": "/mount/path"}, I believe it would be better to use {"/mount/path": some_k8s_volume_instance} (and isinstance(some_k8s_volume_instance, k8s_client.V1Volume) == True).

Then, the following lines

pipelines/components/kubeflow/launcher/src/launch_tf_job.py

Lines 70 to 72 in 4aca05e

    for k, v in pvc_map.iteritems():
      spec['volumes'] = [{"name": k, "persistentVolumeClaim": {"claimName": k}}]
      spec['containers'][0]['volumeMounts'] = [{"mountPath": v, "name": k}]

will become

spec['volumes'] = []
spec['containers'][0]['volumeMounts'] = []
for k, v in vol_map.iteritems():
    vmount = k8s_client.V1VolumeMount(mount_path=k, name=v.name)
    spec['volumes'].append(v)
    spec['containers'][0]['volumeMounts'].append(vmount)

In addition, it will allow the mounting of other types of volumes (e.g. Secrets).
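To make the proposed mapping concrete, here is a minimal, standalone sketch. It uses plain dicts in place of the k8s_client.V1Volume instances suggested above, so it runs without the kubernetes package. The helper name apply_volume_map, the image name, and the volume, claim, and secret names are hypothetical. Note that the lines quoted from launch_tf_job.py assign new lists on every iteration, so only the last PVC survives; appending avoids that.

```python
def apply_volume_map(spec, vol_map):
    """Attach each volume in vol_map to the pod spec and mount it in the first container.

    vol_map maps a mount path to a volume definition. Plain dicts are used here
    (assumption); the proposal in the thread passes V1Volume instances instead.
    """
    spec.setdefault('volumes', [])
    container = spec['containers'][0]
    container.setdefault('volumeMounts', [])
    for mount_path, volume in vol_map.items():
        # Append rather than assign, so several volumes can coexist.
        spec['volumes'].append(volume)
        container['volumeMounts'].append(
            {'mountPath': mount_path, 'name': volume['name']})
    return spec


# Hypothetical pod spec and volumes, for illustration only.
spec = {'containers': [{'name': 'tensorflow', 'image': 'example/image'}]}
vol_map = {
    '/data': {'name': 'train-data',
              'persistentVolumeClaim': {'claimName': 'train-data-pvc'}},
    '/secrets': {'name': 'creds', 'secret': {'secretName': 'example-creds'}},
}
apply_volume_map(spec, vol_map)
```

Because the value is a full volume definition rather than a PVC name, the same code path handles a PVC and a Secret without special cases.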

This will also be uniform with the argument pvolumes of ContainerOp entity.
[ For more info check the volumeop samples here, and also these lines of ContainerOp definition. ]

It can then be parsed accordingly to extend the dependencies of that task, in the case where a PipelineVolume instance is passed as a value.

May I hold this PR to discuss it a bit longer? I want to ensure that we don't merge something that introduces a different API (since one already exists), which might need refactoring later on to support more functionalities. What do you think?
/hold

SatwikBhandiwad · 2019-06-21T10:08:33Z

components/kubeflow/launcher/kubeflow_tfjob_launcher_op.py

    return dsl.ContainerOp(
        name = step_name,
-        image = 'gcr.io/ml-pipeline/ml-pipeline-kubeflow-tf:1d55a27cf8b69696f3ab5c10687edf2fde0068c7',
+        image = 'liuhougangxa/tfjob-launcher',


Is this code different from gcr.io/ml-pipeline/ml-pipeline-kubeflow-tf:1d55a27cf8b69696f3ab5c10687edf2fde0068c7? If yes, can we see the source code of this Docker image?

hougangliu · 2019-06-27

components/kubeflow/launcher/src/launch_tf_job.py is the source code.

nrchakradhar · 2019-06-28T12:38:38Z

components/kubeflow/launcher/src/launch_tf_job.py

+  parser.add_argument('--tfjob-version', type=str,
+                      default='v1beta2',
+                      help='The version of the deployed tfjob.' +
+                           'If not set, the default namespace is v1beta2.')


NIT: I guess there is a typo here: the help text says "namespace" where it should refer to the tfjob version.
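A sketch of how the corrected argument might read, assuming the flag name and default stay as in the diff. The build_parser wrapper is hypothetical and only there to make the snippet runnable on its own. The diff also concatenates the two help strings without a separating space, which would render as "tfjob.If not set"; the sketch adds the space.

```python
import argparse


def build_parser():
    # Hypothetical wrapper; the real launcher builds its parser inline.
    parser = argparse.ArgumentParser(description='TFJob launcher (sketch)')
    parser.add_argument('--tfjob-version', type=str,
                        default='v1beta2',
                        help='The version of the deployed tfjob. '
                             'If not set, the default version is v1beta2.')
    return parser


# With no flags given, the default applies.
args = build_parser().parse_args([])
print(args.tfjob_version)
```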

Ark-kun · 2019-11-18T22:46:07Z

components/kubeflow/launcher/src/launch_tf_job.py

 from kubernetes import client as k8s_client
 from kubernetes import config

+def yamlOrJsonStr(str):


YAML is a superset of JSON, so you can just use a YAML loader.

Ark-kun · 2019-11-18T22:46:42Z

components/kubeflow/launcher/Dockerfile

@@ -47,16 +45,18 @@ RUN wget -nv https://github.com/ksonnet/ksonnet/releases/download/v0.9.0/ks_0.9.
 RUN wget https://github.com/kubeflow/tf-operator/archive/v0.3.0.zip && \
    unzip v0.3.0.zip && \
    mv tf-operator-0.3.0 tf-operator
+RUN wget https://github.com/kubeflow/tf-operator/archive/master.zip && \


Pinning the version is usually preferred to using master version.

Ark-kun · 2019-11-19T20:26:41Z

May I hold this PR to discuss it a bit longer? I want to ensure that we don't merge something that introduces a different API (since one already exists), which might need refactoring later on to support more functionalities. What do you think?
/hold

@elikatsis Do you think we should release the hold? Usually the holds are pretty short.

hougangliu · 2019-11-19T23:09:19Z

I will update this PR later today to address the comments. Sorry for the late response.

ryandawsonuk · 2019-11-20T09:31:16Z

Worth noting v1beta is no longer the TFJob version to aim for as it's now v1

hougangliu · 2019-11-28T14:06:47Z

#2677 replaces this PR.

Update tfjob launcher

bb286f0

the tfjob_launcher is out of date to support tfjob v1beta

k8s-ci-robot requested review from animeshsingh and Ark-kun June 12, 2019 02:51

k8s-ci-robot added the size/L label Jun 12, 2019

Merge branch 'master' into tfjob-launcher

4aca05e

ryandawsonuk mentioned this pull request Jun 19, 2019

kubeflow_tfjob_launcher_op to support volumes #1344

Closed

ryandawsonuk approved these changes Jun 19, 2019

ryandawsonuk reviewed Jun 19, 2019

k8s-ci-robot added the do-not-merge/hold label Jun 19, 2019

SatwikBhandiwad reviewed Jun 21, 2019

nrchakradhar reviewed Jun 28, 2019

Ark-kun assigned elikatsis Aug 29, 2019

Ark-kun requested review from hongye-sun and removed request for Ark-kun August 29, 2019 22:59

mak-454 mentioned this pull request Oct 2, 2019

TFJob should work well with pipelines #677

Closed

hougangliu mentioned this pull request Nov 18, 2019

Components - Upgraded TFJob version to v1 #2617

Closed

Ark-kun reviewed Nov 18, 2019

Ark-kun assigned Ark-kun, gaoning777 and hongye-sun Nov 18, 2019

Ark-kun requested a review from gaoning777 November 18, 2019 22:49

hougangliu closed this Nov 28, 2019

magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request Oct 22, 2023

add wait condition for cert-manager webhook (kubeflow#1494)

99b78ad
