Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update tfjob launcher #1494

Closed
wants to merge 2 commits into from
Closed

Conversation

hougangliu
Copy link
Member

@hougangliu hougangliu commented Jun 12, 2019

the tfjob_launcher is out of date to support tfjob v1beta


This change is Reviewable

the tfjob_launcher is out of date to support tfjob v1beta
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign ark-kun
You can assign the PR to them by writing /assign @ark-kun in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hougangliu
Copy link
Member Author

/test kubeflow-pipeline-e2e-test

@hougangliu
Copy link
Member Author

/test kubeflow-pipeline-sample-test

@ryandawsonuk
Copy link
Contributor

/lgtm

I've raised a question about pvolumes for this in #1344 (comment) but I don't personally think that should prevent this being merged

@k8s-ci-robot
Copy link
Contributor

@ryandawsonuk: changing LGTM is restricted to assignees, and only kubeflow/pipelines repo collaborators may be assigned issues.

In response to this:

/lgtm

I've raised a question about pvolumes for this in #1344 (comment) but I don't personally think that should prevent this being merged

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

return dsl.ContainerOp(
name = step_name,
image = 'gcr.io/ml-pipeline/ml-pipeline-kubeflow-tf:1d55a27cf8b69696f3ab5c10687edf2fde0068c7',
image = 'liuhougangxa/tfjob-launcher',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, in fact kubeflow_tfjob_launcher_op.py is just used as an example about what tfjob-launcher containerOP should be
when image updated in gcr.io/ml-pipeline, the image will be updated, too

@elikatsis
Copy link
Member

elikatsis commented Jun 19, 2019

@hougangliu, Hello!

I would like to suggest some other type of mapping.
Instead of a dict {"pvc-name": "/mount/path"}, I believe it would be better to use {"/mount/path": some_k8s_volume_instance} (and isinstance(some_k8s_volume_instance, k8s_client.V1Volume) == True).

Then, the following lines

for k, v in pvc_map.iteritems():
spec['volumes'] = [{"name": k, "persistentVolumeClaim": {"claimName": k}}]
spec['containers'][0]['volumeMounts'] = [{"mountPath": v, "name": k}]

will become

spec['volumes] = []
for k, v in vol_map.iteritems()
    vmount = k8s_client.V1VolumeMount(mount_path=k, name=v.name)
    spec['volumes'].append(v)
    spec['containers'][0]['volumeMounts'].append(vmount)

In addition, it will allow the mounting of other types of volumes (e.g. Secrets).

This will also be uniform with the argument pvolumes of ContainerOp entity.
[ For more info check the volumeop samples here, and also these lines of ContainerOp definition. ]

It can then be parsed accordingly to extend the dependencies of that task, in the case where a PipelineVolume instance is passed as a value.

May I hold this PR to discuss it a bit longer? I want to ensure that we don't merge something that introduces a different API (since one already exists), which might need refactoring later on to support more functionalities. What do you think?
/hold

return dsl.ContainerOp(
name = step_name,
image = 'gcr.io/ml-pipeline/ml-pipeline-kubeflow-tf:1d55a27cf8b69696f3ab5c10687edf2fde0068c7',
image = 'liuhougangxa/tfjob-launcher',

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this code different from gcr.io/ml-pipeline/ml-pipeline-kubeflow-tf:1d55a27cf8b69696f3ab5c10687edf2fde0068c7 ? if yes, can we see source code of this docker image?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

components/kubeflow/launcher/src/launch_tf_job.py is the source code

parser.add_argument('--tfjob-version', type=str,
default='v1beta2',
help='The version of the deployed tfjob.' +
'If not set, the default namespace is v1beta2.')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: I guess there is typo here about namespace. It should have been tfjob-version

from kubernetes import client as k8s_client
from kubernetes import config

def yamlOrJsonStr(str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YAML is superset of JSON, so you can just use YAML loader.

@@ -47,16 +45,18 @@ RUN wget -nv https://github.com/ksonnet/ksonnet/releases/download/v0.9.0/ks_0.9.
RUN wget https://github.com/kubeflow/tf-operator/archive/v0.3.0.zip && \
unzip v0.3.0.zip && \
mv tf-operator-0.3.0 tf-operator
RUN wget https://github.com/kubeflow/tf-operator/archive/master.zip && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinning the version is usually preferred to using master version.

@Ark-kun
Copy link
Contributor

Ark-kun commented Nov 19, 2019

May I hold this PR to discuss it a bit longer? I want to ensure that we don't merge something that introduces a different API (since one already exists), which might need refactoring later on to support more functionalities. What do you think?
/hold

@elikatsis Do you think we should release the hold? Usually the holds are pretty short.

@hougangliu
Copy link
Member Author

I will update this PR later today to solve the comment. Sorry for late action

@ryandawsonuk
Copy link
Contributor

ryandawsonuk commented Nov 20, 2019

Worth noting v1beta is no longer the TFJob version to aim for as it's now v1

@hougangliu
Copy link
Member Author

#2677 replace this PR

@hougangliu hougangliu closed this Nov 28, 2019
magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request Oct 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants