-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update tfjob launcher #1494
Update tfjob launcher #1494
Conversation
the tfjob_launcher is out of date to support tfjob v1beta
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test kubeflow-pipeline-e2e-test |
/test kubeflow-pipeline-sample-test |
/lgtm I've raised a question about pvolumes for this in #1344 (comment) but I don't personally think that should prevent this being merged |
@ryandawsonuk: changing LGTM is restricted to assignees, and only kubeflow/pipelines repo collaborators may be assigned issues. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
return dsl.ContainerOp( | ||
name = step_name, | ||
image = 'gcr.io/ml-pipeline/ml-pipeline-kubeflow-tf:1d55a27cf8b69696f3ab5c10687edf2fde0068c7', | ||
image = 'liuhougangxa/tfjob-launcher', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, in fact kubeflow_tfjob_launcher_op.py is just used as an example about what tfjob-launcher containerOP should be
when image updated in gcr.io/ml-pipeline, the image will be updated, too
@hougangliu, Hello! I would like to suggest some other type of mapping. Then, the following lines pipelines/components/kubeflow/launcher/src/launch_tf_job.py Lines 70 to 72 in 4aca05e
will become spec['volumes] = []
for k, v in vol_map.iteritems()
vmount = k8s_client.V1VolumeMount(mount_path=k, name=v.name)
spec['volumes'].append(v)
spec['containers'][0]['volumeMounts'].append(vmount) In addition, it will allow the mounting of other types of volumes (e.g. This will also be uniform with the argument It can then be parsed accordingly to extend the dependencies of that task, in the case where a May I hold this PR to discuss it a bit longer? I want to ensure that we don't merge something that introduces a different API (since one already exists), which might need refactoring later on to support more functionalities. What do you think? |
return dsl.ContainerOp( | ||
name = step_name, | ||
image = 'gcr.io/ml-pipeline/ml-pipeline-kubeflow-tf:1d55a27cf8b69696f3ab5c10687edf2fde0068c7', | ||
image = 'liuhougangxa/tfjob-launcher', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this code different from gcr.io/ml-pipeline/ml-pipeline-kubeflow-tf:1d55a27cf8b69696f3ab5c10687edf2fde0068c7 ? if yes, can we see source code of this docker image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
components/kubeflow/launcher/src/launch_tf_job.py is the source code
parser.add_argument('--tfjob-version', type=str, | ||
default='v1beta2', | ||
help='The version of the deployed tfjob.' + | ||
'If not set, the default namespace is v1beta2.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: I guess there is typo here about namespace. It should have been tfjob-version
from kubernetes import client as k8s_client | ||
from kubernetes import config | ||
|
||
def yamlOrJsonStr(str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YAML is superset of JSON, so you can just use YAML loader.
@@ -47,16 +45,18 @@ RUN wget -nv https://github.com/ksonnet/ksonnet/releases/download/v0.9.0/ks_0.9. | |||
RUN wget https://github.com/kubeflow/tf-operator/archive/v0.3.0.zip && \ | |||
unzip v0.3.0.zip && \ | |||
mv tf-operator-0.3.0 tf-operator | |||
RUN wget https://github.com/kubeflow/tf-operator/archive/master.zip && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pinning the version is usually preferred to using master version.
@elikatsis Do you think we should release the hold? Usually the holds are pretty short. |
I will update this PR later today to solve the comment. Sorry for late action |
Worth noting v1beta is no longer the TFJob version to aim for as it's now v1 |
#2677 replace this PR |
the tfjob_launcher is out of date to support tfjob v1beta
This change is