SDK/Components - Simplified _create_task_factory_from_component_spec function #662

Conversation

@Ark-kun (Contributor) commented Jan 10, 2019

Simple refactoring:

Most of the code was moved to _dsl_bridge.create_container_op_from_task and is less indented now.

  • Renamed _dsl_bridge._create_task_object to _create_container_op_from_resolved_task and moved the output name sanitization there.
  • Introduced the _components._created_task_transformation_handler handler, which specifies the transformation function that is called when a TaskSpec instance is created from a ComponentSpec. Such a transformation can, for example, convert a TaskSpec to a ContainerOp.

Rationale: When the pipeline author loads a component and uses it

train_test_op = load_component('TF - Train and evaluate')
...
train_test_task = train_test_op(train_data, test_data)

the resulting train_test_task is not always a container task. Sometimes it's a graph task. In such cases it should not be converted to a ContainerOp; it should remain a TaskSpec. There are other cases as well (e.g. creating graph components from a Python pipeline function) where a TaskSpec is needed.
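
Schematically, the flow after this refactoring looks roughly like the following (a simplified sketch, not the exact code: TaskSpec here is a minimal stand-in and _create_task_from_component_spec is a hypothetical name, while _created_task_transformation_handler is the actual variable this PR introduces):

class TaskSpec:
    # Minimal stand-in: the real TaskSpec keeps the full task information.
    def __init__(self, component_spec, arguments):
        self.component_spec = component_spec
        self.arguments = arguments

# Holds the transformation functions that are called each time a TaskSpec
# instance is created from a component; the last handler wins.
_created_task_transformation_handler = []

def _create_task_from_component_spec(component_spec, arguments):
    task = TaskSpec(component_spec, arguments)
    if _created_task_transformation_handler:
        # A handler is installed (e.g. a TaskSpec -> ContainerOp conversion).
        return _created_task_transformation_handler[-1](task)
    # No handler installed: keep the full TaskSpec (e.g. for graph tasks).
    return task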



@@ -186,156 +186,50 @@ def _make_name_unique_by_adding_index(name:str, collection, delimiter:str):
return unique_name


# Holds the transformation functions that are called each time a TaskSpec instance is created from a component. If there are multiple handlers, the last one is used.
_created_task_transformation_handler = []
Contributor:

Can you add some comments about _created_task_transformation_handler? The only place this is populated seems to be line 195, so it is not clear why it needs to exist as a list.

Contributor Author:

The intent is to set it in the Pipeline context. I had that code in this PR, but then I decided to split it into its own PR, where the intent will be more obvious.
Python context managers that replace some value usually make the value holder a list so that the context is re-entrant, i.e. the code does not break if you happen to have nested contexts (see https://docs.python.org/3/library/contextlib.html#reentrant-cms).

The high-level idea is as follows: Conceptually, what you get when you give arguments to a Component is a Task. But currently we need to get a ContainerOp instance. So there needs to be a TaskSpec -> ContainerOp transformation that's applied automatically when the pipeline is being compiled.
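
Roughly like this (a sketch of the pattern, with a hypothetical helper name; not the exact code from the follow-up PR):

from contextlib import contextmanager

@contextmanager
def _installed_task_transformation_handler(handler):
    # Push the handler so that nested contexts do not clobber each other,
    # then pop it on exit, restoring whatever outer handler was active.
    _created_task_transformation_handler.append(handler)
    try:
        yield handler
    finally:
        _created_task_transformation_handler.pop()

The Pipeline context would install the TaskSpec -> ContainerOp conversion on entry and remove it on exit.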

Contributor Author:

What do you think would be the best wording for a comment?

Contributor Author:

In some new cases (the existing cases will continue to work), load_component will need to return TaskSpec objects instead of ContainerOp objects, so there needs to be a way to enable or disable the TaskSpec-to-ContainerOp conversion. The _created_task_transformation_handler holds the transformation procedure, which can be changed or disabled.
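
Once that lands, the intended behavior would be roughly the following (a sketch of the intended end state, reusing the hypothetical context manager sketched above; create_container_op_from_task is the _dsl_bridge function the moved code lives in):

# Without a handler, the factory would return the full TaskSpec:
train_test_task = train_test_op(train_data, test_data)  # -> TaskSpec

# During compilation, the Pipeline context installs the conversion,
# so the very same call yields a ContainerOp:
with _installed_task_transformation_handler(_dsl_bridge.create_container_op_from_task):
    train_test_task = train_test_op(train_data, test_data)  # -> ContainerOp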

Contributor:

Since some of the code is for graph components, would you hold this off until we decide the graph component priority? Or is it possible to decouple graph components from container components?

Contributor Author (@Ark-kun, Jan 12, 2019):

This code change is a foundation needed for multiple puzzle pieces that other team members and I are delivering this quarter. It's needed for the following efforts:

  • Debugging a single component locally (container components for now, graph components later).
  • Submitting a run for a single component, which is needed for component tests (container components for now, graph components later).
  • The intermediate YAML API in the backend.
  • Passing the pipeline metadata (description) and input/output metadata (names, descriptions, types) to the UI. After discussing with @vicaire, @IronPan and @gaoning777, it was decided to skip the "pass information through Argo template metadata" short-term approach I suggested and either go with the intermediate YAML only or attach that intermediate YAML to the end of the Argo YAML.
  • Implementing static type checking for all six cases (submission from the UI, submission from Python, constant arguments in a Python pipeline, constant arguments in a graph component, passing outputs to inputs when composing pipelines in Python, and passing outputs to inputs when composing pipelines in graph component YAML).
  • Artifact support. Artifacts are involved when passing data between components, which happens in a graph and requires graph support.

The reason this change touches so much is that many features depend on the full information that TaskSpec carries, but that information is lost in the transition to ContainerOp.

Contributor:

I see. Thanks @Ark-kun.

Contributor:

/lgtm

Contributor Author:

Just to clarify: the actual change that will make the "component task factory function" output TaskSpec instead of ContainerOp in some scenarios is coming in the next PR.

This PR is just a refactoring that does not change the behavior.

BTW, this was the intent behind the whole _dsl_bridge.py file and the _task_object_factory variable: a bridge between the component structures and DSL structures like ContainerOp (https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/components/_dsl_bridge.py#L40). It turned out the initial effort was insufficient, since only limited information was passed to that handler.

@qimingj (Contributor) commented Jan 12, 2019

This change was submitted two days ago, and I was on it the same day or the next. I want to clarify the priority of things so I have a clear view. Does it block the backend API? Is it because the backend API depends on the intermediate YAML? If so, that's exactly the thing I want to confirm (whether the intermediate YAML is in scope).

Adding @paveldournov to comment or approve.

@Ark-kun (Contributor Author) commented Jan 12, 2019

> Does it block the backend API? Is it because the backend API depends on the intermediate YAML?

It blocks the creation of the "submit intermediate YAML to backend" API that I was asked to prioritize this week. I had already planned it for Q1, but I was asked to deliver it sooner.

> I want to confirm whether the intermediate YAML is in scope

I understand the possible confusion. The "Intermediate YAML" P0 CUJ was in the release planning doc, but it's not in the NEXT planning doc. That might be because it was rolled into "Package and share reusable components and pipelines", which at some point was shortened even further.
For some reason I do not see my CUJ breakdown for that item. I'll talk with Anand and Pavel about updating that information so that it's less confusing. I remember that last time we had a few of those confusing entries.

@Ark-kun (Contributor Author) commented Jan 12, 2019

> @k8s-ci-robot k8s-ci-robot added the lgtm label 12 minutes ago

Why did the robot add l g t m? #662 (comment)
@jlewi, do you know what's happening?

I'm removing it since I'm not sure where it came from.

@qimingj (Contributor) commented Jan 12, 2019

/lgtm

@Ark-kun (Contributor Author) commented Jan 12, 2019

Sorry, I got confused: GitHub only updated the main discussion thread, not the comment thread, so there wasn't any LGTM comment there.
The robot started auto-approving PRs today (see kubernetes/test-infra#10721), so I thought that was the case here.

@Ark-kun (Contributor Author) commented Jan 15, 2019

/approve

@k8s-ci-robot commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ark-kun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1 similar comment

@k8s-ci-robot k8s-ci-robot merged commit fd282d6 into kubeflow:master Jan 15, 2019
@Ark-kun Ark-kun deleted the SDK/Components---Greatly-shortened-_create_task_factory_from_component_spec-function branch January 15, 2019 05:02
Linchin pushed a commit to Linchin/pipelines that referenced this pull request Apr 11, 2023
HumairAK pushed a commit to red-hat-data-services/data-science-pipelines that referenced this pull request Mar 11, 2024