Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sdk): Allow non-pythonic names for graph components' task's outputs. Fixes #4514. #4515

Merged

Conversation

Udiknedormin
Copy link
Contributor

Pythonic-to-original output name mapping got a default value equal to the output's name. Solves #4514, this is the suggested solution described there. Also adds tests to check if the problem was resolved.

It seems it could be cherry-picked in the current release branch.

@kubeflow-bot
Copy link

This change is Reviewable

@k8s-ci-robot
Copy link
Contributor

Hi @Udiknedormin. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Tomcli
Copy link
Member

Tomcli commented Sep 18, 2020

/ok-to-test

@Tomcli
Copy link
Member

Tomcli commented Sep 18, 2020

/test kubeflow-pipelines-tfx-python36

@Tomcli
Copy link
Member

Tomcli commented Sep 18, 2020

It looks like the TFX test failed due to an older version of protobuf.
https://stackoverflow.com/questions/61922334/how-to-solve-attributeerror-module-google-protobuf-descriptor-has-no-attribu

@numerology
Copy link

Sent #4516 to fix.

Sorry for the inconvenience.

@Tomcli
Copy link
Member

Tomcli commented Sep 18, 2020

Thanks @numerology
@Udiknedormin can you rebase the PR once @numerology merged the fix? Thanks.

Comment on lines 540 to 545
task_outputs_with_original_names = {
# component_bridge generates outputs under both pythonic and original name,
# so half of them are absent from pythonic_output_name_to_original
pythonic_output_name_to_original.get(pythonic_output_name, pythonic_output_name): output_value
for pythonic_output_name, output_value in task_outputs_with_pythonic_names.items()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be simpler:

Suggested change
task_outputs_with_original_names = {
# component_bridge generates outputs under both pythonic and original name,
# so half of them are absent from pythonic_output_name_to_original
pythonic_output_name_to_original.get(pythonic_output_name, pythonic_output_name): output_value
for pythonic_output_name, output_value in task_outputs_with_pythonic_names.items()
}
task_outputs_with_original_names = {output.name: task_obj.outputs[output.name] for output in task_component_spec.outputs or []}

output_name_to_pythonic, pythonic_output_name_to_original and task_outputs_with_pythonic_names are not needed then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ark-kun
Was it ever a part of the contract for _container_task_constructor? It's not backwards-compatible, but it certainly would simplify things.

I wasn't sure about it, considering that _container_task_constructor can be overwritten and I only know of two overwrites:

  • _components._create_task_spec_from_component_and_arguments --- which uses original outputs' names, as seen in TaskSpec._init_outputs
  • _component_bridghe._create_container_op_from_component_and_arguments --- which uses both, as shown in the issue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the following be ok?

KT = TypeVar('KT') # could be imported from typing, of course
VT = TypeVar('VT')
def get_maybe_remap(m: Mapping[KT, VT], key: KT,
                    key_mapper: Callable[[KT], KT]) -> VT:
    if key in m:
        return m[key]
    else:
        return m[key_mapper(key)]
(...)

task_outputs_with_original_names = {
    output.name: get_maybe_pythonic(task_obj.outputs, output.name,
                                    output_name_to_pythonic.__getitem__)
    for output in task_component_spec.outputs or []
}

It would remove the need for pythonic_output_name_to_original (i.e. reverse-mapping of output_name_to_pythonic) and task_outputs_with_pythonic_names, leaving just output_name_to_pythonic "just in case", for backward-compatibility. It's also reusable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it ever a part of the contract for _container_task_constructor? It's not backwards-compatible, but it certainly would simplify things.

I think this should be the contract (original output names should be the keys in the task.outputs dictionary).
I think at his point it's still OK it make this change. The contract is private and I do not think anyone else has implemented the contact yet (also we're making a significantly more drastic change to it very soon).
We can always restore the behavior is we encounter unfixable case in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ark-kun Ok, if it can be changed, then I'll simplify it and add the explicit contract.

@Ark-kun
Copy link
Contributor

Ark-kun commented Sep 20, 2020

/retest

Loading container component from component.yaml creates both
pythonic and original output names. Graph component iterated over
all outputs, using pythonic-to-output conversion on all. If some
of the names are not identical to their pythonic versions, they
rised KeyError on the lookup table.

This commit fixes this problem by using default value for the lookup.
@Udiknedormin Udiknedormin force-pushed the fix/graph_component_pythonic_outputs branch from b343cdd to 368d29b Compare September 21, 2020 12:18
@Udiknedormin
Copy link
Contributor Author

@Tomcli

can you rebase the PR once @numerology merged the fix?

Done.

@Ark-kun
Copy link
Contributor

Ark-kun commented Sep 21, 2020

/retest

@Udiknedormin Udiknedormin force-pushed the fix/graph_component_pythonic_outputs branch 3 times, most recently from dccbb46 to bef604c Compare September 22, 2020 17:36
@Udiknedormin
Copy link
Contributor Author

@Ark-kun I added the contract and Protocol for both the ResolvedTask and ContainerTaskConstructor. I think it should be clear enough now. As a bonus, it should provide a little better user experience for type-checker users.

@Ark-kun
Copy link
Contributor

Ark-kun commented Sep 25, 2020

I added the contract and Protocol for both the ResolvedTask and ContainerTaskConstructor. I think it should be clear enough now.

I really appreciate you Protocol proposal, but is it possible to stage this work into two PRs? To first just fix the bug and add the tests and then do the refactoring in another PR.
Protocols are interesting, but we need to get them right. For example the Task object interface is only expected to have outputs map. There are no more expectation at this moment.
There also is an ongoing refactoring in this place which will change the ContainerTaskConstructor protocol.

@Udiknedormin Udiknedormin force-pushed the fix/graph_component_pythonic_outputs branch from bef604c to 5d80cb5 Compare September 25, 2020 13:16
@Udiknedormin
Copy link
Contributor Author

@Ark-kun Fine by me. Reverted to the minimal version.

@Ark-kun
Copy link
Contributor

Ark-kun commented Sep 29, 2020

/lgtm
/approve

JFYI, the refactorings that I've mentioned are #3447, #3448 . I'll keep the idea of protocols in mind. Your previous code is available here: bef604c

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ark-kun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Ark-kun Ark-kun added the cherrypick-approved area OWNER approves to cherry pick this PR to current active release branch label Sep 29, 2020
@k8s-ci-robot k8s-ci-robot merged commit 0b31879 into kubeflow:master Sep 29, 2020
@Bobgy Bobgy added the cherrypicked cherry picked to release branch `release-x.y` label Oct 10, 2020
Bobgy pushed a commit that referenced this pull request Oct 10, 2020
…ts. Fixes #4514. (#4515)

* add tests for pythonic and non-pythonic component outputs

* fix: graph for non-pythonic container output's names

Loading container component from component.yaml creates both
pythonic and original output names. Graph component iterated over
all outputs, using pythonic-to-output conversion on all. If some
of the names are not identical to their pythonic versions, they
rised KeyError on the lookup table.

This commit fixes this problem by using default value for the lookup.

* remove depythonification of outputs - not needed anymore
Jeffwan pushed a commit to Jeffwan/pipelines that referenced this pull request Dec 9, 2020
…ts. Fixes kubeflow#4514. (kubeflow#4515)

* add tests for pythonic and non-pythonic component outputs

* fix: graph for non-pythonic container output's names

Loading container component from component.yaml creates both
pythonic and original output names. Graph component iterated over
all outputs, using pythonic-to-output conversion on all. If some
of the names are not identical to their pythonic versions, they
rised KeyError on the lookup table.

This commit fixes this problem by using default value for the lookup.

* remove depythonification of outputs - not needed anymore
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved cherrypick-approved area OWNER approves to cherry pick this PR to current active release branch cherrypicked cherry picked to release branch `release-x.y` cla: yes lgtm ok-to-test size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants